The description says:
Cause IO segments sent to a device for DMA to be merged virtually
by the IOMMU when they happen to have been allocated contiguously.
This doesn't add pressure to the IOMMU allocator. However, some
drivers don't support getting large merged segments coming back
from *_map_sg().
Most drivers don't have this problem; it is safe to say Y here.
It's out of date. Long ago, drivers didn't have a way to tell IOMMUs
about their segment length limit (that is, the maximum segment length
that they can handle). So IOMMUs merged as many segments as possible
and gave too large segments to drivers.
dma_get_max_seg_size() was introduced to solve the above
problem. Device drives can use the API to tell IOMMU about the maximum
segment length that they can handle. In addition, the default limit
(64K) should be safe for everyone.
So this config option seems to be unnecessary.
Note that this config option just enables users to disable the virtual
merging by default. Users can still disable the virtual merging by the
boot parameter.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc initializes swiotlb before parsing the kernel boot options so
swiotlb options (e.g. specifying the swiotlb buffer size) are ignored.
Any time before freeing bootmem works for swiotlb so this patch moves
powerpc's swiotlb initialization after parsing the kernel boot
options, mem_init (as x86 does).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Tested-by: Becky Bruce <beckyb@kernel.crashing.org>
Tested-by: Albert Herranz <albert_herranz@yahoo.es>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When printk() is disabled (CONFIG_PRINTK) at menu item
General setup
-> Configure standard kernel features (for small systems)
-> Enable support for printk
then there should be no printk() calls at all.
Signed-off-by: Márton Németh <nm127@freemail.hu>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (35 commits)
perf: Fix unexported generic perf_arch_fetch_caller_regs
perf record: Don't try to find buildids in a zero sized file
perf: export perf_trace_regs and perf_arch_fetch_caller_regs
perf, x86: Fix hw_perf_enable() event assignment
perf, ppc: Fix compile error due to new cpu notifiers
perf: Make the install relative to DESTDIR if specified
kprobes: Calculate the index correctly when freeing the out-of-line execution slot
perf tools: Fix sparse CPU numbering related bugs
perf_event: Fix oops triggered by cpu offline/online
perf: Drop the obsolete profile naming for trace events
perf: Take a hot regs snapshot for trace events
perf: Introduce new perf_fetch_caller_regs() for hot regs snapshot
perf/x86-64: Use frame pointer to walk on irq and process stacks
lockdep: Move lock events under lockdep recursion protection
perf report: Print the map table just after samples for which no map was found
perf report: Add multiple event support
perf session: Change perf_session post processing functions to take histogram tree
perf session: Add storage for seperating event types in report
perf session: Change add_hist_entry to take the tree root instead of session
perf record: Add ID and to recorded event data when recording multiple events
...
This implements a powerpc version of perf_arch_fetch_caller_regs
to get correct call-graphs.
It's implemented in assembly because that way we can be sure there isn't
a stack frame for perf_arch_fetch_caller_regs. If it was in C, gcc might
or might not create a stack frame for it, which would affect the number
of levels we have to skip.
With this, we see results from perf record -e lock:lock_acquire like
this:
# Samples: 24878
#
# Overhead Command Shared Object Symbol
# ........ .............. ................. ......
#
14.99% perf [kernel.kallsyms] [k] ._raw_spin_lock
|
--- ._raw_spin_lock
|
|--25.00%-- .alloc_fd
| (nil)
| |
| |--50.00%-- .anon_inode_getfd
| | .sys_perf_event_open
| | syscall_exit
| | syscall
| | create_counter
| | __cmd_record
| | run_builtin
| | main
| | 0xfd2e704
| | 0xfd2e8c0
| | (nil)
... etc.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: anton@samba.org
Cc: linuxppc-dev@ozlabs.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20100318050513.GA6575@drongo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We shouldn't be always setting 'M' in the TLB entry since its reasonable
for somethings to be mapped non-coherent. The PTE should have 'M' set
properly.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf: Provide generic perf_sample_data initialization
MAINTAINERS: Add Arnaldo as tools/perf/ co-maintainer
perf trace: Don't use pager if scripting
perf trace/scripting: Remove extraneous header read
perf, ARM: Modify kuser rmb() call to compile for Thumb-2
x86/stacktrace: Don't dereference bad frame pointers
perf archive: Don't try to collect files without a build-id
perf_events, x86: Fixup fixed counter constraints
perf, x86: Restrict the ANY flag
perf, x86: rename macro in ARCH_PERFMON_EVENTSEL_ENABLE
perf, x86: add some IBS macros to perf_event.h
perf, x86: make IBS macros available in perf_event.h
hw-breakpoints: Remove stub unthrottle callback
x86/hw-breakpoints: Remove the name field
perf: Remove pointless breakpoint union
perf lock: Drop the buffers multiplexing dependency
perf lock: Fix and add misc documentally things
percpu: Add __percpu sparse annotations to hw_breakpoint
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/booke: Fix breakpoint/watchpoint one-shot behavior
powerpc: Reduce printk from pseries_mach_cpu_die()
powerpc: Move checks in pseries_mach_cpu_die()
powerpc: Reset kernel stack on cpu online from cede state
powerpc: Fix G5 thermal shutdown
powerpc/pseries: Pass CPPR value to H_XIRR hcall
powerpc/booke: Fix a couple typos in the advanced ptrace code
powerpc: Fix SMP build with disabled CPU hotplugging.
powerpc: Dynamically allocate pacas
powerpc/perf: e500 support
powerpc/perf: Build callchain code regardless of hardware event support.
powerpc/cpm2: Checkpatch cleanup
powerpc/86xx: Renaming following split of GE Fanuc joint venture
powerpc/86xx: Convert gef_pic_lock to raw_spinlock
powerpc/qe: Convert qe_ic_lock to raw_spinlock
powerpc/82xx: Convert pci_pic_lock to raw_spinlock
powerpc/85xx: Convert socrates_fpga_pic_lock to raw_spinlock
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (56 commits)
doc: fix typo in comment explaining rb_tree usage
Remove fs/ntfs/ChangeLog
doc: fix console doc typo
doc: cpuset: Update the cpuset flag file
Fix of spelling in arch/sparc/kernel/leon_kernel.c no longer needed
Remove drivers/parport/ChangeLog
Remove drivers/char/ChangeLog
doc: typo - Table 1-2 should refer to "status", not "statm"
tree-wide: fix typos "ass?o[sc]iac?te" -> "associate" in comments
No need to patch AMD-provided drivers/gpu/drm/radeon/atombios.h
devres/irq: Fix devm_irq_match comment
Remove reference to kthread_create_on_cpu
tree-wide: Assorted spelling fixes
tree-wide: fix 'lenght' typo in comments and code
drm/kms: fix spelling in error message
doc: capitalization and other minor fixes in pnp doc
devres: typo fix s/dev/devm/
Remove redundant trailing semicolons from macros
fix typo "definetly" -> "definitely" in comment
tree-wide: s/widht/width/g typo in comments
...
Fix trivial conflict in Documentation/laptops/00-INDEX
This converts powerpc to use the generic pci_set_dma_mask and
pci_set_consistent_dma_mask (drivers/pci/pci.c).
The generic pci_set_dma_mask does what powerpc's pci_set_dma_mask does.
Unlike powerpc's pci_set_consistent_dma_mask, the gneric
pci_set_consistent_dma_mask sets only coherent_dma_mask. It doesn't work
for powerpc? pci_set_consistent_dma_mask API should set only
coherent_dma_mask?
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Greg KH <greg@kroah.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add generic implementations of the old and really old uname system calls.
Note that sh only implements sys_olduname but not sys_oldolduname, but I'm
not going to bother with another ifdef for that special case.
m32r implemented an old uname but never wired it up, so kill it, too.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On an architecture that supports 32-bit compat we need to override the
reported machine in uname with the 32-bit value. Instead of doing this
separately in every architecture introduce a COMPAT_UTS_MACHINE define in
<asm/compat.h> and apply it directly in sys_newuname().
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a generic implementation of the ipc demultiplexer syscall. Except for
s390 and sparc64 all implementations of the sys_ipc are nearly identical.
There are slight differences in the types of the parameters, where mips
and powerpc as the only 64-bit architectures with sys_ipc use unsigned
long for the "third" argument as it gets casted to a pointer later, while
it traditionally is an "int" like most other paramters. frv goes even
further and uses unsigned long for all parameters execept for "ptr" which
is a pointer type everywhere. The change from int to unsigned long for
"third" and back to "int" for the others on frv should be fine due to the
in-register calling conventions for syscalls (we already had a similar
issue with the generic sys_ptrace), but I'd prefer to have the arch
maintainers looks over this in details.
Except for that h8300, m68k and m68knommu lack an impplementation of the
semtimedop sub call which this patch adds, and various architectures have
gets used - at least on i386 it seems superflous as the compat code on
x86-64 and ia64 doesn't even bother to implement it.
[akpm@linux-foundation.org: add sys_ipc to sys_ni.c]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Reviewed-by: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix:
arch/powerpc/kernel/perf_event.c:1334: error: 'power_pmu_notifier' undeclared (first use in this function)
arch/powerpc/kernel/perf_event.c:1334: error: (Each undeclared identifier is reported only once
arch/powerpc/kernel/perf_event.c:1334: error: for each function it appears in.)
arch/powerpc/kernel/perf_event.c:1334: error: implicit declaration of function 'power_pmu_notifier'
arch/powerpc/kernel/perf_event.c:1334: error: implicit declaration of function 'register_cpu_notifier'
Due to commit 3f6da390 (perf: Rework and fix the arch CPU-hotplug hooks).
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove the hw_perf_event_*() hotplug hooks in favour of per PMU hotplug
notifiers. This has the advantage of reducing the static weak interface
as well as exposing all hotplug actions to the PMU.
Use this to fix x86 hotplug usage where we did things in ONLINE which
should have been done in UP_PREPARE or STARTING.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
LKML-Reference: <20100305154128.736225361@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This makes it easier to extend perf_sample_data and fixes a bug on arm
and sparc, which failed to set ->raw to NULL, which can cause crashes
when combined with PERF_SAMPLE_RAW.
It also optimizes PowerPC and tracepoint, because the struct
initialization is forced to zero out the whole structure.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Jean Pihet <jpihet@mvista.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Jamie Iles <jamie.iles@picochip.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: stable@kernel.org
LKML-Reference: <20100304140100.315416040@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Another fix for the extended ptrace patches in the -next tree.
The handling of breakpoints and watchpoints is inconsistent. When a
breakpoint or watchpoint is hit, the interrupt handler is clearing the
proper bits in the dbcr* registers, but leaving the dac* and iac* registers
alone. The ptrace code to delete the break/watchpoints checks the dac* and
iac* registers for zero to determine if they are enabled. Instead, they
should check the dbcr* bits.
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cpu hotplug (offline) without dlpar operation will place cpu
in cede state and the extended_cede_processor() function will
return when resumed.
Kernel stack pointer needs to be reset before
start_secondary() is called to continue the online operation.
Added new function start_secondary_resume() to do the above
steps.
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc/booke: Fix a couple typos in the advanced ptrace code
Found and fixed a couple typos in the advanced ptrace patches.
(These patches are currently in benh's next tree.)
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linuxppc-dev list <Linuxppc-dev@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On 64-bit kernels we currently have a 512 byte struct paca_struct for
each cpu (usually just called "the paca"). Currently they are statically
allocated, which means a kernel built for a large number of cpus will
waste a lot of space if it's booted on a machine with few cpus.
We can avoid that by only allocating the number of pacas we need at
boot. However this is complicated by the fact that we need to access
the paca before we know how many cpus there are in the system.
The solution is to dynamically allocate enough space for NR_CPUS pacas,
but then later in boot when we know how many cpus we have, we free any
unused pacas.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Constify struct sysfs_ops.
This is part of the ops structure constification
effort started by Arjan van de Ven et al.
Benefits of this constification:
* prevents modification of data that is shared
(referenced) by many other structure instances
at runtime
* detects/prevents accidental (but not intentional)
modification attempts on archs that enforce
read-only kernel data at runtime
* potentially better optimized code as the compiler
can assume that the const data cannot be changed
* the compiler/linker move const data into .rodata
and therefore exclude them from false sharing
Signed-off-by: Emese Revfy <re.emese@gmail.com>
Acked-by: David Teigland <teigland@redhat.com>
Acked-by: Matt Domsch <Matt_Domsch@dell.com>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Acked-by: Hans J. Koch <hjk@linutronix.de>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This implements perf_event support for the Freescale embedded performance
monitor, based on the existing perf_event.c that supports server/classic
chips.
Some limitations:
- Performance monitor interrupts are regular EE interrupts, and thus you
can't profile places with interrupts disabled. We may want to implement
soft IRQ-disabling, with perfmon interrupts exempted and treated as NMIs.
- When trying to schedule multiple event groups at once, and using
restricted events, situations could arise where scheduling fails even
though it would be possible. Consider three groups, each with two events.
One group has restricted events, the others don't. The two non-restricted
groups are scheduled, then one is removed, which happens to occupy the two
counters that can't do restricted events. The remaining non-restricted
group will not be moved to the non-restricted-capable counters to make
room if the restricted group tries to be scheduled.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
It's also useful for software events, as well as future support for
other types of hardware counters.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Signed-off-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Signed-off-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Signed-off-by: Michal Marek <mmarek@suse.cz>
SRR1 stores more information that just the MSR value. It also stores
valuable information about the type of interrupt we received, for
example whether the storage interrupt we just got was because of a
missing htab entry or not.
We use that information to speed up the exit path.
Now if we get preempted before we can interpret the shadow_msr values,
we get into vcpu_put which then calls the MSR handler, which then sets
all the SRR1 information bits in shadow_msr to 0. Great.
So let's preserve the SRR1 specific bits in shadow_msr whenever we set
the MSR. They don't hurt.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
We need to explicitly only giveup VSX in KVM, so let's export that
specific function to module space.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Currently we're racy when doing the transition from IR=1 to IR=0, from
the module memory entry code to the real mode SLB switching code.
To work around that I took a look at the RTAS entry code which is faced
with a similar problem and did the same thing:
A small helper in linear mapped memory that does mtmsr with IR=0 and
then RFIs info the actual handler.
Thanks to that trick we can safely take page faults in the entry code
and only need to be really wary of what to do as of the SLB switching
part.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
We're being horribly racy right now. All the entry and exit code hijacks
random fields from the PACA that could easily be used by different code in
case we get interrupted, for example by a #MC or even page fault.
After discussing this with Ben, we figured it's best to reserve some more
space in the PACA and just shove off some vcpu state to there.
That way we can drastically improve the readability of the code, make it
less racy and less complex.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (88 commits)
powerpc: Fix lwsync feature fixup vs. modules on 64-bit
powerpc: Convert pmc_owner_lock to raw_spinlock
powerpc: Convert die.lock to raw_spinlock
powerpc: Convert tlbivax_lock to raw_spinlock
powerpc: Convert mpic locks to raw_spinlock
powerpc: Convert pmac_pic_lock to raw_spinlock
powerpc: Convert big_irq_lock to raw_spinlock
powerpc: Convert feature_lock to raw_spinlock
powerpc: Convert i8259_lock to raw_spinlock
powerpc: Convert beat_htab_lock to raw_spinlock
powerpc: Convert confirm_error_lock to raw_spinlock
powerpc: Convert ipic_lock to raw_spinlock
powerpc: Convert native_tlbie_lock to raw_spinlock
powerpc: Convert beatic_irq_mask_lock to raw_spinlock
powerpc: Convert nv_lock to raw_spinlock
powerpc: Convert context_lock to raw_spinlock
powerpc/85xx: Add NOR, LEDs and PIB support for MPC8568E-MDS boards
powerpc/86xx: Enable VME driver on the GE SBC610
powerpc/86xx: Enable VME driver on the GE PPC9A
powerpc/86xx: Add MSI section to GE PPC9A DTS
...
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (48 commits)
x86/PCI: Prevent mmconfig memory corruption
ACPI: Use GPE reference counting to support shared GPEs
x86/PCI: use host bridge _CRS info by default on 2008 and newer machines
PCI: augment bus resource table with a list
PCI: add pci_bus_for_each_resource(), remove direct bus->resource[] refs
PCI: read bridge windows before filling in subtractive decode resources
PCI: split up pci_read_bridge_bases()
PCIe PME: use pci_pcie_cap()
PCI PM: Run-time callbacks for PCI bus type
PCIe PME: use pci_is_pcie()
PCI / ACPI / PM: Platform support for PCI PME wake-up
ACPI / ACPICA: Multiple system notify handlers per device
ACPI / PM: Add more run-time wake-up fields
ACPI: Use GPE reference counting to support shared GPEs
PCI PM: Make it possible to force using INTx for PCIe PME signaling
PCI PM: PCIe PME root port service driver
PCI PM: Add function for checking PME status of devices
PCI: mark is_pcie obsolete
PCI: set PCI_PREF_RANGE_TYPE_64 in pci_bridge_check_ranges
PCI: pciehp: second try to get big range for pcie devices
...
Since the cpu argument to hw_perf_group_sched_in() is always
smp_processor_id(), simplify the code a little by removing this argument
and using the current cpu where needed.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: David Miller <davem@davemloft.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1265890918.5396.3.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'next-devicetree' of git://git.secretlab.ca/git/linux-2.6: (41 commits)
of: remove undefined request_OF_resource & release_OF_resource
of/sparc: Remove sparc-local declaration of allnodes and devtree_lock
of: move definition of of_chosen into common code.
of: remove unused extern reference to devtree_lock
of: put default string compare and #a/s-cell values into common header
of/flattree: Don't assume HAVE_LMB
of: protect linux/of.h with CONFIG_OF
proc_devtree: fix THIS_MODULE without module.h
of: Remove old and misplaced function declarations
of/flattree: Make the kernel accept ePAPR style phandle information
of/flattree: endian-convert members of boot_param_header
of: assume big-endian properties, adding conversions where necessary
of: use __be32 for cell value accessors
of/flattree: use OF_ROOT_NODE_{SIZE,ADDR}_CELLS DEFAULT for fdt parsing
of/flattree: use callback to setup initrd from /chosen
proc_devtree: include linux/of.h
of: make set_node_proc_entry private to proc_devtree.c
of: include linux/proc_fs.h
of/flattree: merge early_init_dt_scan_memory() common code
of: add 'of_' prefix to machine_is_compatible()
...
No functional change; this converts loops that iterate from 0 to
PCI_BUS_NUM_RESOURCES through pci_bus resource[] table to use the
pci_bus_for_each_resource() iterator instead.
This doesn't change the way resources are stored; it merely removes
dependencies on the fact that they're in a table.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Now that we return the new resource start position, there is no
need to update "struct resource" inside the align function.
Therefore, mark the struct resource as const.
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
As suggested by Linus, align functions should return the start
of a resource, not void. An update of "res->start" is no longer
necessary.
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
pmc_owner_lock needs to be a real spinlock in RT. Convert it to
raw_spinlock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
die.lock needs to be a real spinlock in RT. Convert it to
raw_spinlock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
big_irq_lock needs to be a real spinlock in RT. Convert it to
raw_spinlock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
24 is offset between the opcode past bl and past rfi. This makes it more
obvious.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
powerpc/booke: Add support for advanced debug registers
From: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Based on patches originally written by Torez Smith.
This patch defines context switch and trap related functionality
for BookE specific Debug Registers. It adds support to ptrace()
for setting and getting BookE related Debug Registers
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: Torez Smith <lnxtorez@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <dwg@au1.ibm.com>
Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Sergio Durigan Junior <sergiodj@br.ibm.com>
Cc: Thiago Jung Bauermann <bauerman@br.ibm.com>
Cc: linuxppc-dev list <Linuxppc-dev@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc: Extended ptrace interface
From: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Based on patches originally written by Torez Smith.
Add a new extended ptrace interface so that user-space has a single
interface for powerpc, without having to know the specific layout
of the debug registers.
Implement:
PPC_PTRACE_GETHWDEBUGINFO
PPC_PTRACE_SETHWDEBUG
PPC_PTRACE_DELHWDEBUG
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Acked-by: David Gibson <dwg@au1.ibm.com>
Cc: Torez Smith <lnxtorez@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Sergio Durigan Junior <sergiodj@br.ibm.com>
Cc: Thiago Jung Bauermann <bauerman@br.ibm.com>
Cc: linuxppc-dev list <Linuxppc-dev@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc/booke: Introduce new CONFIG options for advanced debug registers
From: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Introduce new config options to simplify the ifdefs pertaining to the
advanced debug registers for booke and 40x processors:
CONFIG_PPC_ADV_DEBUG_REGS - boolean: true for dac-based processors
CONFIG_PPC_ADV_DEBUG_IACS - number of IAC registers
CONFIG_PPC_ADV_DEBUG_DACS - number of DAC registers
CONFIG_PPC_ADV_DEBUG_DVCS - number of DVC registers
CONFIG_PPC_ADV_DEBUG_DAC_RANGE - DAC ranges supported
Beginning conservatively, since I only have the facilities to test 440
hardware. I believe all 40x and booke platforms support at least 2 IAC
and 2 DAC registers. For 440, 4 IAC and 2 DVC registers are enabled, as
well as the DAC ranges.
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Acked-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
I often get asked if BAD interrupts are really bad. On some boxes (eg
IBM machines running a hypervisor) there are valid cases where are
presented with an interrupt that is not for us. These cases are common
enough to show up as thousands of BAD interrupts a day.
Tone them down by calling them spurious. Since they can be a significant cause
of OS jitter, we may as well log them per cpu so we know where they are
occurring.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
With NO_HZ it is useful to know how often the decrementer is going off. The
patch below adds an entry for it and also adds it into the /proc/stat
summaries.
While here, I added performance monitoring and machine check exceptions.
I found it useful to keep an eye on the PMU exception rate
when using the perf tool. Since it's possible to take a completely
handled machine check on a System p box it also sounds like a good idea to
keep a machine check summary.
The event naming matches x86 to keep gratuitous differences to a minimum.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On a large machine I noticed the columns of /proc/interrupts failed to line up
with the header after CPU9. At sufficiently large numbers of CPUs it becomes
impossible to line up the CPU number with the counts.
While fixing this I noticed x86 has a number of updates that we may as well
pull in. On PowerPC we currently omit an interrupt completely if there is no
active handler, whereas on x86 it is printed if there is a non zero count.
The x86 code also spaces the first column correctly based on nr_irqs.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
PowerPC is currently using asm-generic/hardirq.h which statically allocates an
NR_CPUS irq_stat array. Switch to an arch specific implementation which uses
per cpu data:
On a kernel with NR_CPUS=1024, this saves quite a lot of memory:
text data bss dec hex filename
8767938 2944132 1636796 13348866 cbb002 vmlinux.baseline
8767779 2944260 1505724 13217763 c9afe3 vmlinux.irq_cpustat
A saving of around 128kB.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Rather than defining of_chosen in each arch, it can be defined for all
in driver/of/base.c
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Michal Simek <monstr@monstr.eu>
Neither the powerpc nor the microblaze code use devtree_lock anymore.
Remove the extern reference.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Michal Simek <monstr@monstr.eu>
We don't always have lmb available, so make arches provide an
early_init_dt_alloc_memory_arch() to handle the allocation of
memory in the fdt code.
When we don't have lmb.h included, we need asm/page.h for __va.
Signed-off-by: Jeremy Kerr <jeremy.kerr@canonical.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Michal Simek <monstr@monstr.eu>
The boot_param_header has big-endian fields, so change the types to
__be32, and perform endian conversion when we access them.
Signed-off-by: Jeremy Kerr <jeremy.kerr@canonical.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
At present, the fdt code sets the kernel-wide initrd_start and
initrd_end variables when parsing /chosen. On ARM, we only set these
once the bootmem has been reserved.
This change adds an arch hook to setup the initrd from the device
tree:
void early_init_dt_setup_initrd_arch(unsigned long start,
unsigned long end);
The arch-specific code can then setup the initrd however it likes.
Compiled on powerpc, with CONFIG_BLK_DEV_INITRD=y and =n.
Signed-off-by: Jeremy Kerr <jeremy.kerr@canonical.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Merge common code between PowerPC and Microblaze architectures.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Michal Simek <monstr@monstr.eu>
machine is compatible is an OF-specific call. It should have
the of_ prefix to protect the global namespace.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Michal Simek <monstr@monstr.eu>
Merge common function between powerpc, sparc and microblaze. Code is
identical for powerpc and microblaze, but adds a lock (and release) of
the devtree_lock on sparc.
Signed-off-by: Jeremy Kerr <jeremy.kerr@canonical.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Merge common code between PowerPC and Microblaze
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Tested-by: Wolfram Sang <w.sang@pengutronix.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The clockevent multiplier and shift is useful information, but we
only need to print it once.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
RTAS should never cause an exception but if it does (for example accessing
outside our RMO) then we might go a long way through the kernel before
oopsing. If we unset MSR_RI we should at least stop things on exception
exit.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Frans Pop <elendil@planet.nl>
Cc: linuxppc-dev@ozlabs.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We use firmware_has_feature quite a lot these days, so it's worth putting
powerpc_firmware_features into __read_mostly.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Add printout of last accessed sysfs file, added to x86 in
ae87221d3c (sysfs: crash debugging)
Also add the notify_die hook that allows us to print out the ftrace
buffer on oops. This is useful in conjunction with ftrace function_graph:
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=128 NUMA pSeries
last sysfs file: /sys/class/net/tunl0/type
Dumping ftrace buffer:
...
0) | .sysrq_handle_crash() {
0) 0.476 us | .hash_page();
0) 0.488 us | .xmon_fault_handler();
0) | .bad_page_fault() {
0) | .search_exception_tables() {
0) 0.590 us | .search_module_extables();
0) 2.546 us | }
0) | .printk() {
0) | .vprintk() {
0) 0.488 us | ._raw_spin_lock();
0) 0.572 us | .emit_log_char();
Showing the function graph of a sysrq-c crash.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
String constants that are continued on subsequent lines with \
are not good.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Replace platfrom -> platform.
This is a frequent spelling bug.
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Updated variant of a patch by Joel Schopp.
The field containing the number of supported cores which we pass to
firmware via the ibm,client-architecture call was set by a previous
patch statically as high as is possible (NR_CPUS).
However, that value isn't quite right for a system that supports
multiple threads per core, thus permitting the firmware to assign
more cores to a Linux partition than it can really cope with.
This patch improves it by using the device-tree to determine the
number of threads supported by the processors in order to adjust
the value passed to firmware.
Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch adds 2 fields to the ibm_architecture_vec array.
The first of these fields indicates the number of cores which Linux can
boot. It does not account for SMT, so it may result in cpus assigned to
Linux which cannot be booted. A second patch follows that dynamically
updates this for SMT.
The second field just indicates that our OS is Linux, and not another
OS. The system may or may not use this hint to performance tune
settings for Linux.
Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Using perf to trace L1 dcache misses and dumping data addresses I found a few
variables taking a lot of misses. Since they are almost never written, they
should go into the __read_mostly section.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The cputime code has a few places that do per_cpu(, smp_processor_id()).
Replace them with __get_cpu_var().
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Here are the powerpc bits to remove TIF_ABI_PENDING now that
set_personality() is called at the appropriate place in exec.
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Add missing call to pci_fixup_device(pci_fixup_early, ...) when
building the pci_dev from scratch off the Open Firmware device-tree
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Add missing hookup to existing pci_slot when building the pci_dev from
scratch off the Open Firmware device-tree
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We are missing these when building the pci_dev from scratch off
the Open Firmware device-tree
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
In struct device_node, the phandle is named 'linux_phandle' for PowerPC
and MicroBlaze, and 'node' for SPARC. There is no good reason for the
difference, it is just an artifact of the code diverging over a couple
of years. This patch renames both to simply .phandle.
Note: the .node also existed in PowerPC/MicroBlaze, but the only user
seems to be arch/powerpc/platforms/powermac/pfunc_core.c. It doesn't
look like the assignment between .linux_phandle and .node is
significantly different enough to warrant the separate code paths
unless ibm,phandle properties actually appear in Apple device trees.
I think it is safe to eliminate the old .node property and use
phandle everywhere.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: David S. Miller <davem@davemloft.net>
Tested-by: Wolfram Sang <w.sang@pengutronix.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Merge common code between PowerPC and MicroBlaze
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Tested-by: Wolfram Sang <w.sang@pengutronix.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Merge common code between PowerPC and Microblaze
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Tested-by: Wolfram Sang <w.sang@pengutronix.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When running perf across all cpus with backtracing (-a -g), sometimes we
get samples without associated backtraces:
23.44% init [kernel] [k] restore
11.46% init eeba0c [k] 0x00000000eeba0c
6.77% swapper [kernel] [k] .perf_ctx_adjust_freq
5.73% init [kernel] [k] .__trace_hcall_entry
4.69% perf libc-2.9.so [.] 0x0000000006bb8c
|
|--11.11%-- 0xfffa941bbbc
It turns out the backtrace code has a check for the idle task and the IP
sampling does not. This creates problems when profiling an interrupt
heavy workload (in my case 10Gbit ethernet) since we get no backtraces
for interrupts received while idle (ie most of the workload).
Right now x86 and sh check that current is not NULL, which should never
happen so remove that too.
Idle task's exclusion must be performed from the core code, on top
of perf_event_attr:exclude_idle.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
LKML-Reference: <20100118054707.GT12666@kryten>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Move the defintion and lock helper routines for the cpu hotplug driver
lock from pseries to powerpc code to avoid build breaks for platforms
other than pseries that use cpu hotplug.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Acked-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
It looks like the previous patch sent out to move RTAS and
other items from /proc/ppc64 to /proc/powerpc missed a few
files needed for RAS and DLPAR functionality.
Original Patch here:
http://lists.ozlabs.org/pipermail/linuxppc-dev/2009-September/076096.html
This patch updates the remaining files to be created under /proc/powerpc.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The newly added fixup for buggy dcbX insn's has
a bug that always trigger a kernel TLB walk so a user space
dcbX insn will cause a Kernel Machine Check if it hits DTLB error.
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We noticed that recent kernels didn't boot on our 1GHz Canyonlands 460EX
boards anymore. As it seems, patch 8d165db1 [powerpc: Improve
decrementer accuracy] introduced this problem. The routine div_sc()
overflows with shift = 32 resulting in this incorrect setup:
time_init: decrementer frequency = 1000.000012 MHz
time_init: processor frequency = 1000.000012 MHz
clocksource: timebase mult[400000] shift[22] registered
clockevent: decrementer mult[33] shift[32] cpu[0]
This patch now introduces a local div_dc64() version of this function
so that this overflow doesn't happen anymore.
Signed-off-by: Stefan Roese <sr@denx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Detlev Zundel <dzu@denx.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
It seems there is a thinko in the TLB invalidation code that makes the
tlbie in the loop executed just once. The intended check was probably
'gt', not 'lt'.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Various kernel asm modifies SRR0/SRR1 just before executing
a rfi. If such code crosses a page boundary you risk a TLB miss
which will clobber SRR0/SRR1. Avoid this by always pinning
kernel instruction TLB space.
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
PCI/cardbus: Add a fixup hook and fix powerpc
PCI: change PCI nomenclature in drivers/pci/ (non-comment changes)
PCI: change PCI nomenclature in drivers/pci/ (comment changes)
PCI: fix section mismatch on update_res()
PCI: add Intel 82599 Virtual Function specific reset method
PCI: add Intel USB specific reset method
PCI: support device-specific reset methods
PCI: Handle case when no pci device can provide cache line size hint
PCI/PM: Propagate wake-up enable for PCIe devices too
vgaarbiter: fix a typo in the vgaarbiter Documentation
This patch fixes the handling of VSX alignment faults in little-endian
mode (the current code assumes the processor is in big-endian mode).
The patch also makes the handlers clear the top 8 bytes of the register
when handling an 8 byte VSX load.
This is based on 2.6.32.
Signed-off-by: Neil Campbell <neilc@linux.vnet.ibm.com>
Cc: <stable@kernel.org>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The cardbus code creates PCI devices without ever going through the
necessary fixup bits and pieces that normal PCI devices go through.
There's in fact a commented out call to pcibios_fixup_bus() in there,
it's commented because ... it doesn't work.
I could make pcibios_fixup_bus() do the right thing on powerpc easily
but I felt it cleaner instead to provide a specific hook pci_fixup_cardbus
for which a weak empty implementation is provided by the PCI core.
This fixes cardbus on powerbooks and probably all other PowerPC
platforms which was broken completely for ever on some platforms and
since 2.6.31 on others such as PowerBooks when we made the DMA ops
mandatory (since those are setup by the fixups).
Acked-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* 'next' of git://git.secretlab.ca/git/linux-2.6: (23 commits)
powerpc: fix up for mmu_mapin_ram api change
powerpc: wii: allow ioremap within the memory hole
powerpc: allow ioremap within reserved memory regions
wii: use both mem1 and mem2 as ram
wii: bootwrapper: add fixup to calc useable mem2
powerpc: gamecube/wii: early debugging using usbgecko
powerpc: reserve fixmap entries for early debug
powerpc: wii: default config
powerpc: wii: platform support
powerpc: wii: hollywood interrupt controller support
powerpc: broadway processor support
powerpc: wii: bootwrapper bits
powerpc: wii: device tree
powerpc: gamecube: default config
powerpc: gamecube: platform support
powerpc: gamecube/wii: flipper interrupt controller support
powerpc: gamecube/wii: udbg support for usbgecko
powerpc: gamecube/wii: do not include PCI support
powerpc: gamecube/wii: declare as non-coherent platforms
powerpc: gamecube/wii: introduce GAMECUBE_COMMON
...
Fix up conflicts in arch/powerpc/mm/fsl_booke_mmu.c.
Hopefully even close to correctly.
* 'module' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
modpost: fix segfault with short symbol names
module: handle ppc64 relocating kcrctabs when CONFIG_RELOCATABLE=y
Kbuild: clear marker out of modpost
module: make MODULE_SYMBOL_PREFIX into a CONFIG option
ARM: unexport symbols used to implement floating point emulation
ARM: use unified discard definition in linker script
x86: don't export inline function
sparc64: don't export static inline pci_ functions
Use bitmap library and kill some unused iommu helper functions.
1. s/iommu_area_free/bitmap_clear/
2. s/iommu_area_reserve/bitmap_set/
3. Use bitmap_find_next_zero_area instead of find_next_zero_area
This cannot be simple substitution because find_next_zero_area
doesn't check the last bit of the limit in bitmap
4. Remove iommu_area_free, iommu_area_reserve, and find_next_zero_area
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
powerpc applies relocations to the kcrctab. They're absolute symbols,
but it's not completely unreasonable: other archs may too, but the
relocation is often 0.
http://lists.ozlabs.org/pipermail/linuxppc-dev/2009-November/077972.html
Inspired-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Tested-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Convert locks which cannot be sleeping locks in preempt-rt to
raw_spinlocks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Name space cleanup. No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
Further name space cleanup. No functional change
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
The raw_spin* namespace was taken by lockdep for the architecture
specific implementations. raw_spin_* would be the ideal name space for
the spinlocks which are not converted to sleeping locks in preempt-rt.
Linus suggested to convert the raw_ to arch_ locks and cleanup the
name space instead of using an artifical name like core_spin,
atomic_spin or whatever
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits)
m68k: rename global variable vmalloc_end to m68k_vmalloc_end
percpu: add missing per_cpu_ptr_to_phys() definition for UP
percpu: Fix kdump failure if booted with percpu_alloc=page
percpu: make misc percpu symbols unique
percpu: make percpu symbols in ia64 unique
percpu: make percpu symbols in powerpc unique
percpu: make percpu symbols in x86 unique
percpu: make percpu symbols in xen unique
percpu: make percpu symbols in cpufreq unique
percpu: make percpu symbols in oprofile unique
percpu: make percpu symbols in tracer unique
percpu: make percpu symbols under kernel/ and mm/ unique
percpu: remove some sparse warnings
percpu: make alloc_percpu() handle array types
vmalloc: fix use of non-existent percpu variable in put_cpu_var()
this_cpu: Use this_cpu_xx in trace_functions_graph.c
this_cpu: Use this_cpu_xx for ftrace
this_cpu: Use this_cpu_xx in nmi handling
this_cpu: Use this_cpu operations in RCU
this_cpu: Use this_cpu ops for VM statistics
...
Fix up trivial (famous last words) global per-cpu naming conflicts in
arch/x86/kvm/svm.c
mm/slab.c
Add support for using the USB Gecko adapter as an early debugging
console on the Nintendo GameCube and Wii video game consoles.
The USB Gecko is a 3rd party memory card interface adapter that provides
a EXI (External Interface) to USB serial converter.
Signed-off-by: Albert Herranz <albert_herranz@yahoo.es>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
This patch extends the cputable entry of the 750CL to also match
the 750CL-based "Broadway" cpu found on the Nintendo Wii.
As of this patch, the following "Broadway" design revision levels have
been seen in the wild:
- DD1.2 (87102)
- DD2.0 (87200)
Signed-off-by: Albert Herranz <albert_herranz@yahoo.es>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (151 commits)
powerpc: Fix usage of 64-bit instruction in 32-bit altivec code
MAINTAINERS: Add PowerPC patterns
powerpc/pseries: Track previous CPPR values to correctly EOI interrupts
powerpc/pseries: Correct pseries/dlpar.c build break without CONFIG_SMP
powerpc: Make "intspec" pointers in irq_host->xlate() const
powerpc/8xx: DTLB Miss cleanup
powerpc/8xx: Remove DIRTY pte handling in DTLB Error.
powerpc/8xx: Start using dcbX instructions in various copy routines
powerpc/8xx: Restore _PAGE_WRITETHRU
powerpc/8xx: Add missing Guarded setting in DTLB Error.
powerpc/8xx: Fixup DAR from buggy dcbX instructions.
powerpc/8xx: Tag DAR with 0x00f0 to catch buggy instructions.
powerpc/8xx: Update TLB asm so it behaves as linux mm expects.
powerpc/8xx: Invalidate non present TLBs
powerpc/pseries: Serialize cpu hotplug operations during deactivate Vs deallocate
pseries/pseries: Add code to online/offline CPUs of a DLPAR node
powerpc: stop_this_cpu: remove the cpu from the online map.
powerpc/pseries: Add kernel based CPU DLPAR handling
sysfs/cpu: Add probe/release files
powerpc/pseries: Kernel DLPAR Infrastructure
...
New helper - sys_mmap_pgoff(); switch syscalls to using it.
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Merge common code between PowerPC and Microblaze. This patch
splits the arch-specific stuff out into a new function,
early_init_dt_scan_chosen_arch().
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Tested-by: Wolfram Sang <w.sang@pengutronix.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
A cell is firmly established as a u32. No need to do an ugly typedef
to redefine it to cell_t. Eliminate the unnecessary typedef so that
it doesn't have to be added to the of_fdt header file
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Tested-by: Wolfram Sang <w.sang@pengutronix.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Merge common code between PowerPC and Microblaze
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Wolfram Sang <w.sang@pengutronix.de>
Merge common code between PowerPC and Microblaze
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Wolfram Sang <w.sang@pengutronix.de>
Merge common code between PowerPC and Microblaze
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Tested-by: Wolfram Sang <w.sang@pengutronix.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
e821ea70f3 introduced a bug by copying
some 64-bit originated code as-is to be used by both 32 and 64-bit
but this code contains a 64-bit ony "cmpdi" instruction.
This changes it to cmpwi, which is fine since VRSAVE can only contains
a 32-bit value anyway.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: <stable@kernel.org>
Writing a driver using SCLPC on the MPC5200B I detected, that the
intspec arrays to map irqs to Linux virq cannot be const, because the
mapping and xlate functions only take non const pointers. All those
functions do not modify the intspec, so a const pointer could be used.
Signed-off-by: Roman Fietze <roman.fietze@telemotive.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Use symbolic constant for PRESENT and avoid branching.
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
There is no need to do set the DIRTY bit directly in DTLB Error.
Trap to do_page_fault() and let the generic MM code do the work.
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Now that 8xx can fixup dcbX instructions, start using them
where possible like every other PowerPc arch do.
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
8xx has not had WRITETHRU due to lack of bits in the pte.
After the recent rewrite of the 8xx TLB code, there are
two bits left. Use one of them to WRITETHRU.
Perhaps use the last SW bit to PAGE_SPECIAL or PAGE_FILE?
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
only DTLB Miss did set this bit, DTLB Error needs too otherwise
the setting is lost when the page becomes dirty.
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This is an assembler version to fixup DAR not being set
by dcbX, icbi instructions. There are two versions, one
uses selfmodifing code, the other uses a
jump table but is much bigger(default).
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
dcbz, dcbf, dcbi, dcbst and icbi do not set DAR when they
cause a DTLB Error. Dectect this by tagging DAR with 0x00f0
at every exception exit that modifies DAR.
Test for DAR=0x00f0 in DataTLBError and bail
to handle_page_fault().
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Update the TLB asm to make proper use of _PAGE_DIRY and _PAGE_ACCESSED.
Get rid of _PAGE_HWWRITE too.
Pros:
- I/D TLB Miss never needs to write to the linux pte.
- _PAGE_ACCESSED is only set on TLB Error fixing accounting
- _PAGE_DIRTY is mapped to 0x100, the changed bit, and is set directly
when a page has been made dirty.
- Proper RO/RW mapping of user space.
- Free up 2 SW TLB bits in the linux pte(add back _PAGE_WRITETHRU ?)
- kernel RO/user NA support.
Cons:
- A few more instructions in the TLB Miss routines.
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Remove the CPU from the online map to prevent smp_call_function
from sending messages to a stopped CPU.
Signed-off-by: Valentine Barshak <vbarshak@ru.mvista.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Version 3 of this patch is updated with documentation added to
Documentation/ABI. There are no changes to any of the C code from v2
of the patch.
In order to support kernel DLPAR of CPU resources we need to provide an
interface to add (probe) and remove (release) the resource from the system.
This patch Creates new generic probe and release sysfs files to facilitate
cpu probe/release. The probe/release interface provides for allowing each
arch to supply their own routines for implementing the backend of adding
and removing cpus to/from the system.
This also creates the powerpc specific stubs to handle the arch callouts
from writes to the sysfs files.
The creation and use of these files is regulated by the
CONFIG_ARCH_CPU_PROBE_RELEASE option so that only architectures that need the
capability will have the files created.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
timers, init: Limit the number of per cpu calibration bootup messages
posix-cpu-timers: optimize and document timer_create callback
clockevents: Add missing include to pacify sparse
x86: vmiclock: Fix printk format
x86: Fix printk format due to variable type change
sparc: fix printk for change of variable type
clocksource/events: Fix fallout of generic code changes
nohz: Allow 32-bit machines to sleep for more than 2.15 seconds
nohz: Track last do_timer() cpu
nohz: Prevent clocksource wrapping during idle
nohz: Type cast printk argument
mips: Use generic mult/shift factor calculation for clocks
clocksource: Provide a generic mult/shift factor calculation
clockevents: Use u32 for mult and shift factors
nohz: Introduce arch_needs_cpu
nohz: Reuse ktime in sub-functions of tick_check_idle.
time: Remove xtime_cache
time: Implement logarithmic time accumulation
* git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6: (43 commits)
security/tomoyo: Remove now unnecessary handling of security_sysctl.
security/tomoyo: Add a special case to handle accesses through the internal proc mount.
sysctl: Drop & in front of every proc_handler.
sysctl: Remove CTL_NONE and CTL_UNNUMBERED
sysctl: kill dead ctl_handler definitions.
sysctl: Remove the last of the generic binary sysctl support
sysctl net: Remove unused binary sysctl code
sysctl security/tomoyo: Don't look at ctl_name
sysctl arm: Remove binary sysctl support
sysctl x86: Remove dead binary sysctl support
sysctl sh: Remove dead binary sysctl support
sysctl powerpc: Remove dead binary sysctl support
sysctl ia64: Remove dead binary sysctl support
sysctl s390: Remove dead sysctl binary support
sysctl frv: Remove dead binary sysctl support
sysctl mips/lasat: Remove dead binary sysctl support
sysctl drivers: Remove dead binary sysctl support
sysctl crypto: Remove dead binary sysctl support
sysctl security/keys: Remove dead binary sysctl support
sysctl kernel: Remove binary sysctl logic
...
We can kill unused swiotlb variable.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Since they are static inline functions.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The code under "if (is_global_init())" is bogus, and is_global_init()
itself is not right in mt case.
Contrary to what the comment says, nowadays force_sig_info() does kill
init even if the handler is SIG_DFL. Note that force_sig_info() clears
SIGNAL_UNKILLABLE exactly for this case.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The typename member of struct irq_chip was kept for migration purposes
and is obsolete since more than 2 years. Fix up the leftovers.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linuxppc-dev@ozlabs.org
Acked-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The ioctl is only used for powermac systems and reads a partition
number from an array which is initialized at boot time way before the
nvram code is initialized. So it's safe to switch to unlocked_ioctl.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linuxppc-dev@ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Merge common code between PowerPC and MicroBlaze
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Reviewed-by: Wolfram Sang <w.sang@pengutronix.de>
Tested-by: Michal Simek <monstr@monstr.eu>
Merge common code between PowerPC and MicroBlaze
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Reviewed-by: Wolfram Sang <w.sang@pengutronix.de>
Tested-by: Michal Simek <monstr@monstr.eu>
Merge common code between PowerPC and MicroBlaze
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Reviewed-by: Wolfram Sang <w.sang@pengutronix.de>
Tested-by: Michal Simek <monstr@monstr.eu>
Merge common code between PowerPC and Microblaze
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Reviewed-by: Wolfram Sang <w.sang@pengutronix.de>
Tested-by: Michal Simek <monstr@monstr.eu>
Merge common code between PowerPC and Microblaze
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Reviewed-by: Wolfram Sang <w.sang@pengutronix.de>
Tested-by: Michal Simek <monstr@monstr.eu>
Merge common code between PowerPC and MicroBlaze
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Reviewed-by: Wolfram Sang <w.sang@pengutronix.de>
Tested-by: Michal Simek <monstr@monstr.eu>
Merge common code between PowerPC and Microblaze
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Reviewed-by: Wolfram Sang <w.sang@pengutronix.de>
Tested-by: Michal Simek <monstr@monstr.eu>
Merge common code between Microblaze and PowerPC.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Reviewed-by: Wolfram Sang <w.sang@pengutronix.de>
Tested-by: Michal Simek <monstr@monstr.eu>
Re-write the code so its more standalone and fixed some issues:
* Bump'd # of CAM entries to 64 to support e500mc
* Make the code handle MAS7 properly
* Use pr_cont instead of creating a string as we go
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
For consistency drop & in front of every proc_handler. Explicity
taking the address is unnecessary and it prevents optimizations
like stubbing the proc_handlers to NULL.
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Since commit 0a544198 "timekeeping: Move NTP adjusted clock multiplier
to struct timekeeper" the clock multiplier of vsyscall is updated with
the unmodified clock multiplier of the clock source and not with the
NTP adjusted multiplier of the timekeeper.
This causes user space observerable time warps:
new CLOCK-warp maximum: 120 nsecs, 00000025c337c537 -> 00000025c337c4bf
Add a new argument "mult" to update_vsyscall() and hand in the
timekeeping internal NTP adjusted multiplier.
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Cc: "Zhang Yanmin" <yanmin_zhang@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Tony Luck <tony.luck@intel.com>
LKML-Reference: <1258436990.17765.83.camel@minggr.sh.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Resolve the conflict between v2.6.32-rc7 where dn_def_dev_handler
gets a small bug fix and the sysctl tree where I am removing all
sysctl strategy routines.
powerpc grew a new warning due to the type change of clockevent->mult.
The architectures which use parts of the generic time keeping
infrastructure tripped over my wrong assumption that
clocksource_register is only used when GENERIC_TIME=y.
I should have looked and also I should have known better. These
renitent Gaul villages are racking my nerves. Some serious deprecating
is due.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Now that sys_sysctl is a generic wrapper around /proc/sys .ctl_name
and .strategy members of sysctl tables are dead code. Remove them.
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
This enables us to avoid printing swiotlb memory info when we
initialize swiotlb. After swiotlb initialization, we could find
that we don't need swiotlb.
This patch removes the code to print swiotlb memory info in
swiotlb_init() and exports the function to do that.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
Cc: tony.luck@intel.com
Cc: benh@kernel.crashing.org
LKML-Reference: <1257849980-22640-9-git-send-email-fujita.tomonori@lab.ntt.co.jp>
[ -v2: merge up conflict ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
something-bility is spelled as something-blity
so a grep for 'blit' would find these lines
this is so trivial that I didn't split it by subsystem / copy
additional maintainers - all changes are to comments
The only purpose is to get fewer false positives when grepping
around the kernel sources.
Signed-off-by: Dirk Hohndel <hohndel@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
This uses compat_alloc_userspace to remove the various
hacks to allow do_sysctl to write to throuh oldlenp.
The rest of our mature compat syscall helper facitilies
are used as well to ensure we have a nice clean maintainable
compat syscall that can be used on all architectures.
The motiviation for a generic compat sysctl (besides the
obvious hack removal) is to reduce the number of compat
sysctl defintions out there so I can refactor the
binary sysctl implementation.
ppc already used the name compat_sys_sysctl so I remove the
ppcs version here.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Doing so causes xtime to be negative which crashes the timekeeping
code in funny ways when doing suspend/resume
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
In order to access fields in the PACA from assembly code, we need
to generate offsets using asm-offsets.c.
So let's add the new PACA related bits, we just introduced!
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We want to be able to build KVM as a module. To enable us doing so, we
need some more exports from core Linux parts.
This patch exports all functions and variables that are required for KVM.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We need to access some VCPU fields from assembly code. In order to get
the proper offsets, we have to define them in asm-offsets.c.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We need to run some KVM trampoline code in real mode. Unfortunately, real mode
only covers 8MB on Cell so we need to squeeze ourselves as low as possible.
Also, we need to trap interrupts to get us back from guest state to host state
without telling Linux about it.
This patch adds interrupt traps and includes the KVM code that requires real
mode in the real mode parts of Linux.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This trivial patch changes memcpy_(to|from)io as to transfer as many
32-bit words as possible in 32-bit accesses (in the current solution,
the last 32-bit word was transferred as 4 byte accesses).
Signed-off-by: Albrecht Dreß <albrecht.dress@arcor.de>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Defining CONFIG_SPARSE_IRQ enables generic code that gets rid of the
static irq_desc array, and replaces it with an array of pointers to
irq_descs.
It also allows node local allocation of irq_descs, however we
currently don't have the information available to do that, so we just
allocate them on all on node 0.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Move the default case out of the if, ie. when we're just displaying
an irq. And consolidate all the odd cases at the top, ie. printing
the header and footer.
And in the process cope with sparse irq_descs.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Mark all functions which are only called from nvram_init() __init.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linuxppc-dev@ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
nvram_clear_error_log() calls ppc_md.nvram_write() even when
nvram_error_log_index is -1 (invalid). The nvram_write() function does
not check for a negative offset.
Check nvram_error_log_index as the other nvram log functions do.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linuxppc-dev@ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
nvram_find_partition() has no user. The call site was removed in the
arch/powerpc move, but the function stayed. Remove it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linuxppc-dev@ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently each available hugepage size uses a slightly different
pagetable layout: that is, the bottem level table of pointers to
hugepages is a different size, and may branch off from the normal page
tables at a different level. Every hugepage aware path that needs to
walk the pagetables must therefore look up the hugepage size from the
slice info first, and work out the correct way to walk the pagetables
accordingly. Future hardware is likely to add more possible hugepage
sizes, more layout options and more mess.
This patch, therefore reworks the handling of hugepage pagetables to
reduce this complexity. In the new scheme, instead of having to
consult the slice mask, pagetable walking code can check a flag in the
PGD/PUD/PMD entries to see where to branch off to hugepage pagetables,
and the entry also contains the information (eseentially hugepage
shift) necessary to then interpret that table without recourse to the
slice mask. This scheme can be extended neatly to handle multiple
levels of self-describing "special" hugepage pagetables, although for
now we assume only one level exists.
This approach means that only the pagetable allocation path needs to
know how the pagetables should be set out. All other (hugepage)
pagetable walking paths can just interpret the structure as they go.
There already was a flag bit in PGD/PUD/PMD entries for hugepage
directory pointers, but it was only used for debug. We alter that
flag bit to instead be a 0 in the MSB to indicate a hugepage pagetable
pointer (normally it would be 1 since the pointer lies in the linear
mapping). This means that asm pagetable walking can test for (and
punt on) hugepage pointers with the same test that checks for
unpopulated page directory entries (beq becomes bge), since hugepage
pointers will always be positive, and normal pointers always negative.
While we're at it, we get rid of the confusing (and grep defeating)
#defining of hugepte_shift to be the same thing as mmu_huge_psizes.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
get_irq_desc() is a powerpc-specific version of irq_to_desc(). That
is reason enough to remove it, but it also doesn't know about sparse
irq_desc support which irq_to_desc() does (when we enable it).
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The CHRP code has some fishy timer based code to scan the RTAS event
log, which uses a 1KB stack buffer and doesn't even use the results.
The pSeries code as a nicer daemon that allows userspace to read the
event log and basically uses the same RTAS interface
This patch moves rtasd.c out of platform/pseries and makes it usable
by CHRP, after removing the old crufty event log mechanism in there.
The nvram logging part of the daemon is still only available on 64-bit
since the underlying nvram management routines aren't currently shared.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Some of the stuff in /proc/ppc64 such as the RTAS bits are actually
useful to some 32-bit platforms. Rename the file, and create a
symlink on 64-bit for backward compatibility
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch updates percpu related symbols in powerpc such that percpu
symbols are unique and don't clash with local symbols. This serves
two purposes of decreasing the possibility of global percpu symbol
collision and allowing dropping per_cpu__ prefix from percpu symbols.
* arch/powerpc/kernel/perf_callchain.c: s/callchain/cpu_perf_callchain/
* arch/powerpc/kernel/setup-common.c: s/pvr/cpu_pvr/
* arch/powerpc/platforms/pseries/dtl.c: s/dtl/cpu_dtl/
* arch/powerpc/platforms/cell/interrupt.c: s/iic/cpu_iic/
Partly based on Rusty Russell's "alloc_percpu: rename percpu vars
which cause name clashes" patch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@ozlabs.org
Add some dummy symbols for the branches at 0xf00, 0xf20 and 0xf40,
otherwise hits end up in trap_0e which is confusing to the user.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
If CONFIG_PPC_ISERIES isn't defined we end up with
iseries_check_pending_irqs and do_work at the same address.
perf ends up picking iseries_check_pending_irqs which creates
confusing backtraces. Hide it.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Kernel modules should be able to place their debug output inside our
powerpc debugfs directory.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We can monitor the effectiveness of our power management of both the
kernel and hypervisor by probing the timer interrupt. For example, on
this box we see 10.37s timer interrupts on an idle core:
<idle>-0 [010] 3900.671297: timer_interrupt_entry: pt_regs=c0000000ce1e7b10
<idle>-0 [010] 3900.671302: timer_interrupt_exit: pt_regs=c0000000ce1e7b10
<idle>-0 [010] 3911.042963: timer_interrupt_entry: pt_regs=c0000000ce1e7b10
<idle>-0 [010] 3911.042968: timer_interrupt_exit: pt_regs=c0000000ce1e7b10
<idle>-0 [010] 3921.414630: timer_interrupt_entry: pt_regs=c0000000ce1e7b10
<idle>-0 [010] 3921.414635: timer_interrupt_exit: pt_regs=c0000000ce1e7b10
Since we have a 207MHz decrementer it will go negative and fire every 10.37s
even if Linux is completely idle.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds powerpc-specific tracepoints for interrupt entry and exit.
While we already have generic irq_handler_entry and irq_handler_exit
tracepoints there are cases on our virtualised powerpc machines where an
interrupt is presented to the OS, but subsequently handled by the hypervisor.
This means no OS interrupt handler is invoked.
Here is an example on a POWER6 machine with the patch below applied:
<idle>-0 [006] 3243.949840744: irq_entry: pt_regs=c0000000ce31fb10
<idle>-0 [006] 3243.949850520: irq_exit: pt_regs=c0000000ce31fb10
<idle>-0 [007] 3243.950218208: irq_entry: pt_regs=c0000000ce323b10
<idle>-0 [007] 3243.950224080: irq_exit: pt_regs=c0000000ce323b10
<idle>-0 [000] 3244.021879320: irq_entry: pt_regs=c000000000a63aa0
<idle>-0 [000] 3244.021883616: irq_handler_entry: irq=87 handler=eth0
<idle>-0 [000] 3244.021887328: irq_handler_exit: irq=87 return=handled
<idle>-0 [000] 3244.021897408: irq_exit: pt_regs=c000000000a63aa0
Here we see two phantom interrupts (no handler was invoked), followed
by a real interrupt for eth0. Without the tracepoints in this patch we
would have missed the phantom interrupts.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
perf_event wants a separate event for alignment and emulation faults,
so create another emulation event. This will make it easy to hook in
perf_event at one spot.
We pass in regs which will be required for these events.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
In continuous sampling mode we want the SDAR to update. While we can
select between dcache misses and ERAT (L1-TLB) misses, a decent default
is to enable both.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When we take an exception and the SDAR isn't synchronised we currently
log 0 as the address. Unfortunately this is a pretty common value, so
use ~0UL instead.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Based on an original patch by Valentine Barshak <vbarshak@ru.mvista.com>
Use preempt_schedule_irq to prevent infinite irq-entry and
eventual stack overflow problems with fast-paced IRQ sources.
This kind of problems has been observed on the PASemi Electra IDE
controller. We have to make sure we are soft-disabled before calling
preempt_schedule_irq and hard disable interrupts after that
to avoid unrecoverable exceptions.
This patch also moves the "clrrdi r9,r1,THREAD_SHIFT" out of
the #ifdef CONFIG_PPC_BOOK3E scope, since r9 is clobbered
and has to be restored in both cases.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Fix the following 3 issues:
arch/powerpc/kernel/process.c: In function 'arch_randomize_brk':
arch/powerpc/kernel/process.c:1183: error: 'mmu_highuser_ssize' undeclared (first use in this function)
arch/powerpc/kernel/process.c:1183: error: (Each undeclared identifier is reported only once
arch/powerpc/kernel/process.c:1183: error: for each function it appears in.)
arch/powerpc/kernel/process.c:1183: error: 'MMU_SEGSIZE_1T' undeclared (first use in this function)
In file included from arch/powerpc/kernel/setup_64.c:60:
arch/powerpc/include/asm/mmu-hash64.h:132: error: redefinition of 'struct mmu_psize_def'
arch/powerpc/include/asm/mmu-hash64.h:159: error: expected identifier or '(' before numeric constant
arch/powerpc/include/asm/mmu-hash64.h:396: error: conflicting types for 'mm_context_t'
arch/powerpc/include/asm/mmu-book3e.h:184: error: previous declaration of 'mm_context_t' was here
cc1: warnings being treated as errors
arch/powerpc/kernel/pci_64.c: In function 'pcibios_unmap_io_space':
arch/powerpc/kernel/pci_64.c💯 error: unused variable 'res'
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The ABI specifies a 64K alignment, we need to map the vDSO accordingly
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Due to missing segment assignments the .text section was put in the NOTES
segment (and marked as NOTE section), and the .got was put in the DYNAMIC
segment.
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The architecture defines that if MSR PR is set we are in problem state
irrespective of the HV bit. This fixes perf events to reflect this.
Also, on bare metal systems, samples taken in Linux will now be reported
as kernel rather than hypervisor.
Signed-off-by: Michael Neuling <mikey@neuling.org>
CC: paulus@samba.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Merge common code between Microblaze and PowerPC, and make it available
to Sparc
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Acked-by: Michal Simek <monstr@monstr.eu>
Acked-by: Stephen Neuendorffer <stephen.neuendorffer@xilinx.com>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
making a powerpc target with PCI support, shows the
following warning:
MODPOST vmlinux.o
WARNING: vmlinux.o(.text+0x10430): Section mismatch in reference from the
function pcibios_allocate_bus_resources() to the function .init.text:reparent_resources()
The function pcibios_allocate_bus_resources() references
the function __init reparent_resources().
This is often because pcibios_allocate_bus_resources lacks a __init
annotation or the annotation of reparent_resources is wrong.
This patch fix this warning by removing the __init
annotation before reparent_resources.
Signed-off-by: Heiko Schocher <hs@denx.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Here's a patch that adds the ppc750 CL cpu as supported by oprofile.
Signed-off-by: Dragos Tatulea <dtatulea@ixiacom.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We need to align before the output section. Having the align inside
the output section causes the linker to put some filler in there,
which makes it a non-empty section, but this section isn't assigned to
a segment so you get a warning from the linker.
Signed-off-by: Sean MacLennan <smaclennan@pikatech.com>
Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
'acc' isn't used anywhere and thus triggers gcc warning, which causes
build error with CONFIG_PPC_DISABLE_WERROR=n (default):
cc1: warnings being treated as errors
arch/powerpc/kernel/kgdb.c: In function 'gdb_regs_to_pt_regs':
arch/powerpc/kernel/kgdb.c:289: warning: unused variable 'acc'
make[1]: *** [arch/powerpc/kernel/kgdb.o] Error 1
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The mod_return_to_handler needs to switch to the kernel TOC before
jumping to a the kernel code. It currently does this by looking
at the kernel function data and retrieves the TOC that way.
Not only is this inefficient, it also breaks with a relocatable kernel.
The PACA contains the kernel TOC and we can easily retrieve it that
way.
Reported-by: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
When the function graph tracer is enabled, it replaces the return address
with a hook back to the tracer. This makes back traces see the hook instead
of the actual return address.
The current code also shows the real address by checking if the return
address jumps to the return_to_handler. If it is, is also prints out
the saved real return address.
On powerpc64, some modules may return to mod_return_to_handler, which
is not checked. This patch will also show the real address if a return
is to mod_return_to_handler as well.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@ozlabs.org
Cc: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On machines without the ibm,client-architecture-support call we were missing a
newline. We may as well print the full name in all its glory too - its
ibm,client-architecture-support, not ibm,client-architecture as I mistakenly
wrote (a name only an IBM architect could love).
For my penance I will write out ibm,client-architecture-support 100 times.
Before:
Calling ibm,client-architecture...command line: root=/dev/sda6 console=hvc0 quiet
After:
Calling ibm,client-architecture-support... not implemented
command line: root=/dev/sda6 console=hvc0
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
perf_counter uses arch_vma_name() to detect a vdso region which in turn uses
current->mm->context.vdso_base. We need to initialise this before doing
the mmap or else we fail to detect the vdso.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
If we are using 1TB segments and we are allowed to randomise the heap, we can
put it above 1TB so it is backed by a 1TB segment. Otherwise the heap will be
in the bottom 1TB which always uses 256MB segments and this may result in a
performance penalty.
This functionality is disabled when heap randomisation is turned off:
echo 1 > /proc/sys/kernel/randomize_va_space
which may be useful when trying to allocate the maximum amount of 16M or 16G
pages.
On a microbenchmark that repeatedly touches 32GB of memory with a stride of
256MB + 4kB (designed to stress 256MB segments while still mapping nicely into
the L1 cache), we see the improvement:
Force malloc to use heap all the time:
# export MALLOC_MMAP_MAX_=0 MALLOC_TRIM_THRESHOLD_=-1
Disable heap randomization:
# echo 1 > /proc/sys/kernel/randomize_va_space
# time ./test
12.51s
Enable heap randomization:
# echo 2 > /proc/sys/kernel/randomize_va_space
# time ./test
1.70s
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Sometimes this is used to hold a simple offset, and sometimes
it is used to hold a pointer. This patch changes it to a union containing
void * and dma_addr_t. get/set accessors are also provided, because it was
getting a bit ugly to get to the actual data.
Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The former is no longer really accurate with the swiotlb case now
a possibility. I also move it into dma-mapping.h - it no longer
needs to be in dma.c, and there are about to be some more accessors
that should all end up in the same place. A comment is added to
indicate that this function is not used in configs where there is no
simple dma offset, such as the iommu case.
Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (39 commits)
cpumask: Move deprecated functions to end of header.
cpumask: remove unused deprecated functions, avoid accusations of insanity
cpumask: use new-style cpumask ops in mm/quicklist.
cpumask: use mm_cpumask() wrapper: x86
cpumask: use mm_cpumask() wrapper: um
cpumask: use mm_cpumask() wrapper: mips
cpumask: use mm_cpumask() wrapper: mn10300
cpumask: use mm_cpumask() wrapper: m32r
cpumask: use mm_cpumask() wrapper: arm
cpumask: Use accessors for cpu_*_mask: um
cpumask: Use accessors for cpu_*_mask: powerpc
cpumask: Use accessors for cpu_*_mask: mips
cpumask: Use accessors for cpu_*_mask: m32r
cpumask: remove arch_send_call_function_ipi
cpumask: arch_send_call_function_ipi_mask: s390
cpumask: arch_send_call_function_ipi_mask: powerpc
cpumask: arch_send_call_function_ipi_mask: mips
cpumask: arch_send_call_function_ipi_mask: m32r
cpumask: arch_send_call_function_ipi_mask: alpha
cpumask: remove obsolete topology_core_siblings and topology_thread_siblings: ia64
...
* remove asm/atomic.h inclusion from linux/utsname.h --
not needed after kref conversion
* remove linux/utsname.h inclusion from files which do not need it
NOTE: it looks like fs/binfmt_elf.c do not need utsname.h, however
due to some personality stuff it _is_ needed -- cowardly leave ELF-related
headers and files alone.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use the accessors rather than frobbing bits directly (the new versions
are const).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
We're weaning the core code off handing cpumask's around on-stack.
This introduces arch_send_call_function_ipi_mask(), and by defining
it, the old arch_send_call_function_ipi is defined by the core code.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next: (30 commits)
Use macros for .data.page_aligned section.
Use macros for .bss.page_aligned section.
Use new __init_task_data macro in arch init_task.c files.
kbuild: Don't define ALIGN and ENTRY when preprocessing linker scripts.
arm, cris, mips, sparc, powerpc, um, xtensa: fix build with bash 4.0
kbuild: add static to prototypes
kbuild: fail build if recordmcount.pl fails
kbuild: set -fconserve-stack option for gcc 4.5
kbuild: echo the record_mcount command
gconfig: disable "typeahead find" search in treeviews
kbuild: fix cc1 options check to ensure we do not use -fPIC when compiling
checkincludes.pl: add option to remove duplicates in place
markup_oops: use modinfo to avoid confusion with underscored module names
checkincludes.pl: provide usage helper
checkincludes.pl: close file as soon as we're done with it
ctags: usability fix
kernel hacking: move STRIP_ASM_SYMS from General
gitignore usr/initramfs_data.cpio.bz2 and usr/initramfs_data.cpio.lzma
kbuild: Check if linker supports the -X option
kbuild: introduce ld-option
...
Fix trivial conflict in scripts/basic/fixdep.c
Make all seq_operations structs const, to help mitigate against
revectoring user-triggerable function pointers.
This is derived from the grsecurity patch, although generated from scratch
because it's simpler than extracting the changes from there.
Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf_event, powerpc: Fix compilation after big perf_counter rename
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
trivial: fix typo in aic7xxx comment
trivial: fix comment typo in drivers/ata/pata_hpt37x.c
trivial: typo in kernel-parameters.txt
trivial: fix typo in tracing documentation
trivial: add __init/__exit macros in drivers/gpio/bt8xxgpio.c
trivial: add __init macro/ fix of __exit macro location in ipmi_poweroff.c
trivial: remove unnecessary semicolons
trivial: Fix duplicated word "options" in comment
trivial: kbuild: remove extraneous blank line after declaration of usage()
trivial: improve help text for mm debug config options
trivial: doc: hpfall: accept disk device to unload as argument
trivial: doc: hpfall: reduce risk that hpfall can do harm
trivial: SubmittingPatches: Fix reference to renumbered step
trivial: fix typos "man[ae]g?ment" -> "management"
trivial: media/video/cx88: add __init/__exit macros to cx88 drivers
trivial: fix typo in CONFIG_DEBUG_FS in gcov doc
trivial: fix missing printk space in amd_k7_smp_check
trivial: fix typo s/ketymap/keymap/ in comment
trivial: fix typo "to to" in multiple files
trivial: fix typos in comments s/DGBU/DBGU/
...
This fixes two places in the powerpc perf_event (perf_counter) code
where 'list_entry' needs to be changed to 'group_entry', but were
missed in commit 65abc865 ("perf_counter: Rename list_entry ->
group_entry, counter_list -> group_list").
This also changes 'event' back to 'counter' in a couple of
contexts:
* Field and function names that deal with the limited-function
counters: it's really the hardware counters whose function is
limited, not the events that they count. Hence:
MAX_LIMITED_HWEVENTS -> MAX_LIMITED_HWCOUNTERS
limited_event -> limited_counter
freeze/thaw_limited_events -> freeze/thaw_limited_counters
* The machine-specific PMU description struct (struct power_pmu): this
renames 'n_event' back to 'n_counter' since it really describes how
many hardware counters the machine has. (Renaming this back avoids
a compile error in each of the machine-specific PMU back-ends where
they initialize their power_pmu struct.)
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@ozlabs.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <19128.4280.813369.589704@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- provide compatibility Kconfig entry for existing PERF_COUNTERS .config's
- provide courtesy copy of old perf_counter.h, for user-space projects
- small indentation fixups
- fix up MAINTAINERS
- fix small x86 printout fallout
- fix up small PowerPC comment fallout (use 'counter' as in register)
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Bye-bye Performance Counters, welcome Performance Events!
In the past few months the perfcounters subsystem has grown out its
initial role of counting hardware events, and has become (and is
becoming) a much broader generic event enumeration, reporting, logging,
monitoring, analysis facility.
Naming its core object 'perf_counter' and naming the subsystem
'perfcounters' has become more and more of a misnomer. With pending
code like hw-breakpoints support the 'counter' name is less and
less appropriate.
All in one, we've decided to rename the subsystem to 'performance
events' and to propagate this rename through all fields, variables
and API names. (in an ABI compatible fashion)
The word 'event' is also a bit shorter than 'counter' - which makes
it slightly more convenient to write/handle as well.
Thanks goes to Stephane Eranian who first observed this misnomer and
suggested a rename.
User-space tooling and ABI compatibility is not affected - this patch
should be function-invariant. (Also, defconfigs were not touched to
keep the size down.)
This patch has been generated via the following script:
FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
sed -i \
-e 's/PERF_EVENT_/PERF_RECORD_/g' \
-e 's/PERF_COUNTER/PERF_EVENT/g' \
-e 's/perf_counter/perf_event/g' \
-e 's/nb_counters/nb_events/g' \
-e 's/swcounter/swevent/g' \
-e 's/tpcounter_event/tp_event/g' \
$FILES
for N in $(find . -name perf_counter.[ch]); do
M=$(echo $N | sed 's/perf_counter/perf_event/g')
mv $N $M
done
FILES=$(find . -name perf_event.*)
sed -i \
-e 's/COUNTER_MASK/REG_MASK/g' \
-e 's/COUNTER/EVENT/g' \
-e 's/\<event\>/event_id/g' \
-e 's/counter/event/g' \
-e 's/Counter/Event/g' \
$FILES
... to keep it as correct as possible. This script can also be
used by anyone who has pending perfcounters patches - it converts
a Linux kernel tree over to the new naming. We tried to time this
change to the point in time where the amount of pending patches
is the smallest: the end of the merge window.
Namespace clashes were fixed up in a preparatory patch - and some
stylistic fallout will be fixed up in a subsequent patch.
( NOTE: 'counters' are still the proper terminology when we deal
with hardware registers - and these sed scripts are a bit
over-eager in renaming them. I've undone some of that, but
in case there's something left where 'counter' would be
better than 'event' we can undo that on an individual basis
instead of touching an otherwise nicely automated patch. )
Suggested-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Commit 5622f295 ("x86, perf_counter, bts: Optimize BTS overflow
handling") removed the regs field from struct perf_sample_data and
added a regs parameter to perf_counter_overflow(). This breaks the
build on powerpc (and Sparc) as reported by Sachin Sant:
arch/powerpc/kernel/perf_counter.c: In function 'record_and_restart':
arch/powerpc/kernel/perf_counter.c:1165: error: unknown field 'regs' specified in initializer
This adjusts arch/powerpc/kernel/perf_counter.c to correspond with the
new struct perf_sample_data and perf_counter_overflow().
[ v2: also fix Sparc, Markus Metzger <markus.t.metzger@intel.com> ]
Reported-by: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Markus Metzger <markus.t.metzger@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: benh@kernel.crashing.org
Cc: linuxppc-dev@ozlabs.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <19127.8400.376239.586120@drongo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch changes the remaining direct references to
.data.page_aligned in C and assembly code to use the macros in
include/linux/linkage.h.
Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
ld-option is misnamed as it test options to gcc, not to ld.
Renamed it to reflect this.
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
* 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (34 commits)
time: Prevent 32 bit overflow with set_normalized_timespec()
clocksource: Delay clocksource down rating to late boot
clocksource: clocksource_select must be called with mutex locked
clocksource: Resolve cpu hotplug dead lock with TSC unstable, fix crash
timers: Drop a function prototype
clocksource: Resolve cpu hotplug dead lock with TSC unstable
timer.c: Fix S/390 comments
timekeeping: Fix invalid getboottime() value
timekeeping: Fix up read_persistent_clock() breakage on sh
timekeeping: Increase granularity of read_persistent_clock(), build fix
time: Introduce CLOCK_REALTIME_COARSE
x86: Do not unregister PIT clocksource on PIT oneshot setup/shutdown
clocksource: Avoid clocksource watchdog circular locking dependency
clocksource: Protect the watchdog rating changes with clocksource_mutex
clocksource: Call clocksource_change_rating() outside of watchdog_lock
timekeeping: Introduce read_boot_clock
timekeeping: Increase granularity of read_persistent_clock()
timekeeping: Update clocksource with stop_machine
timekeeping: Add timekeeper read_clock helper functions
timekeeping: Move NTP adjusted clock multiplier to struct timekeeper
...
Fix trivial conflict due to MIPS lemote -> loongson renaming.
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (75 commits)
PCI hotplug: clean up acpi_run_hpp()
PCI hotplug: acpiphp: use generic pci_configure_slot()
PCI hotplug: shpchp: use generic pci_configure_slot()
PCI hotplug: pciehp: use generic pci_configure_slot()
PCI hotplug: add pci_configure_slot()
PCI hotplug: clean up acpi_get_hp_params_from_firmware() interface
PCI hotplug: acpiphp: don't cache hotplug_params in acpiphp_bridge
PCI hotplug: acpiphp: remove superfluous _HPP/_HPX evaluation
PCI: Clear saved_state after the state has been restored
PCI PM: Return error codes from pci_pm_resume()
PCI: use dev_printk in quirk messages
PCI / PCIe portdrv: Fix pcie_portdrv_slot_reset()
PCI Hotplug: convert acpi_pci_detect_ejectable() to take an acpi_handle
PCI Hotplug: acpiphp: find bridges the easy way
PCI: pcie portdrv: remove unused variable
PCI / ACPI PM: Propagate wake-up enable for devices w/o ACPI support
ACPI PM: Replace wakeup.prepared with reference counter
PCI PM: Introduce device flag wakeup_prepared
PCI / ACPI PM: Rework some debug messages
PCI PM: Simplify PCI wake-up code
...
Fixed up conflict in arch/powerpc/kernel/pci_64.c due to OF device tree
scanning having been moved and merged for the 32- and 64-bit cases. The
'needs_freset' initialization added in 6e19314cc ("PCI/powerpc: support
PCIe fundamental reset") is now in arch/powerpc/kernel/pci_of_scan.c.
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (134 commits)
powerpc/nvram: Enable use Generic NVRAM driver for different size chips
powerpc/iseries: Fix oops reading from /proc/iSeries/mf/*/cmdline
powerpc/ps3: Workaround for flash memory I/O error
powerpc/booke: Don't set DABR on 64-bit BookE, use DAC1 instead
powerpc/perf_counters: Reduce stack usage of power_check_constraints
powerpc: Fix bug where perf_counters breaks oprofile
powerpc/85xx: Fix SMP compile error and allow NULL for smp_ops
powerpc/irq: Improve nanodoc
powerpc: Fix some late PowerMac G5 with PCIe ATI graphics
powerpc/fsl-booke: Use HW PTE format if CONFIG_PTE_64BIT
powerpc/book3e: Add missing page sizes
powerpc/pseries: Fix to handle slb resize across migration
powerpc/powermac: Thermal control turns system off too eagerly
powerpc/pci: Merge ppc32 and ppc64 versions of phb_scan()
powerpc/405ex: support cuImage via included dtb
powerpc/405ex: provide necessary fixup function to support cuImage
powerpc/40x: Add support for the ESTeem 195E (PPC405EP) SBC
powerpc/44x: Add Eiger AMCC (AppliedMicro) PPC460SX evaluation board support.
powerpc/44x: Update Arches defconfig
powerpc/44x: Update Arches dts
...
Fix up conflicts in drivers/char/agp/uninorth-agp.c
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 commits)
powerpc64: convert to dynamic percpu allocator
sparc64: use embedding percpu first chunk allocator
percpu: kill lpage first chunk allocator
x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA
percpu: update embedding first chunk allocator to handle sparse units
percpu: use group information to allocate vmap areas sparsely
vmalloc: implement pcpu_get_vm_areas()
vmalloc: separate out insert_vmalloc_vm()
percpu: add chunk->base_addr
percpu: add pcpu_unit_offsets[]
percpu: introduce pcpu_alloc_info and pcpu_group_info
percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward
percpu: add @align to pcpu_fc_alloc_fn_t
percpu: make @dyn_size mandatory for pcpu_setup_first_chunk()
percpu: drop @static_size from first chunk allocators
percpu: generalize first chunk allocator selection
percpu: build first chunk allocators selectively
percpu: rename 4k first chunk allocator to page
percpu: improve boot messages
percpu: fix pcpu_reclaim() locking
...
Fix trivial conflict as by Tejun Heo in kernel/sched.c
* 'perfcounters-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (60 commits)
perf tools: Avoid unnecessary work in directory lookups
perf stat: Clean up statistics calculations a bit more
perf stat: More advanced variance computation
perf stat: Use stddev_mean in stead of stddev
perf stat: Remove the limit on repeat
perf stat: Change noise calculation to use stddev
x86, perf_counter, bts: Do not allow kernel BTS tracing for now
x86, perf_counter, bts: Correct pointer-to-u64 casts
x86, perf_counter, bts: Fail if BTS is not available
perf_counter: Fix output-sharing error path
perf trace: Fix read_string()
perf trace: Print out in nanoseconds
perf tools: Seek to the end of the header area
perf trace: Fix parsing of perf.data
perf trace: Sample timestamps as well
perf_counter: Introduce new (non-)paranoia level to allow raw tracepoint access
perf trace: Sample the CPU too
perf tools: Work around strict aliasing related warnings
perf tools: Clean up warnings list in the Makefile
perf tools: Complete support for dynamic strings
...
* 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (59 commits)
x86/gart: Do not select AGP for GART_IOMMU
x86/amd-iommu: Initialize passthrough mode when requested
x86/amd-iommu: Don't detach device from pt domain on driver unbind
x86/amd-iommu: Make sure a device is assigned in passthrough mode
x86/amd-iommu: Align locking between attach_device and detach_device
x86/amd-iommu: Fix device table write order
x86/amd-iommu: Add passthrough mode initialization functions
x86/amd-iommu: Add core functions for pd allocation/freeing
x86/dma: Mark iommu_pass_through as __read_mostly
x86/amd-iommu: Change iommu_map_page to support multiple page sizes
x86/amd-iommu: Support higher level PTEs in iommu_page_unmap
x86/amd-iommu: Remove old page table handling macros
x86/amd-iommu: Use 2-level page tables for dma_ops domains
x86/amd-iommu: Remove bus_addr check in iommu_map_page
x86/amd-iommu: Remove last usages of IOMMU_PTE_L0_INDEX
x86/amd-iommu: Change alloc_pte to support 64 bit address space
x86/amd-iommu: Introduce increase_address_space function
x86/amd-iommu: Flush domains if address space size was increased
x86/amd-iommu: Introduce set_dte_entry function
x86/amd-iommu: Add a gneric version of amd_iommu_flush_all_devices
...
Remove the reliance on a staticly defined NVRAM size, allowing
platforms to support NVRAMs with sizes differing from the standard.
A fall back value is provided for platforms not supporting this extension.
Signed-off-by: Martyn Welch <martyn.welch@gefanuc.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Ellerman reported stack-frame size warnings being produced
for power_check_constraints(), which uses an 8*8 array of u64 and
two 8*8 arrays of unsigned long, which are currently allocated on the
stack, along with some other smaller variables. These arrays come
to 1.5kB on 64-bit or 1kB on 32-bit, which is a bit too much for the
stack.
This fixes the problem by putting these arrays in the existing
per-cpu cpu_hw_counters struct. This is OK because two of the call
sites have interrupts disabled already; for the third call site we
use get_cpu_var, which disables preemption, so we know we won't
get a context switch while we're in power_check_constraints().
Note that power_check_constraints() can be called during context
switch but is not called from interrupts.
Reported-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: <stable@kernel.org)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently there is a bug where if you use oprofile on a pSeries
machine, then use perf_counters, then use oprofile again, oprofile
will not work correctly; it will lose the PMU configuration the next
time the hypervisor does a partition context switch, and thereafter
won't count anything.
Maynard Johnson identified the sequence causing the problem:
- oprofile setup calls ppc_enable_pmcs(), which calls
pseries_lpar_enable_pmcs, which tells the hypervisor that we want
to use the PMU, and sets the "PMU in use" flag in the lppaca.
This flag tells the hypervisor whether it needs to save and restore
the PMU config.
- The perf_counter code sets and clears the "PMU in use" flag directly
as it context-switches the PMU between tasks, and leaves it clear
when it finishes.
- oprofile setup, called for a new oprofile run, calls ppc_enable_pmcs,
which does nothing because it has already been called. In particular
it doesn't set the "PMU in use" flag.
This fixes the problem by arranging for ppc_enable_pmcs to always set
the "PMU in use" flag. It makes the perf_counter code call
ppc_enable_pmcs also rather than calling the lower-level function
directly, and removes the setting of the "PMU in use" flag from
pseries_lpar_enable_pmcs, since that is now done in its caller.
This also removes the declaration of pasemi_enable_pmcs because it
isn't defined anywhere.
Reported-by: Maynard Johnson <mpjohn@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: <stable@kernel.org)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The following commit introduced a compile error since it removed
the implementation of smp_85xx_basic_setup:
commit 77c0a700c1
Author: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date: Fri Aug 28 14:25:04 2009 +1000
powerpc: Properly start decrementer on BookE secondary CPUs
Make it so that smp_ops probe() and setup_cpu() can be set to NULL.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
By default, the EEH framework on powerpc does what's known as a "hot
reset" during recovery of a PCI Express device. We've found a case
where the device needs a "fundamental reset" to recover properly. The
current PCI error recovery and EEH frameworks do not support this
distinction.
The attached patch makes changes to EEH to utilize the new bit field.
Signed-off-by: Mike Mason <mmlnx@us.ibm.com>
Signed-off-by: Richard Lary <rlary@us.ibm.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
I had the codes for L1 D-cache load accesses and misses swapped
around, and the wrong codes for LL-cache accesses and misses.
This corrects them.
Reported-by: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org>
LKML-Reference: <19103.8514.709300.585484@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Switch to using the Power ISA defined PTE format when we have a 64-bit
PTE. This makes the code handling between fsl-booke and book3e-64
similiar for TLB faults.
Additionally this lets use take advantage of the page size encodings and
full permissions that the HW PTE defines.
Also defined _PMD_PRESENT, _PMD_PRESENT_MASK, and _PMD_BAD since the
32-bit ppc arch code expects them.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The SLB can change sizes across a live migration, which was not
being handled, resulting in possible machine crashes during
migration if migrating to a machine which has a smaller max SLB
size than the source machine. Fix this by first reducing the
SLB size to the minimum possible value, which is 32, prior to
migration. Then during the device tree update which occurs after
migration, we make the call to ensure the SLB gets updated. Also
add the slb_size to the lparcfg output so that the migration
tools can check to make sure the kernel has this capability
before allowing migration in scenarios where the SLB size will change.
BenH: Fixed #include <asm/mmu-hash64.h> -> <asm/mmu.h> to avoid
breaking ppc32 build
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The two versions are doing almost exactly the same thing. No need to
maintain them as separate files. This patch also has the side effect
of making the PCI device tree scanning code available to 32 bit powerpc
machines, but no board ports actually make use of this feature at this
point.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This moves the code to start the decrementer on 40x and BookE into
a separate function which is now called from time_init() and
secondary_time_init(), before the respective clock sources are
registered. We also remove the 85xx specific code for doing it
from the platform code.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Some of the PCI features we have in ppc32 we will need on ppc64
platforms in the future. These include support for:
* ppc_md.pci_exclude_device
* indirect config cycles
* early config cycles
We also simplified the logic in fake_pci_bus() to assume it will always
get a valid pci_controller. Since all current callers seem to pass it
one.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The PCI device tree scanning code in pci_64.c is some useful functionality.
It allows PCI devices to be described in the device tree instead of being
probed for, which in turn allows pci devices to use all of the device tree
facilities to describe complex PCI bus architectures like GPIO and IRQ
routing (perhaps not a common situation for desktop or server systems,
but useful for embedded systems with on-board PCI devices).
This patch moves the device tree scanning into pci-common.c so it is
available for 32-bit powerpc machines too.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
PPC_OF is always selected for arch/powerpc. This patch removes the stale
#defines
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We now search through TLBnCFG looking for the first array that has IPROT
support (we assume that there is only one). If that TLB has hardware
entry select (HES) support we use the existing code and with the proper
TLB select (the HES code still needs to clean up bolted entries from
firmware). The non-HES code is pretty similiar to the 32-bit FSL Book-E
code but does make some new assumtions (like that we have tlbilx) and
simplifies things down a bit.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Not all 64-bit Book-3E parts will have fixed IVORs so add a function that
cpusetup code can call to setup the base IVORs (0..15) to match the fixed
offsets. We need to 'or' part of interrupt_base_book3e into the IVORs
since on parts that have them the IVPR doesn't extend as far down.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Match what we do on 32-bit Book-E processors and enable the decrementer
in generic_calibrate_decr. We need to make sure we disable the
decrementer early in boot since we currently use lazy (soft) interrupt
on 64-bit Book-E and possible get a decrementer exception before we
are ready for it.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Move the default cpu entry table for CONFIG_PPC_BOOK3E_64 to the
very end since we will probably want to support both 32-bit and
64-bit kernels for some processors that are higher up in the list.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This converts uses dma_map_ops struct (in include/linux/dma-mapping.h)
instead of POWERPC homegrown dma_mapping_ops.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Now swiotlb_pci_dma_ops is identical to swiotlb_dma_ops; we can use
swiotlb_dma_ops with any devices. This removes swiotlb_pci_dma_ops.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch adds max_direct_dma_addr to struct dev_archdata to remove
addr_needs_map in struct dma_mapping_ops. It also converts
dma_capable() to use max_direct_dma_addr.
max_direct_dma_addr is initialized in pci_dma_dev_setup_swiotlb(),
called via ppc_md.pci_dma_dev_setup hook.
For further information:
http://marc.info/?t=124719060200001&r=1&w=2
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Time time taken for a single cpu online operation on a pseries machine
is as follows:
Dedicated LPAR (POWER6): ~220ms.
Shared LPAR (POWER5) : ~240ms.
Of this time, approximately 200ms is taken up by __cpu_up(). This is because
we poll every 200ms to check if the new cpu has notified it's presence
through the cpu_callin_map. We repeat this operation until the new cpu sets
the value in cpu_callin_map or 5 seconds elapse, whichever comes earlier.
However, using completion_structs instead of polling loops,
the time taken by the new processor to indicate it's presence has
found to be less than 1ms on pseries. This method however may not
work on all powerpc platforms due to the time-base synchronization code.
Keeping this in mind, we could reduce msleep polling interval from
200ms to 1ms while retaining the 5 second timeout.
With this, the time taken for a cpu online operation changes as follows:
Dedicated LPAR (POWER6): 20-25ms.
Shared LPAR (POWER5) : 60-80ms.
In both these cases, it was found that the code polls through the loop
only once indicating that 1ms is a reasonable value, atleast on pseries.
The code needs testing on other powerpc platforms.
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Joel Schopp <jschopp@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The ptrace POKETEXT interface allows a process to modify the text pages of
a child process being ptraced, usually to insert breakpoints via trap
instructions. The kernel eventually calls copy_to_user_page, which in turn
calls __flush_icache_range to invalidate the icache lines for the child
process.
However, this function does not work on 44x due to the icache being virtually
indexed. This was noticed by a breakpoint being triggered after it had been
cleared by ltrace on a 440EPx board. The convenient solution is to do a
flash invalidate of the icache in the __flush_icache_range function.
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This is an attempt at cleaning up a bit the way we handle execute
permission on powerpc. _PAGE_HWEXEC is gone, _PAGE_EXEC is now only
defined by CPUs that can do something with it, and the myriad of
#ifdef's in the I$/D$ coherency code is reduced to 2 cases that
hopefully should cover everything.
The logic on BookE is a little bit different than what it was though
not by much. Since now, _PAGE_EXEC will be set by the generic code
for executable pages, we need to filter out if they are unclean and
recover it. However, I don't expect the code to be more bloated than
it already was in that area due to that change.
I could boast that this brings proper enforcing of per-page execute
permissions to all BookE and 40x but in fact, we've had that now for
some time as a side effect of my previous rework in that area (and
I didn't even know it :-) We would only enable execute permission if
the page was cache clean and we would only cache clean it if we took
and exec fault. Since we now enforce that the later only work if
VM_EXEC is part of the VMA flags, we de-fact already enforce per-page
execute permissions... Unless I missed something
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Fix the following build problem on powerpc:
arch/powerpc/kernel/time.c: In function 'read_persistent_clock':
arch/powerpc/kernel/time.c:788: error: 'return' with a value, in function returning void
arch/powerpc/kernel/time.c:791: error: 'return' with a value, in function returning void
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: dwalker@fifo99.com
Cc: johnstul@us.ibm.com
LKML-Reference: <20090822222313.74b9619c@skybase>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently _edata does not include several data sections, this causes
the kernel's report of memory usage at boot to not match reality, and
also prevents kmemleak from working - because it scan between _sdata
and _edata for pointers to allocated memory.
This mirrors a similar change made recently to the x86 linker script.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Make it possible to enable GCOV code coverage measurement on powerpc.
Lightly tested on 64-bit, seems to work as expected.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The kernel.h macro DIV_ROUND_CLOSEST performs the computation (x + d/2)/d
but is perhaps more readable.
The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@haskernel@
@@
#include <linux/kernel.h>
@depends on haskernel@
expression x,__divisor;
@@
- (((x) + ((__divisor) / 2)) / (__divisor))
+ DIV_ROUND_CLOSEST(x,__divisor)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Evaluate mem kernel parameter for early memory allocations. If mem is set
no allocation in the region above the given boundary is allowed. The current
code doesn't take care about this and allocate memory above the given mem
boundary.
Signed-off-by: Benjamin Krill <ben@codiert.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This contains all the bits that didn't fit in previous patches :-) This
includes the actual exception handlers assembly, the changes to the
kernel entry, other misc bits and wiring it all up in Kconfig.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds the TLB miss handler assembly, the low level TLB flush routines
along with the necessary hook for dealing with our virtual page tables
or indirect TLB entries that need to be flushes when PTE pages are freed.
There is currently no support for hugetlbfs
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds various fields in the PACA that are for use specifically
by Book3E processors, such as exception save areas, current pgd
pointer, special exceptions kernel stacks etc...
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds the PTE and pgtable format definitions, along with changes
to the kernel memory map and other definitions related to implementing
support for 64-bit Book3E. This also shields some asm-offset bits that
are currently only relevant on 32-bit
We also move the definition of the "linux" page size constants to
the common mmu.h file and add a few sizes that are relevant to
embedded processors.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Those definitions are currently declared extern in the .c file where
they are used, move them to a header file instead.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently, a single ifdef covers SLB related bits and more generic ppc64
related bits, split this in two separate ifdef's since 64-bit BookE will
need one but not the other.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Our 64-bit hash context handling has no init function, but 64-bit Book3E
will use the common mmu_context_nohash.c code which does, so define an
empty inline mmu_context_init() for 64-bit server and call it from
our 64-bit setup_arch()
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
enter_prom() used to save and restore registers such as CTR, XER etc..
which are volatile, or SRR0,1... which we don't care about. This
removes a bunch of useless code and while at it turns an mtmsrd into
an MTMSRD macro which will be useful to Book3E.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The truncate syscall has a signed long parameter, so when using a 32-
bit userspace with a 64-bit kernel the argument is zero-extended
instead of sign-extended. Adding the compat_sys_truncate function
fixes the issue.
This was noticed during an LSB truncate test failure. The test was
checking for the correct error number set when truncate is called with
a length of -1. The test can be found at:
http://bzr.linuxfoundation.org/lsb/devel/runtime-test?cmd=inventory;rev=stewb%40linux-foundation.org-20090626205411-sfb23cc0tjj7jzgm;path=modules/vsx-pcts/tset/POSIX.os/files/truncate/
BenH: Added compat_sys_ftruncate() as well, same issue.
Signed-off-by: Chase Douglas <cndougla@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The STAB code used on Power3 and RS/64 uses a second scratch SPRG to
save a GPR in order to decide whether to go to do_stab_bolted_* or
to handle a normal data access exception.
This prevents our scheme of freeing SPRG3 which is user visible for
user uses since we cannot use SPRG0 which, on RS/64, seems to be
read-only for supervisor mode (like POWER4).
This reworks the STAB exception entry to use the PACA as temporary
storage instead.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The kernel uses SPRG registers for various purposes, typically in
low level assembly code as scratch registers or to hold per-cpu
global infos such as the PACA or the current thread_info pointer.
We want to be able to easily shuffle the usage of those registers
as some implementations have specific constraints realted to some
of them, for example, some have userspace readable aliases, etc..
and the current choice isn't always the best.
This patch should not change any code generation, and replaces the
usage of SPRN_SPRGn everywhere in the kernel with a named replacement
and adds documentation next to the definition of the names as to
what those are used for on each processor family.
The only parts that still use the original numbers are bits of KVM
or suspend/resume code that just blindly needs to save/restore all
the SPRGs.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The file include/asm/exception.h contains definitions
that are specific to exception handling on 64-bit server
type processors.
This renames the file to exception-64s.h to reflect that
fact and avoid confusion.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On 64bit applications the VDSO is the only thing in segment 0. Since the VDSO
is position independent we can remove the hint and let get_unmapped_area pick
an area. This will mean the vdso will be near other mmaps and will share
an SLB entry:
10000000-10001000 r-xp 00000000 08:06 5778459 /root/context_switch_64
10010000-10011000 r--p 00000000 08:06 5778459 /root/context_switch_64
10011000-10012000 rw-p 00001000 08:06 5778459 /root/context_switch_64
fffa92ae000-fffa92b0000 rw-p 00000000 00:00 0
fffa92b0000-fffa9453000 r-xp 00000000 08:06 4334051 /lib64/power6/libc-2.9.so
fffa9453000-fffa9462000 ---p 001a3000 08:06 4334051 /lib64/power6/libc-2.9.so
fffa9462000-fffa9466000 r--p 001a2000 08:06 4334051 /lib64/power6/libc-2.9.so
fffa9466000-fffa947c000 rw-p 001a6000 08:06 4334051 /lib64/power6/libc-2.9.so
fffa947c000-fffa9480000 rw-p 00000000 00:00 0
fffa9480000-fffa94a8000 r-xp 00000000 08:06 4333852 /lib64/ld-2.9.so
fffa94b3000-fffa94b4000 rw-p 00000000 00:00 0
fffa94b4000-fffa94b7000 r-xp 00000000 00:00 0 [vdso] <----- here I am
fffa94b7000-fffa94b8000 r--p 00027000 08:06 4333852 /lib64/ld-2.9.so
fffa94b8000-fffa94bb000 rw-p 00028000 08:06 4333852 /lib64/ld-2.9.so
fffa94bb000-fffa94bc000 rw-p 00000000 00:00 0
fffe4c10000-fffe4c25000 rw-p 00000000 00:00 0 [stack]
On a microbenchmark that bounces a token between two 64bit processes over pipes
and calls gettimeofday each iteration (to access the VDSO), our context switch
rate goes from 268k to 277k ctx switches/sec (tested on a 4GHz POWER6).
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds support for tracing callchains for powerpc, both 32-bit
and 64-bit, and both in the kernel and userspace, from PMU interrupt
context.
The first three entries stored for each callchain are the NIP (next
instruction pointer), LR (link register), and the contents of the LR
save area in the second stack frame (the first is ignored because the
ABI convention on powerpc is that functions save their return address
in their caller's stack frame). Because leaf functions don't have to
save their return address (LR value) and don't have to establish a
stack frame, it's possible for either or both of LR and the second
stack frame's LR save area to have valid return addresses in them.
This is basically impossible to disambiguate without either reading
the code or looking at auxiliary information such as CFI tables.
Since we don't want to do either of those things at interrupt time,
we store both LR and the second stack frame's LR save area.
Once we get past the second stack frame, there is no ambiguity; all
return addresses we get are reliable.
For kernel traces, we check whether they are valid kernel instruction
addresses and store zero instead if they are not (rather than
omitting them, which would make it impossible for userspace to know
which was which). We also store zero instead of the second stack
frame's LR save area value if it is the same as LR.
For kernel traces, we check for interrupt frames, and for user traces,
we check for signal frames. In each case, since we're starting a new
trace, we store a PERF_CONTEXT_KERNEL/USER marker so that userspace
knows that the next three entries are NIP, LR and the second stack frame
for the interrupted context.
We read user memory with __get_user_inatomic. On 64-bit, if this
PMU interrupt occurred while interrupts are soft-disabled, and
there is no MMU hash table entry for the page, we will get an
-EFAULT return from __get_user_inatomic even if there is a valid
Linux PTE for the page, since hash_page isn't reentrant. Thus we
have code here to read the Linux PTE and access the page via the
kernel linear mapping. Since 64-bit doesn't use (or need) highmem
there is no need to do kmap_atomic. On 32-bit, we don't do soft
interrupt disabling, so this complication doesn't occur and there
is no need to fall back to reading the Linux PTE, since hash_page
(or the TLB miss handler) will get called automatically if necessary.
Note that we cannot get PMU interrupts in the interval during
context switch between switch_mm (which switches the user address
space) and switch_to (which actually changes current to the new
process). On 64-bit this is because interrupts are hard-disabled
in switch_mm and stay hard-disabled until they are soft-enabled
later, after switch_to has returned. So there is no possibility
of trying to do a user stack trace when the user address space is
not current's address space.
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This provides a mechanism to allow the perf_counters code to access
user memory in a PMU interrupt routine. Such an access can cause
various kinds of interrupt: SLB miss, MMU hash table miss, segment
table miss, or TLB miss, depending on the processor. This commit
only deals with 64-bit classic/server processors, which use an MMU
hash table. 32-bit processors are already able to access user memory
at interrupt time. Since we don't soft-disable on 32-bit, we avoid
the possibility of reentering hash_page or the TLB miss handlers,
since they run with interrupts disabled.
On 64-bit processors, an SLB miss interrupt on a user address will
update the slb_cache and slb_cache_ptr fields in the paca. This is
OK except in the case where a PMU interrupt occurs in switch_slb,
which also accesses those fields. To prevent this, we hard-disable
interrupts in switch_slb. Interrupts are already soft-disabled at
this point, and will get hard-enabled when they get soft-enabled
later.
This also reworks slb_flush_and_rebolt: to avoid hard-disabling twice,
and to make sure that it clears the slb_cache_ptr when called from
other callers than switch_slb, the existing routine is renamed to
__slb_flush_and_rebolt, which is called by switch_slb and the new
version of slb_flush_and_rebolt.
Similarly, switch_stab (used on POWER3 and RS64 processors) gets a
hard_irq_disable() to protect the per-cpu variables used there and
in ste_allocate.
If a MMU hashtable miss interrupt occurs, normally we would call
hash_page to look up the Linux PTE for the address and create a HPTE.
However, hash_page is fairly complex and takes some locks, so to
avoid the possibility of deadlock, we check the preemption count
to see if we are in a (pseudo-)NMI handler, and if so, we don't call
hash_page but instead treat it like a bad access that will get
reported up through the exception table mechanism. An interrupt
whose handler runs even though the interrupt occurred when
soft-disabled (such as the PMU interrupt) is considered a pseudo-NMI
handler, which should use nmi_enter()/nmi_exit() rather than
irq_enter()/irq_exit().
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The persistent clock of some architectures (e.g. s390) have a
better granularity than seconds. To reduce the delta between the
host clock and the guest clock in a virtualized system change the
read_persistent_clock function to return a struct timespec.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: John Stultz <johnstul@us.ibm.com>
Cc: Daniel Walker <dwalker@fifo99.com>
LKML-Reference: <20090814134811.013873340@de.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Now that percpu allows arbitrary embedding of the first chunk,
powerpc64 can easily be converted to dynamic percpu allocator.
Convert it. powerpc supports several large page sizes. Cap atom_size
at 1M. There isn't much to gain by going above that anyway.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Conflicts:
arch/sparc/kernel/smp_64.c
arch/x86/kernel/cpu/perf_counter.c
arch/x86/kernel/setup_percpu.c
drivers/cpufreq/cpufreq_ondemand.c
mm/percpu.c
Conflicts in core and arch percpu codes are mostly from commit
ed78e1e078dd44249f88b1dd8c76dafb39567161 which substituted many
num_possible_cpus() with nr_cpu_ids. As for-next branch has moved all
the first chunk allocators into mm/percpu.c, the changes are moved
from arch code to mm/percpu.c.
Signed-off-by: Tejun Heo <tj@kernel.org>
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (27 commits)
perf_counter: Zero dead bytes from ftrace raw samples size alignment
perf_counter: Subtract the buffer size field from the event record size
perf_counter: Require CAP_SYS_ADMIN for raw tracepoint data
perf_counter: Correct PERF_SAMPLE_RAW output
perf tools: callchain: Fix bad rounding of minimum rate
perf_counter tools: Fix libbfd detection for systems with libz dependency
perf: "Longum est iter per praecepta, breve et efficax per exempla"
perf_counter: Fix a race on perf_counter_ctx
perf_counter: Fix tracepoint sampling to be part of generic sampling
perf_counter: Work around gcc warning by initializing tracepoint record unconditionally
perf tools: callchain: Fix sum of percentages to be 100% by displaying amount of ignored chains in fractal mode
perf tools: callchain: Fix 'perf report' display to be callchain by default
perf tools: callchain: Fix spurious 'perf report' warnings: ignore empty callchains
perf record: Fix the -A UI for empty or non-existent perf.data
perf util: Fix do_read() to fail on EOF instead of busy-looping
perf list: Fix the output to not include tracepoints without an id
perf_counter/powerpc: Fix oops on cpus without perf_counter hardware support
perf stat: Fix tool option consistency: rename -S/--scale to -c/--scale
perf report: Add debug help for the finding of symbol bugs - show the symtab origin (DSO, build-id, kernel, etc)
perf report: Fix per task mult-counter stat reporting
...
On an iMac G5, the b43 driver is failing to initialise because trying to
set the dma mask to 30-bit fails. Even though there's only 512MiB of RAM
in the machine anyway:
https://bugzilla.redhat.com/show_bug.cgi?id=514787
We should probably let it succeed if the available RAM in the system
doesn't exceed the requested limit.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
If we have the powerpc perf_counter backend compiled in, but
the cpu we are running on is one where we don't support the
PMU, we currently oops in hw_perf_group_sched_in if we try to
use any counters, because ppmu is NULL in that case, and we
unconditionally dereference ppmu.
This fixes the problem by adding a check if ppmu is NULL at the
beginning of hw_perf_group_sched_in, and also at the beginning
of the other functions that get called from the perf_counter
core, i.e. hw_perf_disable, hw_perf_enable, and
hw_perf_counter_setup.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: benh@kernel.crashing.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If the current CPU doesn't support performance counters,
cur_cpu_spec->oprofile_cpu_type can be NULL. The current
perf_counter modules don't test for that case and would thus
crash at boot time.
Bug reported by David Woodhouse.
Reported-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Paul Mackerras <paulus@samba.org>
LKML-Reference: <19066.48028.446975.501454@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
For powerpc with CONFIG_VIRT_CPU_ACCOUNTING
jiffies_to_cputime(1) is not compile time constant and run time
calculations are quite expensive. To optimize we use
precomputed value. For all other architectures is is
preprocessor definition.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
LKML-Reference: <1248862529-6063-5-git-send-email-sgruszka@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
phys_to_dma() and dma_to_phys() are used instead of
swiotlb_phys_to_bus() and swiotlb_bus_to_phys().
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Becky Bruce <beckyb@kernel.crashing.org>
swiotlb doesn't use swiotlb_arch_address_needs_mapping(); it uses
dma_capalbe(). We can remove unnecessary
swiotlb_arch_address_needs_mapping().
We can remove swiotlb_addr_needs_map() and is_buffer_dma_capable() in
swiotlb_pci_addr_needs_map() too; dma_capable() handles the features
that both provide.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Becky Bruce <beckyb@kernel.crashing.org>
swiotlb_bus_to_virt is unncessary; we can use swiotlb_bus_to_phys and
phys_to_virt instead.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Becky Bruce <beckyb@kernel.crashing.org>
When moving load_up_altivec to vector.S a typo in a comment caused a
thinko setting the wrong variable.
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On booke processors, gdb is seeing spurious SIGTRAPs when setting a
watchpoint.
user_disable_single_step() simply quits when the DAC is non-zero. It should
be clearing the DBCR0_IC and DBCR0_BT bits from the dbcr0 register and
TIF_SINGLESTEP from the thread flag.
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* Remove smp_lock.h from files which don't need it (including some headers!)
* Add smp_lock.h to files which do need it
* Make smp_lock.h include conditional in hardirq.h
It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT
This will make hardirq.h inclusion cheaper for every PREEMPT=n config
(which includes allmodconfig/allyesconfig, BTW)
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (50 commits)
perf report: Add "Fractal" mode output - support callchains with relative overhead rate
perf_counter tools: callchains: Manage the cumul hits on the fly
perf report: Change default callchain parameters
perf report: Use a modifiable string for default callchain options
perf report: Warn on callchain output request from non-callchain file
x86: atomic64: Inline atomic64_read() again
x86: atomic64: Clean up atomic64_sub_and_test() and atomic64_add_negative()
x86: atomic64: Improve atomic64_xchg()
x86: atomic64: Export APIs to modules
x86: atomic64: Improve atomic64_read()
x86: atomic64: Code atomic(64)_read and atomic(64)_set in C not CPP
x86: atomic64: Fix unclean type use in atomic64_xchg()
x86: atomic64: Make atomic_read() type-safe
x86: atomic64: Reduce size of functions
x86: atomic64: Improve atomic64_add_return()
x86: atomic64: Improve cmpxchg8b()
x86: atomic64: Improve atomic64_read()
x86: atomic64: Move the 32-bit atomic64_t implementation to a .c file
x86: atomic64: The atomic64_t data type should be 8 bytes aligned on 32-bit too
perf report: Annotate variable initialization
...
Discarded sections in different archs share some commonality but have
considerable differences. This led to linker script for each arch
implementing its own /DISCARD/ definition, which makes maintaining
tedious and adding new entries error-prone.
This patch makes all linker scripts to move discard definitions to the
end of the linker script and use the common DISCARDS macro. As ld
uses the first matching section definition, archs can include default
discarded sections by including them earlier in the linker script.
ia64 is notable because it first throws away some ia64 specific
subsections and then include the rest of the sections into the final
image, so those sections must be discarded before the inclusion.
defconfig compile tested for x86, x86-64, powerpc, powerpc64, ia64,
alpha, sparc, sparc64 and s390. Michal Simek tested microblaze.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Tested-by: Michal Simek <monstr@monstr.eu>
Cc: linux-arch@vger.kernel.org
Cc: Michal Simek <monstr@monstr.eu>
Cc: microblaze-uclinux@itee.uq.edu.au
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Tony Luck <tony.luck@intel.com>
Pull linus#master to merge PER_CPU_DEF_ATTRIBUTES and alpha build fix
changes. As alpha in percpu tree uses 'weak' attribute instead of
inline assembly, there's no need for __used attribute.
Conflicts:
arch/alpha/include/asm/percpu.h
arch/mn10300/kernel/vmlinux.lds.S
include/linux/percpu-defs.h
POWER7 has the same PR/HV bit layout as POWER6, so set the flag.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Paul Mackerras <paulus@samba.org>
Cc: a.p.zijlstra@chello.nl
Cc: benh@kernel.crashing.org
LKML-Reference: <20090701030701.GI3563@kryten>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The function udbg_44x_as1_flush() has the wrong prototype causing
a warning when enabling 440 early debug.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
dev_set_name() takes a format string, so use it properly and avoid
a warning with recent gcc's
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Several platforms use their own copy of what is essentially the same code,
using RTAS to synchronize the timebases when bringing up new CPUs. This
moves it all into a single common implementation and additionally
turns the spinlock into a raw spinlock since the former can rely on
the timebase not being frozen when spinlock debugging is enabled, and finally
masks interrupts while the timebase is disabled.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
RTAS currently uses a normal spinlock. However it can be called from
contexts where this is not necessarily a good idea. For example, it
can be called while syncing timebases, with the core timebase being
frozen. Unfortunately, that will deadlock in case of lock contention
when spinlock debugging is enabled as the spin lock debugging code
will try to use __delay() which ... relies on the timebase being
enabled.
Also RTAS can be used in some low level IRQ handling code path so it
may as well be a raw spinlock for -rt sake.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Based on initial work from: Dale Farnsworth <dale@farnsworth.org>
Add the low level irq tracing hooks for 32-bit powerpc needed
to enable full lockdep functionality.
The approach taken to deal with the code in entry_32.S is that
we don't trace all the transitions of MSR:EE when we just turn
it off to peek at TI_FLAGS without races. Only when we are
calling into C code or returning from exceptions with a state
that have changed from what lockdep thinks.
There's a little bugger though: If we take an exception that
keeps interrupts enabled (such as an alignment exception) while
interrupts are enabled, we will call trace_hardirqs_on() on the
way back spurriously. Not a big deal, but to get rid of it would
require remembering in pt_regs that the exception was one of the
type that kept interrupts enabled which we don't know at this
stage. (Well, we could test all cases for regs->trap but that
sucks too much).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Kumar Gala <galak@kernel.crashing.org>
The 32-bit kernel relies on some memory being mapped covering
the kernel text,data and bss at least, early during boot before
the full MMU setup is done. On 32-bit "classic" processors, this
is done using BAT registers.
On 601, the size of BATs is limited to 8M and we use 2 of them
for that initial mapping. This can become quite tight when enabling
features like lockdep, so let's use a 3rd one to bump that mapping
from 16M to 24M. We keep the 4th BAT free as it can be useful for
debugging early boot code to map things like serial ports.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
For some reason we've had an explicit KERN_INFO for GPR dumps. With
recent changes we get output like:
<6>GPR00: 00000000 ef855eb0 ef858000 00000001 000000d0 f1000000 ffbc8000 ffffffff
The KERN_INFO is causing the <6>. Don't see any reason to keep it
around.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The old PowerSurge SMP (ie, dual or quad 604 machines) code has
numerous issues in modern world.
One is cpu_possible_map is set too late (the device-tree is bogus)
so we fail to allocate the interrupt stacks and crash. Another
problem is the fact the timebase is frozen by the bringup of the
second CPU so the delays in the generic code will hang, we need
to move some of the calling procedure to inside the powermac code.
This makes it boot again for me
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
x86 throws away .discard section but no other archs do. Also,
.discard is not thrown away while linking modules. Make every arch
and module linking throw it away. This will be used to define dummy
variables for percpu declarations and definitions.
This patch is based on Ivan Kokshaysky's alpha percpu patch.
[ Impact: always throw away everything in .discard ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Bryan Wu <cooloney@kernel.org>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@elte.hu>
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (49 commits)
perfcounter: Handle some IO return values
perf_counter: Push perf_sample_data through the swcounter code
perf_counter tools: Define and use our own u64, s64 etc. definitions
perf_counter: Close race in perf_lock_task_context()
perf_counter, x86: Improve interactions with fast-gup
perf_counter: Simplify and fix task migration counting
perf_counter tools: Add a data file header
perf_counter: Update userspace callchain sampling uses
perf_counter: Make callchain samples extensible
perf report: Filter to parent set by default
perf_counter tools: Handle lost events
perf_counter: Add event overlow handling
fs: Provide empty .set_page_dirty() aop for anon inodes
perf_counter: tools: Makefile tweaks for 64-bit powerpc
perf_counter: powerpc: Add processor back-end for MPC7450 family
perf_counter: powerpc: Make powerpc perf_counter code safe for 32-bit kernels
perf_counter: powerpc: Change how processor-specific back-ends get selected
perf_counter: powerpc: Use unsigned long for register and constraint values
perf_counter: powerpc: Enable use of software counters on 32-bit powerpc
perf_counter tools: Add and use isprint()
...
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (24 commits)
tracing/urgent: warn in case of ftrace_start_up inbalance
tracing/urgent: fix unbalanced ftrace_start_up
function-graph: add stack frame test
function-graph: disable when both x86_32 and optimize for size are configured
ring-buffer: have benchmark test print to trace buffer
ring-buffer: do not grab locks in nmi
ring-buffer: add locks around rb_per_cpu_empty
ring-buffer: check for less than two in size allocation
ring-buffer: remove useless compile check for buffer_page size
ring-buffer: remove useless warn on check
ring-buffer: use BUF_PAGE_HDR_SIZE in calculating index
tracing: update sample event documentation
tracing/filters: fix race between filter setting and module unload
tracing/filters: free filter_string in destroy_preds()
ring-buffer: use commit counters for commit pointer accounting
ring-buffer: remove unused variable
ring-buffer: have benchmark test handle discarded events
ring-buffer: prevent adding write in discarded area
tracing/filters: strloc should be unsigned short
tracing/filters: operand can be negative
...
Fix up kmemcheck-induced conflict in kernel/trace/ring_buffer.c manually
In case gcc does something funny with the stack frames, or the return
from function code, we would like to detect that.
An arch may implement passing of a variable that is unique to the
function and can be saved on entering a function and can be tested
when exiting the function. Usually the frame pointer can be used for
this purpose.
This patch also implements this for x86. Where it passes in the stack
frame of the parent function, and will test that frame on exit.
There was a case in x86_32 with optimize for size (-Os) where, for a
few functions, gcc would align the stack frame and place a copy of the
return address into it. The function graph tracer modified the copy and
not the actual return address. On return from the funtion, it did not go
to the tracer hook, but returned to the parent. This broke the function
graph tracer, because the return of the parent (where gcc did not do
this funky manipulation) returned to the location that the child function
was suppose to. This caused strange kernel crashes.
This test detected the problem and pointed out where the issue was.
This modifies the parameters of one of the functions that the arch
specific code calls, so it includes changes to arch code to accommodate
the new prototype.
Note, I notice that the parsic arch implements its own push_return_trace.
This is now a generic function and the ftrace_push_return_trace should be
used instead. This patch does not touch that code.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Fix up the number of cells for the values of CPC925 Memory Controller,
and setup related platform device during system booting up, against
which CPC925 Memory Controller EDAC driver would be matched.
Signed-off-by: Harry Ciao <qingtao.cao@windriver.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Kumar Gala <galak@gate.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds support for the performance monitor hardware on the
MPC7450 family of processors (7450, 7451, 7455, 7447/7457, 7447A,
7448), used in the later Apple G4 powermacs/powerbooks and other
machines. These machines have 6 hardware counters with a unique
set of events which can be counted on each counter, with some
events being available on multiple counters.
Raw event codes for these processors are (PMC << 8) + PMCSEL.
If PMC is non-zero then the event is that selected by the given
PMCSEL value for that PMC (hardware counter). If PMC is zero
then the event selected is one of the low-numbered ones that are
common to several PMCs. In this case PMCSEL must be <= 22 and
the event is what that PMCSEL value would select on PMC1 (but
it may be placed any other PMC that has the same event for that
PMCSEL value).
For events that count cycles or occurrences that exceed a threshold,
the threshold requested can be specified in the 0x3f000 bits of the
raw event codes. If the event uses the threshold multiplier bit
and that bit should be set, that is indicated with the 0x40000 bit
of the raw event code.
This fills in some of the generic cache events. Unfortunately there
are quite a few blank spaces in the table, partly because these
processors tend to count cache hits rather than cache accesses.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linuxppc-dev@ozlabs.org
Cc: benh@kernel.crashing.org
LKML-Reference: <19000.55631.802122.696927@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This abstracts a few things in arch/powerpc/kernel/perf_counter.c
that are specific to 64-bit kernels, and provides definitions for
32-bit kernels. In particular,
* Only 64-bit has MMCRA and the bits in it that give information
about a PMU interrupt (sampled PR, HV, slot number etc.)
* Only 64-bit has the lppaca and the lppaca->pmcregs_in_use field
* Use of SDAR is confined to 64-bit for now
* Only 64-bit has soft/lazy interrupt disable and therefore
pseudo-NMIs (interrupts that occur while interrupts are soft-disabled)
* Only 64-bit has PMC7 and PMC8
* Only 64-bit has the MSR_HV bit.
This also fixes the types used in a couple of places, where we were
using long types for things that need to be 64-bit.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linuxppc-dev@ozlabs.org
Cc: benh@kernel.crashing.org
LKML-Reference: <19000.55590.634126.876084@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
At present, the powerpc generic (processor-independent) perf_counter
code has list of processor back-end modules, and at initialization,
it looks at the PVR (processor version register) and has a switch
statement to select a suitable processor-specific back-end.
This is going to become inconvenient as we add more processor-specific
back-ends, so this inverts the order: now each back-end checks whether
it applies to the current processor, and registers itself if so.
Furthermore, instead of looking at the PVR, back-ends now check the
cur_cpu_spec->oprofile_cpu_type string and match on that.
Lastly, each back-end now specifies a name for itself so the core can
print a nice message when a back-end registers itself.
This doesn't provide any support for unregistering back-ends, but that
wouldn't be hard to do and would allow back-ends to be modules.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linuxppc-dev@ozlabs.org
Cc: benh@kernel.crashing.org
LKML-Reference: <19000.55529.762227.518531@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This changes the powerpc perf_counter back-end to use unsigned long
types for hardware register values and for the value/mask pairs used
in checking whether a given set of events fit within the hardware
constraints. This is in preparation for adding support for the PMU
on some 32-bit powerpc processors. On 32-bit processors the hardware
registers are only 32 bits wide, and the PMU structure is generally
simpler, so 32 bits should be ample for expressing the hardware
constraints. On 64-bit processors, unsigned long is 64 bits wide,
so using unsigned long vs. u64 (unsigned long long) makes no actual
difference.
This makes some other very minor changes: adjusting whitespace to line
things up in initialized structures, and simplifying some code in
hw_perf_disable().
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linuxppc-dev@ozlabs.org
Cc: benh@kernel.crashing.org
LKML-Reference: <19000.55473.26174.331511@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This enables the perf_counter subsystem on 32-bit powerpc. Since we
don't have any support for hardware counters on 32-bit powerpc yet,
only software counters can be used.
Besides selecting HAVE_PERF_COUNTERS for 32-bit powerpc as well as
64-bit, the main thing this does is add an implementation of
set_perf_counter_pending(). This needs to arrange for
perf_counter_do_pending() to be called when interrupts are enabled.
Rather than add code to local_irq_restore as 64-bit does, the 32-bit
set_perf_counter_pending() generates an interrupt by setting the
decrementer to 1 so that a decrementer interrupt will become pending
in 1 or 2 timebase ticks (if a decrementer interrupt isn't already
pending). When interrupts are enabled, timer_interrupt() will be
called, and some new code in there calls perf_counter_do_pending().
We use a per-cpu array of flags to indicate whether we need to call
perf_counter_do_pending() or not.
This introduces a couple of new Kconfig symbols: PPC_HAVE_PMU_SUPPORT,
which is selected by processor families for which we have hardware PMU
support (currently only PPC64), and PPC_PERF_CTRS, which enables the
powerpc-specific perf_counter back-end.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linuxppc-dev@ozlabs.org
Cc: benh@kernel.crashing.org
LKML-Reference: <19000.55404.103840.393470@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* akpm: (182 commits)
fbdev: bf54x-lq043fb: use kzalloc over kmalloc/memset
fbdev: *bfin*: fix __dev{init,exit} markings
fbdev: *bfin*: drop unnecessary calls to memset
fbdev: bfin-t350mcqb-fb: drop unused local variables
fbdev: blackfin has __raw I/O accessors, so use them in fb.h
fbdev: s1d13xxxfb: add accelerated bitblt functions
tcx: use standard fields for framebuffer physical address and length
fbdev: add support for handoff from firmware to hw framebuffers
intelfb: fix a bug when changing video timing
fbdev: use framebuffer_release() for freeing fb_info structures
radeon: P2G2CLK_ALWAYS_ONb tested twice, should 2nd be P2G2CLK_DAC_ALWAYS_ONb?
s3c-fb: CPUFREQ frequency scaling support
s3c-fb: fix resource releasing on error during probing
carminefb: fix possible access beyond end of carmine_modedb[]
acornfb: remove fb_mmap function
mb862xxfb: use CONFIG_OF instead of CONFIG_PPC_OF
mb862xxfb: restrict compliation of platform driver to PPC
Samsung SoC Framebuffer driver: add Alpha Channel support
atmel-lcdc: fix pixclock upper bound detection
offb: use framebuffer_alloc() to allocate fb_info struct
...
Manually fix up conflicts due to kmemcheck in mm/slab.c
Now we have __initconst, we can finally move the external declarations for
the various Linux logo structures to <linux/linux_logo.h>.
James' ack dates back to the previous submission (way to long ago), when the
logos were still __initdata, which caused failures on some platforms with some
toolchain versions.
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Acked-by: James Simmons <jsimmons@infradead.org>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* create mm/init-mm.c, move init_mm there
* remove INIT_MM, initialize init_mm with C99 initializer
* unexport init_mm on all arches:
init_mm is already unexported on x86.
One strange place is some OMAP driver (drivers/video/omap/) which
won't build modular, but it's already wants get_vm_area() export.
Somebody should look there.
[akpm@linux-foundation.org: add missing #includes]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Mike Frysinger <vapier.adi@gmail.com>
Cc: Americo Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add the option to build the code under arch/powerpc with -Werror.
The intention is to make it harder for people to inadvertantly introduce
warnings in the arch/powerpc code. It needs to be configurable so that
if a warning is introduced, people can easily work around it while it's
being fixed.
The option is a negative, ie. don't enable -Werror, so that it will be
turned on for allyes and allmodconfig builds.
The default is n, in the hope that developers will build with -Werror,
that will probably lead to some build breaks, I am prepared to be flamed.
It's not enabled for math-emu, which is a steaming pile of warnings.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently the kernel expects the additional four IBAT and DBAT registers
to be available, but doesn't enable these registers on 745x CPUs, which
have them disabled after reset. Thus set the HIGH_BAT_EN bit in HID0
register, if the corresponding MMU feature is defined.
Signed-off-by: Gerhard Pircher <gerhard_pircher@gmx.net>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Some boot loaders may not enable L1 instruction/data cache. Check if
data and instruction caches are enabled, and enable them if needed.
Signed-off-by: Nate Case <ncase@xes-inc.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (103 commits)
powerpc: Fix bug in move of altivec code to vector.S
powerpc: Add support for swiotlb on 32-bit
powerpc/spufs: Remove unused error path
powerpc: Fix warning when printing a resource_size_t
powerpc/xmon: Remove unused variable in xmon.c
powerpc/pseries: Fix warnings when printing resource_size_t
powerpc: Shield code specific to 64-bit server processors
powerpc: Separate PACA fields for server CPUs
powerpc: Split exception handling out of head_64.S
powerpc: Introduce CONFIG_PPC_BOOK3S
powerpc: Move VMX and VSX asm code to vector.S
powerpc: Set init_bootmem_done on NUMA platforms as well
powerpc/mm: Fix a AB->BA deadlock scenario with nohash MMU context lock
powerpc/mm: Fix some SMP issues with MMU context handling
powerpc: Add PTRACE_SINGLEBLOCK support
fbdev: Add PLB support and cleanup DCR in xilinxfb driver.
powerpc/virtex: Add ml510 reference design device tree
powerpc/virtex: Add Xilinx ML510 reference design support
powerpc/virtex: refactor intc driver and add support for i8259 cascading
powerpc/virtex: Add support for Xilinx PCI host bridge
...
This fixes a couple of compile warnings that crept into the powerpc
perf_counter code recently:
CC arch/powerpc/kernel/perf_counter.o
arch/powerpc/kernel/perf_counter.c: In function 'record_and_restart':
arch/powerpc/kernel/perf_counter.c:1016: warning: unused variable 'addr'
arch/powerpc/kernel/perf_counter.c: In function 'hw_perf_counter_init':
arch/powerpc/kernel/perf_counter.c:891: warning: 'ev' may be used uninitialized in this function
Stephen Rothwell reported this against linux-next as well.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <18998.12884.787039.22202@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Commit 28794d34 ("powerpc/kconfig: Kill PPC_MULTIPLATFORM"), added
CONFIG_PPC_OF_BOOT_TRAMPOLINE to control the buliding of prom_init.o
However the Makefile still unconditionally builds prom_init_check,
the script that checks prom_init.o for symbol usage, and so in turn
prom_init.o is still always being built. (it's not linked though)
So surround all the prom_init_check logic with an ifeq block testing
if CONFIG_PPC_OF_BOOT_TRAMPOLINE is set.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When CONFIG_RELOCATABLE is enabled, PHYSICAL_START is actually a
variable of type phys_addr_t. That means to print it we need to
cast to unsigned long long and use llx.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently we are wasting time calling the generic calibrate_delay()
function. We don't need it since our implementation of __delay() is
based on the CPU timebase. So instead, we use our own small
implementation that initializes loops_per_jiffy to something sensible
to make the few users like spinlock debug be happy
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf_counter: Start documenting HAVE_PERF_COUNTERS requirements
perf_counter: Add forward/backward attribute ABI compatibility
perf record: Explicity program a default counter
perf_counter: Remove PERF_TYPE_RAW special casing
perf_counter: PERF_TYPE_HW_CACHE is a hardware counter too
powerpc, perf_counter: Fix performance counter event types
perf_counter/x86: Add a quirk for Atom processors
perf_counter tools: Remove one L1-data alias
Sachin Sant reported these compiler errors:
CC arch/powerpc/kernel/power7-pmu.o
arch/powerpc/kernel/power7-pmu.c:297: error: PERF_COUNT_CPU_CYCLES undeclared here (not in a function)
Which happened because a last-minute rename of symbols crossed with
the Power7 support patch.
Fix this by using the new symbol names.
Reported-by: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@ozlabs.org
LKML-Reference: <1244788494.5554.1.camel@ht.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Everyone cut and paste this comment from my original one. We now do
it generically, so cut the comments.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Amerigo Wang <amwang@redhat.com>
The patch that moved to vector.S and made common between 32 and 64-bit the
altivec code had a nasty bug on 32-bit (did I really test that ?) which
causes the kernel to blr back into userspace ... oops :-)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The top (fastest) and last level (biggest) caches are the most
interesting ones, performance wise.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
[ Fixed the Nehalem LL table to LLC Reference/Miss events ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Pure renames only, to PERF_COUNT_HW_* and PERF_COUNT_SW_*.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This adds tables of event codes for the generalized cache events for
all the currently supported powerpc processors: POWER{4,5,5+,6,7} and
PPC970*, plus powerpc-specific code to use these tables when a
generalized cache event is requested.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <18992.36430.933526.742969@drongo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This adds the back-end for the PMU on POWER7 processors. POWER7
has 4 fully-programmable counters and two fixed-function counters
(which do respect the freeze conditions, can generate interrupts,
and are writable, unlike PMC5/6 on POWER5+/6).
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <18992.36329.189378.17992@drongo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We currently log hw.sample_period for PERF_SAMPLE_PERIOD, however this is
incorrect. When we adjust the period, it will only take effect the next
cycle but report it for the current cycle. So when we adjust the period
for every cycle, we're always wrong.
Solve this by keeping track of the last_period.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
For easy extension of the sample data, put it in a structure.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch includes the basic infrastructure to use swiotlb
bounce buffering on 32-bit powerpc. It is not yet enabled on
any platforms. Probably the most interesting bit is the
addition of addr_needs_map to dma_ops - we need this as
a dma_op because the decision of whether or not an addr
can be mapped by a device is device-specific.
Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
resource_size_t is 64 bits on PowerPC 64.
Gets rid of this warning:
arch/powerpc/kernel/pci_64.c: In function 'pcibios_map_io_space':
arch/powerpc/kernel/pci_64.c:504: warning: format '%016lx' expects type 'long unsigned int', but argument 2 has type 'resource_size_t'
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This is a random collection of added ifdef's around portions of
code that only mak sense on server processors. Using either
CONFIG_PPC_STD_MMU_64 or CONFIG_PPC_BOOK3S as seems appropriate.
This is meant to make the future merging of Book3E 64-bit support
easier.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch has no effect other than re-ordering PACA fields on
current server CPUs. It however is a pre-requisite for future
support of BookE 64-bit processors. Various parts of the PACA
struct are now moved under some ifdef's, either the new
CONFIG_PPC_BOOK3S or CONFIG_PPC_STD_MMU_64, whatever seems more
appropriate.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.craashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To prepare for future support of Book3E 64-bit PowerPC processors,
which use a completely different exception handling, we move that
code to a new exceptions-64s.S file.
This file is #included from head_64.S due to some of the absolute
address requirements which can currently only be fulfilled from
within that file.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently, load_up_altivec and give_up_altivec are duplicated
in 32-bit and 64-bit. This creates a common implementation that
is moved away from head_32.S, head_64.S and misc_64.S and into
vector.S, using the same macros we already use for our common
implementation of load_up_fpu.
I also moved the VSX code over to vector.S though in that case
I didn't make it build on 32-bit (yet).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reworked by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds block-step support on powerpc, including a PTRACE_SINGLEBLOCK
request for ptrace.
The BookE implementation is tweaked to fire a single step after a
block step in order to mimmic the server behaviour.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Counter type is a frequently used value and we do a lot of
bit juggling by encoding and decoding it from attr->config.
Clean this up by creating a separate attr->type field.
Also clean up the various similarly complex user-space bits
all around counter attribute management.
The net improvement is significant, and it will be easier
to add a new major type (which is what triggered this cleanup).
(This changes the ABI, all tools are adapted.)
(PowerPC build-tested.)
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Commit b23f3325 ("perf_counter: Rename various fields") fixed up
most of the uses of the renamed fields, but missed one instance
of "record_type" in powerpc-specific code which needs to be changed
to "sample_type", and a "PERF_RECORD_ADDR" in the same statement that
needs to be changed to "PERF_SAMPLE_ADDR", causing compilation
errors on powerpc. This fixes it.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <18983.3111.770392.800486@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When using interrupting counters and limited (non-interrupting)
counters at the same time, it's possible that we get an
interrupt in write_mmcr0() after writing MMCR0 but before we
have set up the counters using limited PMCs. What happens then
is that we get into perf_counter_interrupt() with
counter->hw.idx = 0 for the limited counters, leading to the
"oops trying to read PMC0" error message being printed.
This fixes the problem by making perf_counter_interrupt()
robust against counter->hw.idx being zero (the counter is just
ignored in that case) and also by changing write_mmcr0() to
write MMCR0 initially with the counter overflow interrupt
enable bits masked (set to 0). If the MMCR0 value requested by
the caller has either of those bits set, we write MMCR0 again
with the requested value of those bits after setting up the
limited counters properly.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Kacur <jkacur@redhat.com>
Cc: Stephane Eranian <eranian@googlemail.com>
LKML-Reference: <18982.17684.138182.954599@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Commit ef923214 ("perf_counter: powerpc: use u64 for event
codes internally") introduced a bug where the return value from
function find_alternative_bdecode gets put into a u64 variable
and later tested to see if it is < 0. The effect is that we
get extra, bogus event code alternatives on POWER5 and POWER5+,
leading to error messages such as "oops compute_mmcr failed"
being printed and counters not counting properly.
This fixes it by using s64 for the return type of
find_alternative_bdecode and for the local variable that the
caller puts the value in. It also makes the event argument a
u64 on POWER5+ for consistency.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Kacur <jkacur@redhat.com>
Cc: Stephane Eranian <eranian@googlemail.com>
LKML-Reference: <18982.17586.666132.90983@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The structure isn't hw only and when I read event, I think about those
things that fall out the other end. Rename the thing.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Kacur <jkacur@redhat.com>
Cc: Stephane Eranian <eranian@googlemail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
A few renames:
s/irq_period/sample_period/
s/irq_freq/sample_freq/
s/PERF_RECORD_/PERF_SAMPLE_/
s/record_type/sample_type/
And change both the new sample_type and read_format to u64.
Reported-by: Stephane Eranian <eranian@googlemail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Kacur <jkacur@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It was __devinit, but it is also within a CONFIG_HOTPLUG guarded section
of code, so the __devinit does nothing but cause the following warning:
WARNING: vmlinux.o(.text+0x107a8): Section mismatch in reference from the function pcibios_finish_adding_to_bus() to the function .devinit.text:pcibios_claim_one_bus()
The function pcibios_finish_adding_to_bus() references
the function __devinit pcibios_claim_one_bus().
This is often because pcibios_finish_adding_to_bus lacks a __devinit
annotation or the annotation of pcibios_claim_one_bus is wrong.
It is also only (externally) used in arch/powerpc/kernel/of_platform.c
which cannot be built as a module so don't export it.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
There's no need to wrap PPC_INST_NOP in a static inline.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
These macros were used in the original port, but since commit
e4486fe316 (ftrace, use probe_kernel API to modify code) they
are unused.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Use ppc_function_entry() from code-patching.h.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch updates the output from /proc/ppc64/lparcfg to display the
processor virtualization resource allocations for a shared processor
partition.
This information is already gathered via the h_get_ppp call, we just
have to make sure that the ibm,partition-performance-parameters-level
property is >= 1 to ensure that the information is valid.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Merge reason: merge almost-rc8 into perfcounters/core, which was -rc6
based - to pick up the latest upstream fixes.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The implementation we just revived has issues, such as using a
Kconfig-defined virtual address area in kernel space that nothing
actually carves out (and thus will overlap whatever is there),
or having some dependencies on being self contained in a single
PTE page which adds unnecessary constraints on the kernel virtual
address space.
This fixes it by using more classic PTE accessors and automatically
locating the area for consistent memory, carving an appropriate hole
in the kernel virtual address space, leaving only the size of that
area as a Kconfig option. It also brings some dma-mask related fixes
from the ARM implementation which was almost identical initially but
grew its own fixes.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This implements interrupt throttling on powerpc. Since we don't have
individual count enable/disable or interrupt enable/disable controls
per counter, this simply sets the hardware counter to 0, meaning that
it will not interrupt again until it has counted 2^31 counts, which
will take at least 2^30 cycles assuming a maximum of 2 counts per
cycle. Also, we set counter->hw.period_left to the maximum possible
value (2^63 - 1), so we won't report overflows for this counter for
the forseeable future.
The unthrottle operation restores counter->hw.period_left and the
hardware counter so that we will once again report a counter overflow
after counter->hw.irq_period counts.
[ Impact: new perfcounters robustness feature on PowerPC ]
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <18971.35823.643362.446774@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If CONFIG_PPC_EMULATED_STATS is enabled, make available counters for the
various classes of emulated instructions under
/sys/kernel/debug/powerpc/emulated_instructions/ (assumed debugfs is mounted on
/sys/kernel/debug). Optionally (controlled by
/sys/kernel/debug/powerpc/emulated_instructions/do_warn), rate-limited warnings
can be printed to the console when instructions are emulated.
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
I have been looking at sources of OS jitter and notice that after a long
NO_HZ idle period we wakeup too early:
relative time (us) event
timer irq exit
999946.405 timer irq entry
4.835 timer irq exit
21.685 timer irq entry
3.540 timer (tick_sched_timer) entry
Here we slept for just under a second then took a timer interrupt that did
nothing. 21.685 us later we wake up again and do the work.
We set a rather low shift value of 16 for the decrementer clockevent, which I
think is causing this issue. On this box we have a 207MHz decrementer and see:
clockevent: decrementer mult[3501] shift[16] cpu[0]
For calculations of large intervals this mult/shift combination could be
off by a significant amount. I notice the sparc code has a loop that iterates
to find a mult/shift combination that maximises the shift value while
keeping mult under 32bit. With the patch below we get:
clockevent: decrementer mult[35015c20] shift[32] cpu[15]
And we no longer see the spurious wakeups.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The pcnet32 driver has had in its device id table for some time a match
against the "broken" vendor ID. No need to fake out the vendor ID
with an explicit fixup.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* Removed building setup-irq on ppc32, we don't use it anymore
* Remove duplicate prototype for setup_grackle() code that needs it
gets it from <asm/grackle.h>
* Removed gratuitous pci_io_size type differences between ppc32/ppc64
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
There doesn't appear to be any specific reason that we need to setup the
pseries specific notifier in generic arch pci code. Move it into pseries
land.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We shouldn't directly access sysdata to get the device node but call
pci_bus_to_OF_node() for this purpose.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds the PowerPC 2.06 tlbie mnemonics and keeps backwards
compatibilty for CPUs before 2.06.
Only useful for bare metal systems.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We should no longer have any irq code that needs __do_IRQ(), so
remove the fallback to __do_IRQ().
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The guts of do_IRQ() isn't really the right place to be documenting
the ppc_md.get_irq() interface. So move the comment into machdep.h
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Makes do_IRQ() shorter and clearer.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Rather than a giant ifdef in the body of do_IRQ(), including a
dangling else, move the irq stack logic into a separate routine and
do the ifdef there.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit 9e35ad38 ("perf_counter: Rework the perf counter
disable/enable") added code to the powerpc hw_perf_enable (renamed
from hw_perf_restore) to test cpuhw->disabled and return immediately
if it is not set (i.e. if the PMU is already enabled).
Unfortunately the test got added before cpuhw was initialized,
resulting in an oops the first time hw_perf_enable got called.
This fixes it by moving the initialization of cpuhw to before
cpuhw->disabled is tested.
[ Impact: fix oops-causing bug on powerpc ]
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <18960.56772.869734.304631@drongo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
I don't think anything guarantees that the objects in data.page_aligned
are a multiple of PAGE_SIZE, thus the section may end on any boundary.
So the following section, .data.cacheline_aligned needs an explicit
alignment.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
After upgrading my distcc boxes from gcc 4.2.2 to 4.4.0, the function
graph tracer broke. This was discovered on my x86 boxes.
The issue is that gcc used the same register for an output as it did for
an input in an asm statement. I first thought this was a bug in gcc and
reported it. I was notified that gcc was correct and that the output had
to be flagged as an "early clobber".
I noticed that powerpc had the same issue and this patch fixes it.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
pr_debug() can now result in code being generated even when #DEBUG
is not defined. That's not really desirable in the ftrace code
which we want to be snappy.
With CONFIG_DYNAMIC_DEBUG=y:
size before:
text data bss dec hex filename
3334 672 4 4010 faa arch/powerpc/kernel/ftrace.o
size after:
text data bss dec hex filename
2616 360 4 2980 ba4 arch/powerpc/kernel/ftrace.o
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This uses values from the MMCRA, SIAR and SDAR registers on
powerpc to supply more precise information for overflow events,
including a data address when PERF_RECORD_ADDR is specified.
Since POWER6 uses different bit positions in MMCRA from earlier
processors, this converts the struct power_pmu limited_pmc5_6
field, which only had 0/1 values, into a flags field and
defines bit values for its previous use (PPMU_LIMITED_PMC5_6)
and a new flag (PPMU_ALT_SIPR) to indicate that the processor
uses the POWER6 bit positions rather than the earlier
positions. It also adds definitions in reg.h for the new and
old positions of the bit that indicates that the SIAR and SDAR
values come from the same instruction.
For the data address, the SDAR value is supplied if we are not
doing instruction sampling. In that case there is no guarantee
that the address given in the PERF_RECORD_ADDR subrecord will
correspond to the instruction whose address is given in the
PERF_RECORD_IP subrecord.
If instruction sampling is enabled (e.g. because this counter
is counting a marked instruction event), then we only supply
the SDAR value for the PERF_RECORD_ADDR subrecord if it
corresponds to the instruction whose address is in the
PERF_RECORD_IP subrecord. Otherwise we supply 0.
[ Impact: support more PMU hardware features on PowerPC ]
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <18955.37028.48861.555309@drongo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Although the perf_counter API allows 63-bit raw event codes,
internally in the powerpc back-end we had been using 32-bit
event codes. This expands them to 64 bits so that we can add
bits for specifying threshold start/stop events and instruction
sampling modes later.
This also corrects the return value of can_go_on_limited_pmc;
we were returning an event code rather than just a 0/1 value in
some circumstances. That didn't particularly matter while event
codes were 32-bit, but now that event codes are 64-bit it
might, so this fixes it.
[ Impact: extend PowerPC perfcounter interfaces from u32 to u64 ]
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <18955.36874.472452.353104@drongo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Instead of specifying the irq_period for a counter, provide a target interrupt
frequency and dynamically adapt the irq_period to match this frequency.
[ Impact: new perf-counter attribute/feature ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <20090515132018.646195868@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The current disable/enable mechanism is:
token = hw_perf_save_disable();
...
/* do bits */
...
hw_perf_restore(token);
This works well, provided that the use nests properly. Except we don't.
x86 NMI/INT throttling has non-nested use of this, breaking things. Therefore
provide a reference counter disable/enable interface, where the first disable
disables the hardware, and the last enable enables the hardware again.
[ Impact: refactor, simplify the PMU disable/enable logic ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
A couple of issues crept in since about 2.6.27 related to accessing PCI
device ROMs on various powerpc machines.
First, historically, we don't allocate the ROM resource in the resource
tree. I'm not entirely certain of why, I susepct they often contained
garbage on x86 but it's hard to tell. This causes the current generic
code to always call pci_assign_resource() when trying to access the said
ROM from sysfs, which will try to re-assign some new address regardless
of what the ROM BAR was already set to at boot time. This can be a
problem on hypervisor platforms like pSeries where we aren't supposed
to move PCI devices around (and in fact probably can't).
Second, our code that generates the PCI tree from the OF device-tree
(instead of doing config space probing) which we mostly use on pseries
at the moment, didn't set the (new) flag IORESOURCE_SIZEALIGN on any
resource. That means that any attempt at re-assigning such a resource
with pci_assign_resource() would fail due to resource_alignment()
returning 0.
This fixes this by doing these two things:
- The code that calculates resource flags based on the OF device-node
is improved to set IORESOURCE_SIZEALIGN on any valid BAR, and while at
it also set IORESOURCE_READONLY for ROMs since we were lacking that too
- We now allocate ROM resources as part of the resource tree. However
to limit the chances of nasty conflicts due to busted firmwares, we
only do it on the second pass of our two-passes allocation scheme,
so that all valid and enabled BARs get precedence.
This brings pSeries back the ability to access PCI ROMs via sysfs (and
thus initialize various video cards from X etc...).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
My previous pach for fixing the oprofile CPU type got somewhat mismerged
(by my fault) when it collided with another related patch. This should
finally (fingers crossed) fix the whole thing.
We make sure we keep the -old- oprofile type and CPU type whenever
one of them was specified in the first pass through the function.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We're currently choking on mem=4g (and above) due to memory_limit
being specified as an unsigned long. Make memory_limit
phys_addr_t to fix this.
Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
commit 2657dd4e30 introduced a
bug where we would now always override the "real" oprofile CPU
type with the "compatible" one provided by a pseudo-PVR in the
device-tree which is incorrect and breaks oprofile on all current
configs since the "compatible" ones aren't yet recognized.
This fixes it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Oprofile is changing the naming it is using for the compatibility modes.
Instead of having compat-power<x>, oprofile will go to family naming
convention and use ibm-compat-v<x>. Currently only ibm-compat-v1 will
be defined.
The notion of compatibility events just started with POWER6. So there is
no way that any other tool could exist that is using these
oprofile_cpu_type strings we want to change.
Signed-off-by: Mike Wolf <mjw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
POWER5+ and POWER6 have two hardware counters with limited functionality:
PMC5 counts instructions completed in run state and PMC6 counts cycles
in run state. (Run state is the state when a hardware RUN bit is 1;
the idle task clears RUN while waiting for work to do and sets it when
there is work to do.)
These counters can't be written to by the kernel, can't generate
interrupts, and don't obey the freeze conditions. That means we can
only use them for per-task counters (where we know we'll always be in
run state; we can't put a per-task counter on an idle task), and only
if we don't want interrupts and we do want to count in all processor
modes.
Obviously some counters can't go on a limited hardware counter, but there
are also situations where we can only put a counter on a limited hardware
counter - if there are already counters on that exclude some processor
modes and we want to put on a per-task cycle or instruction counter that
doesn't exclude any processor mode, it could go on if it can use a
limited hardware counter.
To keep track of these constraints, this adds a flags argument to the
processor-specific get_alternatives() functions, with three bits defined:
one to say that we can accept alternative event codes that go on limited
counters, one to say we only want alternatives on limited counters, and
one to say that this is a per-task counter and therefore events that are
gated by run state are equivalent to those that aren't (e.g. a "cycles"
event is equivalent to a "cycles in run state" event). These flags
are computed for each counter and stored in the counter->hw.counter_base
field (slightly wonky name for what it does, but it was an existing
unused field).
Since the limited counters don't freeze when we freeze the other counters,
we need some special handling to avoid getting skew between things counted
on the limited counters and those counted on normal counters. To minimize
this skew, if we are using any limited counters, we read PMC5 and PMC6
immediately after setting and clearing the freeze bit. This is done in
a single asm in the new write_mmcr0() function.
The code here is specific to PMC5 and PMC6 being the limited hardware
counters. Being more general (e.g. having a bitmap of limited hardware
counter numbers) would have meant more complex code to read the limited
counters when freezing and unfreezing the normal counters, with
conditional branches, which would have increased the skew. Since it
isn't necessary for the code to be more general at this stage, it isn't.
This also extends the back-ends for POWER5+ and POWER6 to be able to
handle up to 6 counters rather than the 4 they previously handled.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
LKML-Reference: <18936.19035.163066.892208@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch renames struct hw_perf_counter_ops into struct pmu. It
introduces a structure to describe a cpu specific pmu (performance
monitoring unit). It may contain ops and data. The new name of the
structure fits better, is shorter, and thus better to handle. Where it
was appropriate, names of function and variable have been changed too.
[ Impact: cleanup ]
Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1241002046-8832-7-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
powerpc/ps3: Fix build error on UP
powerpc/cell: Select PCI for IBM_CELL_BLADE AND CELLEB
powerpc: ppc32 needs elf_read_implies_exec()
powerpc/86xx: Add device_type entry to soc for ppc9a
powerpc/44x: Correct memory size calculation for denali-based boards
maintainers: Fix PowerPC 4xx git tree
powerpc: fix for long standing bug noticed by gcc 4.4.0
Revert "powerpc: Add support for early tlbilx opcode"
Commit edada399 broke the build on 64-bit powerpc because it moved the
__ftr_alt_* sections of a file away from the .text section, causing
link failures due to relative conditional branch targets being too far
away from the branch instructions. This happens on pretty much all
64-bit powerpc configs.
This change reverts commit edada399 while preserving the update from
the *.refok sections to .ref.text that has happened since.
Signed-off-by: Tim Abbott <tabbott@mit.edu>
Requested-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rather than adding .ref.text to the powerpc linker script so that we
can use __REF on the powerpc architecture, it seems simpler to switch
to using the generic TEXT_TEXT macro.
Signed-off-by: Tim Abbott <tabbott@mit.edu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This has the consequence of changing the section name use for head
code from ".text.head" to ".head.text". Since this commit changes all
users in the architecture, this change should be harmless.
Signed-off-by: Tim Abbott <tabbott@mit.edu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
powerpc: Fix modular build of ide-pmac when mediabay is built in
powerpc/pasemi: Fix build error on UP
powerpc: Make macintosh/mediabay driver depend on CONFIG_BLOCK
maintainers: Fix PS3 patterns
powerpc/ps3: Fix CONFIG_PS3_FLASH=n build warning
powerpc/32: Don't clobber personality flags on exec
powerpc: Fix crash on CPU hotplug
powerpc/85xx: Remove defconfigs that mpc85xx_{smp_}defconfig cover
powerpc/85xx: Added SMP defconfig
powerpc/85xx: Enabled a bunch of FSL specific drivers/options
powerpc/85xx: Updated generic mpc85xx_defconfig
powerpc: don't disable SATA interrupts on Freescale MPC8610 HPCD
fsl_rio: Pass the proper device to dma mapping routines
powerpc: Fix of_node_put() exit path in of_irq_map_one()
powerpc/5200: defconfig updates
powerpc/5200: Add FLASH nodes to lite5200 device tree
powerpc/device-tree: Document MTD nodes with multiple "reg" tuples
powerpc/of-device-tree: Factor MTD physmap bindings out of booting-without-of
powerpc/5200: Bring the legacy fsl_spi_platform_data hooks back
This reverts commit e996557740. Our HW
guys were able to fix this so it never sees the light of day.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Pass clocksource pointer to the read() callback for clocksources. This
allows us to share the callback between multiple instances.
[hugh@veritas.com: fix powerpc build of clocksource pass clocksource mods]
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Acked-by: John Stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
powerpc: pseries/dtl.c should include asm/firmware.h
powerpc: Fix data-corrupting bug in __futex_atomic_op
powerpc/pseries: Set error_state to pci_channel_io_normal in eeh_report_reset()
powerpc: Allow 256kB pages with SHMEM
powerpc: Document new FSL I2C bindings and cleanup
powerpc/mm: Fix compile warning
powerpc/85xx: TQM8548: update defconfig
powerpc/85xx: TQM8548: use proper phy-handles for enet2 and enet3
powerpc/85xx: TQM85xx: correct address of LM75 I2C device nodes
powerpc: Add support for early tlbilx opcode
powerpc: Fix tlbilx opcode
Impact: fix potential deadlocks on powerpc
Now that the core is using in_nmi() (added in e30e08f6, "perf_counter:
fix NMI race in task clock"), we need the powerpc perf_counter_interrupt
to call nmi_enter() and nmi_exit() in those cases where the interrupt
happens when interrupts are soft-disabled.
If interrupts were soft-enabled, we can treat it as a regular interrupt
and do irq_enter/irq_exit around the whole routine. This lets us get rid
of the test_perf_counter_pending() call at the end of
perf_counter_interrupt, thus simplifying things a little.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <18909.31952.873098.336615@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Paul suggested we allow for data addresses to be recorded along with
the traditional IPs as power can provide these.
For now, only the software pagefault events provide data addresses,
but in the future power might as well for some events.
x86 doesn't seem capable of providing this atm.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <20090408130409.394816925@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: enable access to hardware feature
POWER processors have the ability to "mark" a subset of the instructions
and provide more detailed information on what happens to the marked
instructions as they flow through the pipeline. This marking is
enabled by the "sample enable" bit in MMCRA, and there are
synchronization requirements around setting and clearing the bit.
This adds logic to the processor-specific back-ends so that they know
which events relate to marked instructions and set the sampling enable
bit if any event that we want to put on the PMU is a marked instruction
event. It also adds logic to the generic powerpc code to do the
necessary synchronization if that bit is set.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <18908.31930.1024.228867@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Commit 4af4998b ("perf_counter: rework context time") changed struct
perf_counter_context to have a 'time' field instead of a 'time_now'
field, but neglected to fix the place in the powerpc perf_counter.c
where the time_now field was accessed. This fixes it.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <18908.31922.411398.147810@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Conflicts:
arch/powerpc/include/asm/systbl.h
arch/powerpc/include/asm/unistd.h
include/linux/init_task.h
Merge reason: the conflicts are non-trivial: PowerPC placement
of sys_perf_counter_open has to be mixed with the
new preadv/pwrite syscalls.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Replace all DMA_32BIT_MASK macro with DMA_BIT_MASK(32)
Signed-off-by: Yang Hongyang<yanghy@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Prepare for more generic overflow handling. The new perf_counter_overflow()
method will handle the generic bits of the counter overflow, and can return
a !0 return value, in which case the counter should be (soft) disabled, so
that it won't count until it's properly disabled.
XXX: do powerpc and swcounter
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <20090406094517.812109629@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
During the ISA 2.06 development the opcode for tlbilx changed and some
early implementations used to old opcode. Add support for a MMU_FTR
fixup to deal with this.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
'tramp' is an unsigned long, so print it with %lx.
Fixes the following build warning:
arch/powerpc/kernel/ftrace.c:291: error: format ‘%x’ expects type ‘unsigned int’, but argument 2 has type ‘long unsigned int’
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Commit bb7253403f ("powerpc64,
ftrace: save toc only on modules for function graph"), added an
#if CONFIG_PPC64. This changes it to #ifdef.
Fixes the following warning on 32-bit builds:
arch/powerpc/kernel/ftrace.c:562:5: error: "CONFIG_PPC64" is not defined
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The ptrace compat wrapper mishandles access to the fpu registers. The
PTRACE_PEEKUSR and PTRACE_POKEUSR requests miscalculate the index into
the fpr array due to the broken FPINDEX macro. The
PPC_PTRACE_PEEKUSR_3264 request needs to use the same formula that the
native ptrace interface uses when operating on the register number (as
opposed to the 4-byte offset). The PPC_PTRACE_POKEUSR_3264 request
didn't take TS_FPRWIDTH into account.
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The irq remapping layer seems to cause some confusion when people
see a different irq number in /proc/interrupts vs the one they
request in their driver or DTS.
So have the irq remapping layer print out a message when we map an
irq. The message is only printed the first time the irq is mapped,
and it's KERN_DEBUG so most people won't see it.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When we call giveup_fpu, we need to need to turn off VSX for the
current process. If we don't, on return to userspace it may execute a
VSX instruction before the next FP instruction, and not have its
register state refreshed correctly from the thread_struct. Ditto for
altivec.
This caused a bug where an unaligned lfs or stfs results in
fix_alignment calling giveup_fpu so it can use the FPRs (in order to
do a single <-> double conversion), and then returning to userspace
with FP off but VSX on. Then if a VSX instruction is executed, before
another FP instruction, it will proceed without another exception and
hence have the incorrect register state for VSX registers 0-31.
lfs unaligned <- alignment exception turns FP off but leaves VSX on
VSX instruction <- no exception since VSX on, hence we get the
wrong VSX register values for VSX registers 0-31,
which overlap the FPRs.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
PHYP tells us how often a shared processor dispatch changed physical cpus.
This can highlight performance problems caused by the hypervisor.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Make all messages consistent, some have spaces before the "...", some do not.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The ibm,client-architecture method will often cause a reconfiguration reboot.
When this happens the last thing we see is:
Hypertas detected, assuming LPAR !
Which doesn't explain what just happened. Wrap the ibm,client-architecture
so it's clear what is going on:
Calling ibm,client-architecture... done
In order to maintain the law of conservation of screen real estate, downgrade
two other messages to debug.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Impact: better error reporting
At present, if hw_perf_counter_init encounters an error, all it can do
is return NULL, which causes sys_perf_counter_open to return an EINVAL
error to userspace. This isn't very informative for userspace; it means
that userspace can't tell the difference between "sorry, oprofile is
already using the PMU" and "we don't support this CPU" and "this CPU
doesn't support the requested generic hardware event".
This commit uses the PTR_ERR/ERR_PTR/IS_ERR set of macros to let
hw_perf_counter_init return an error code on error rather than just NULL
if it wishes. If it does so, that error code will be returned from
sys_perf_counter_open to userspace. If it returns NULL, an EINVAL
error will be returned to userspace, as before.
This also adapts the powerpc hw_perf_counter_init to make use of this
to return ENXIO, EINVAL, EBUSY, or EOPNOTSUPP as appropriate. It would
be good to add extra error numbers in future to allow userspace to
distinguish the various errors that are currently reported as EINVAL,
i.e. irq_period < 0, too many events in a group, conflict between
exclude_* settings in a group, and PMU resource conflict in a group.
[ v2: fix a bug pointed out by Corey Ashford where error returns from
hw_perf_counter_init were not handled correctly in the case of
raw hardware events.]
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Orig-LKML-Reference: <20090330171023.682428180@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cooperate with oprofile
At present, on PowerPC, if you have perf_counters compiled in, oprofile
doesn't work. There is code to allow the PMU to be shared between
competing subsystems, such as perf_counters and oprofile, but currently
the perf_counter subsystem reserves the PMU for itself at boot time,
and never releases it.
This makes perf_counter play nicely with oprofile. Now we keep a count
of how many perf_counter instances are counting hardware events, and
reserve the PMU when that count becomes non-zero, and release the PMU
when that count becomes zero. This means that it is possible to have
perf_counters compiled in and still use oprofile, as long as there are
no hardware perf_counters active. This also means that if oprofile is
active, sys_perf_counter_open will fail if the hw_event specifies a
hardware event.
To avoid races with other tasks creating and destroying perf_counters,
we use a mutex. We use atomic_inc_not_zero and atomic_add_unless to
avoid having to take the mutex unless there is a possibility of the
count going between 0 and 1.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Orig-LKML-Reference: <20090330171023.627912475@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
While going over the wakeup code I noticed delayed wakeups only work
for hardware counters but basically all software counters rely on
them.
This patch unifies and generalizes the delayed wakeup to fix this
issue.
Since we're dealing with NMI context bits here, use a cmpxchg() based
single link list implementation to track counters that have pending
wakeups.
[ This should really be generic code for delayed wakeups, but since we
cannot use cmpxchg()/xchg() in generic code, I've let it live in the
perf_counter code. -- Eric Dumazet could use it to aggregate the
network wakeups. ]
Furthermore, the x86 method of using TIF flags was flawed in that its
quite possible to end up setting the bit on the idle task, loosing the
wakeup.
The powerpc method uses per-cpu storage and does appear to be
sufficient.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Orig-LKML-Reference: <20090330171023.153932974@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: new functionality
Currently, if there are more counters enabled than can fit on the CPU,
the kernel will multiplex the counters on to the hardware using
round-robin scheduling. That isn't too bad for sampling counters, but
for counting counters it means that the value read from a counter
represents some unknown fraction of the true count of events that
occurred while the counter was enabled.
This remedies the situation by keeping track of how long each counter
is enabled for, and how long it is actually on the cpu and counting
events. These times are recorded in nanoseconds using the task clock
for per-task counters and the cpu clock for per-cpu counters.
These values can be supplied to userspace on a read from the counter.
Userspace requests that they be supplied after the counter value by
setting the PERF_FORMAT_TOTAL_TIME_ENABLED and/or
PERF_FORMAT_TOTAL_TIME_RUNNING bits in the hw_event.read_format field
when creating the counter. (There is no way to change the read format
after the counter is created, though it would be possible to add some
way to do that.)
Using this information it is possible for userspace to scale the count
it reads from the counter to get an estimate of the true count:
true_count_estimate = count * total_time_enabled / total_time_running
This also lets userspace detect the situation where the counter never
got to go on the cpu: total_time_running == 0.
This functionality has been requested by the PAPI developers, and will
be generally needed for interpreting the count values from counting
counters correctly.
In the implementation, this keeps 5 time values (in nanoseconds) for
each counter: total_time_enabled and total_time_running are used when
the counter is in state OFF or ERROR and for reporting back to
userspace. When the counter is in state INACTIVE or ACTIVE, it is the
tstamp_enabled, tstamp_running and tstamp_stopped values that are
relevant, and total_time_enabled and total_time_running are determined
from them. (tstamp_stopped is only used in INACTIVE state.) The
reason for doing it like this is that it means that only counters
being enabled or disabled at sched-in and sched-out time need to be
updated. There are no new loops that iterate over all counters to
update total_time_enabled or total_time_running.
This also keeps separate child_total_time_running and
child_total_time_enabled fields that get added in when reporting the
totals to userspace. They are separate fields so that they can be
atomic. We don't want to use atomics for total_time_running,
total_time_enabled etc., because then we would have to use atomic
sequences to update them, which are slower than regular arithmetic and
memory accesses.
It is possible to measure total_time_running by adding a task_clock
counter to each group of counters, and total_time_enabled can be
measured approximately with a top-level task_clock counter (though
inaccuracies will creep in if you need to disable and enable groups
since it is not possible in general to disable/enable the top-level
task_clock counter simultaneously with another group). However, that
adds extra overhead - I measured around 15% increase in the context
switch latency reported by lat_ctx (from lmbench) when a task_clock
counter was added to each of 2 groups, and around 25% increase when a
task_clock counter was added to each of 4 groups. (In both cases a
top-level task-clock counter was also added.)
In contrast, the code added in this commit gives better information
with no overhead that I could measure (in fact in some cases I
measured lower times with this code, but the differences were all less
than one standard deviation).
[ v2: address review comments by Andrew Morton. ]
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Orig-LKML-Reference: <18890.6578.728637.139402@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: Rework the perfcounter output ABI
use sys_read() only for instant data and provide mmap() output for all
async overflow data.
The first mmap() determines the size of the output buffer. The mmap()
size must be a PAGE_SIZE multiple of 1+pages, where pages must be a
power of 2 or 0. Further mmap()s of the same fd must have the same
size. Once all maps are gone, you can again mmap() with a new size.
In case of 0 extra pages there is no data output and the first page
only contains meta data.
When there are data pages, a poll() event will be generated for each
full page of data. Furthermore, the output is circular. This means
that although 1 page is a valid configuration, its useless, since
we'll start overwriting it the instant we report a full page.
Future work will focus on the output format (currently maintained)
where we'll likey want each entry denoted by a header which includes a
type and length.
Further future work will allow to splice() the fd, also containing the
async overflow data -- splice() would be mutually exclusive with
mmap() of the data.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Orig-LKML-Reference: <20090323172417.470536358@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: new feature giving performance improvement
This adds the ability for userspace to do an mmap on a hardware counter
fd and get access to a read-only page that contains the information
needed to translate a hardware counter value to the full 64-bit
counter value that would be returned by a read on the fd. This is
useful on architectures that allow user programs to read the hardware
counters, such as PowerPC.
The mmap will only succeed if the counter is a hardware counter
monitoring the current process.
On my quad 2.5GHz PowerPC 970MP machine, userspace can read a counter
and translate it to the full 64-bit value in about 30ns using the
mmapped page, compared to about 830ns for the read syscall on the
counter, so this does give a significant performance improvement.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Orig-LKML-Reference: <20090323172417.297057964@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Since the bitfields turned into a bit of a mess, remove them and rely on
good old masks.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Orig-LKML-Reference: <20090323172417.059499915@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: build fix for powerpc
Commit db3a944aca35ae61 ("perf_counter: revamp syscall input ABI")
expanded the hw_event.type field into a union of structs containing
bitfields. In particular it introduced a type field and a raw_type
field, with the intention that the 1-bit raw_type field should
overlay the most-significant bit of the 8-bit type field, and in fact
perf_counter_alloc() now assumes that (or at least, assumes that
raw_type doesn't overlay any of the bits that are 1 in the values of
PERF_TYPE_{HARDWARE,SOFTWARE,TRACEPOINT}).
Unfortunately this is not true on big-endian systems such as PowerPC,
where bitfields are laid out from left to right, i.e. from most
significant bit to least significant. This means that setting
hw_event.type = PERF_TYPE_SOFTWARE will set hw_event.raw_type to 1.
This fixes it by making the layout depend on whether or not
__BIG_ENDIAN_BITFIELD is defined. It's a bit ugly, but that's what
we get for using bitfields in a user/kernel ABI.
Also, that commit didn't fix up some places in arch/powerpc/kernel/
perf_counter.c where hw_event.raw and hw_event.event_id were used.
This fixes them too.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Impact: cleanup
This updates the powerpc perf_counter_interrupt following on from the
"perf_counter: unify irq output code" patch. Since we now use the
generic perf_counter_output code, which sets the perf_counter_pending
flag directly, we no longer need the need_wakeup variable.
This removes need_wakeup and makes perf_counter_interrupt use
get_perf_counter_pending() instead.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Orig-LKML-Reference: <20090319194234.024464535@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Having 3 slightly different copies of the same code around does nobody
any good. First step in revamping the output format.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Orig-LKML-Reference: <20090319194233.929962222@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: modify ABI
The hardware/software classification in hw_event->type became a little
strained due to the addition of tracepoint tracing.
Instead split up the field and provide a type field to explicitly specify
the counter type, while using the event_id field to specify which event to
use.
Raw counters still work as before, only the raw config now goes into
raw_event.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Orig-LKML-Reference: <20090319194233.836807573@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: build fix for powerpc
Commit bd753921015e7905 ("perf_counter: software counter event
infrastructure") introduced a use of TIF_PERF_COUNTERS into the core
perfcounter code. This breaks the build on powerpc because we use
a flag in a per-cpu area to signal wakeups on powerpc rather than
a thread_info flag, because the thread_info flags have to be
manipulated with atomic operations and are thus slower than per-cpu
flags.
This fixes the by changing the core to use an abstracted
set_perf_counter_pending() function, which is defined on x86 to set
the TIF_PERF_COUNTERS flag and on powerpc to set the per-cpu flag
(paca->perf_counter_pending). It changes the previous powerpc
definition of set_perf_counter_pending to not take an argument and
adds a clear_perf_counter_pending, so as to simplify the definition
on x86.
On x86, set_perf_counter_pending() is defined as a macro. Defining
it as a static inline in arch/x86/include/asm/perf_counters.h causes
compile failures because <asm/perf_counters.h> gets included early in
<linux/sched.h>, and the definitions of set_tsk_thread_flag etc. are
therefore not available in <asm/perf_counters.h>. (On powerpc this
problem is avoided by defining set_perf_counter_pending etc. in
<asm/hw_irq.h>.)
Signed-off-by: Paul Mackerras <paulus@samba.org>
Merge reason: we have gathered quite a few conflicts, need to merge upstream
Conflicts:
arch/powerpc/kernel/Makefile
arch/x86/ia32/ia32entry.S
arch/x86/include/asm/hardirq.h
arch/x86/include/asm/unistd_32.h
arch/x86/include/asm/unistd_64.h
arch/x86/kernel/cpu/common.c
arch/x86/kernel/irq.c
arch/x86/kernel/syscall_table_32.S
arch/x86/mm/iomap_32.c
include/linux/sched.h
kernel/Makefile
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (413 commits)
tracing, net: fix net tree and tracing tree merge interaction
tracing, powerpc: fix powerpc tree and tracing tree interaction
ring-buffer: do not remove reader page from list on ring buffer free
function-graph: allow unregistering twice
trace: make argument 'mem' of trace_seq_putmem() const
tracing: add missing 'extern' keywords to trace_output.h
tracing: provide trace_seq_reserve()
blktrace: print out BLK_TN_MESSAGE properly
blktrace: extract duplidate code
blktrace: fix memory leak when freeing struct blk_io_trace
blktrace: fix blk_probes_ref chaos
blktrace: make classic output more classic
blktrace: fix off-by-one bug
blktrace: fix the original blktrace
blktrace: fix a race when creating blk_tree_root in debugfs
blktrace: fix timestamp in binary output
tracing, Text Edit Lock: cleanup
tracing: filter fix for TRACE_EVENT_FORMAT events
ftrace: Using FTRACE_WARN_ON() to check "freed record" in ftrace_release()
x86: kretprobe-booster interrupt emulation code fix
...
Fix up trivial conflicts in
arch/parisc/include/asm/ftrace.h
include/linux/memory.h
kernel/extable.c
kernel/module.c
It is a fairly common operation to have a pointer to a work and to need a
pointer to the delayed work it is contained in. In particular, all
delayed works which want to rearm themselves will have to do that. So it
would seem fair to offer a helper function for this operation.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Greg KH <greg@kroah.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
PowerPC has been a long time user of the generic RTC abstraction, so hook up
rtc-generic:
- Create the "rtc-generic" platform device if ppc_md.get_rtc_time is set,
- Kill rtc-ppc, as rtc-generic offers the same functionality in a more
generic way, and supports autoloading through udev.
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Alessandro Zummo <a.zummo@towertech.it>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
Today's linux-next build (powerpc allyesconfig) failed like this:
arch/powerpc/kernel/ftrace.c: In function 'prepare_ftrace_return':
arch/powerpc/kernel/ftrace.c:612: warning: passing argument 3 of 'ftrace_push_return_trace' makes pointer from integer without a cast
arch/powerpc/kernel/ftrace.c:612: error: too many arguments to function 'ftrace_push_return_trace'
Caused by commit 5d1a03dc54
("function-graph: moved the timestamp from arch to generic code") from
the tracing tree which (removed an argument from
ftrace_push_return_trace()) interacting with commit
6794c78243 ("powerpc64: port of the
function graph tracer") from the powerpc tree.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: <linuxppc-dev@ozlabs.org>
LKML-Reference: <20090327230834.93d0221d.sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (88 commits)
PCI: fix HT MSI mapping fix
PCI: don't enable too much HT MSI mapping
x86/PCI: make pci=lastbus=255 work when acpi is on
PCI: save and restore PCIe 2.0 registers
PCI: update fakephp for bus_id removal
PCI: fix kernel oops on bridge removal
PCI: fix conflict between SR-IOV and config space sizing
powerpc/PCI: include pci.h in powerpc MSI implementation
PCI Hotplug: schedule fakephp for feature removal
PCI Hotplug: rename legacy_fakephp to fakephp
PCI Hotplug: restore fakephp interface with complete reimplementation
PCI: Introduce /sys/bus/pci/devices/.../rescan
PCI: Introduce /sys/bus/pci/devices/.../remove
PCI: Introduce /sys/bus/pci/rescan
PCI: Introduce pci_rescan_bus()
PCI: do not enable bridges more than once
PCI: do not initialize bridges more than once
PCI: always scan child buses
PCI: pci_scan_slot() returns newly found devices
PCI: don't scan existing devices
...
Fix trivial append-only conflict in Documentation/feature-removal-schedule.txt
Setting ->owner as done currently (pde->owner = THIS_MODULE) is racy
as correctly noted at bug #12454. Someone can lookup entry with NULL
->owner, thus not pinning enything, and release it later resulting
in module refcount underflow.
We can keep ->owner and supply it at registration time like ->proc_fops
and ->data.
But this leaves ->owner as easy-manipulative field (just one C assignment)
and somebody will forget to unpin previous/pin current module when
switching ->owner. ->proc_fops is declared as "const" which should give
some thoughts.
->read_proc/->write_proc were just fixed to not require ->owner for
protection.
rmmod'ed directories will be empty and return "." and ".." -- no harm.
And directories with tricky enough readdir and lookup shouldn't be modular.
We definitely don't want such modular code.
Removing ->owner will also make PDE smaller.
So, let's nuke it.
Kudos to Jeff Layton for reminding about this, let's say, oversight.
http://bugzilla.kernel.org/show_bug.cgi?id=12454
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Rusty's patch to change our sysfs access to various registers
to use smp_call_function_single() introduced a whole bunch of
warnings. This fixes them. This version also fixes an actual
bug in here where it did mtspr instead of mfspr when reading
the files
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On powerpc64 machines running 32-bit userspace, we can get garbage bits in the
stack pointer passed into the kernel. Most places handle this correctly, but
the signal handling code uses the passed value directly for allocating signal
stack frames.
This fixes the issue by introducing a get_clean_sp function that returns a
sanitized stack pointer. For 32-bit tasks on a 64-bit kernel, the stack
pointer is masked correctly. In all other cases, the stack pointer is simply
returned.
Additionally, we pass an 'is_32' parameter to get_sigframe now in order to
get the properly sanitized stack. The callers are know to be 32 or 64-bit
statically.
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* 'irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (32 commits)
x86: disable __do_IRQ support
sparseirq, powerpc/cell: fix unused variable warning in interrupt.c
genirq: deprecate obsolete typedefs and defines
genirq: deprecate __do_IRQ
genirq: add doc to struct irqaction
genirq: use kzalloc instead of explicit zero initialization
genirq: make irqreturn_t an enum
genirq: remove redundant if condition
genirq: remove unused hw_irq_controller typedef
irq: export remove_irq() and setup_irq() symbols
irq: match remove_irq() args with setup_irq()
irq: add remove_irq() for freeing of setup_irq() irqs
genirq: assert that irq handlers are indeed running in hardirq context
irq: name 'p' variables a bit better
irq: further clean up the free_irq() code flow
irq: refactor and clean up the free_irq() code flow
irq: clean up manage.c
irq: use GFP_KERNEL for action allocation in request_irq()
kernel/irq: fix sparse warning: make symbol static
irq: optimize init_kstat_irqs/init_copy_kstat_irqs
...
This file uses PCI MSI defines and so needs pci.h.
Tested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This moves some MMU related init code out of setup_64.c into hash_utils_64.c
and calls it early_init_mmu() and early_init_mmu_secondary(). This will
make it easier to plug in a new MMU type.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Complete workaround for DTLB errata in e300c2/c3/c4 processors.
Due to the bug, the hardware-implemented LRU algorythm always goes to way
1 of the TLB. This fix implements the proposed software workaround in
form of a LRW table for chosing the TLB-way.
Based on patch from David Jander <david@protonic.nl>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Now that r0 is free we can keep the value of I/DMISS in r3 and not reload
it before doing the tlbli/d. This saves us a few cycles in the fast path
case.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Long ago we had some code that actually used the CTR in the SW TLB
miss handlers (603/e300). Since we don't use it no reason to waste
cycles saving it off and restoring it (we actually didn't restore it
in the fast path case).
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Since a number of powerpc chips are SoCs we end up having dma-able
devices that are registered as platform or of_platform devices. We need
to hook the archdata to setup proper dma_ops for these devices.
Rather than having to add a bus_notify to each platform we add a default
one at the highest priority (called first) to set the default dma_ops for
of_platform and platform devices to dma_direct_ops. This allows platform
code to override the ops by providing their own notifier call back.
In the future to enable >4G DMA support on ppc32 we can hook swiotlb ops.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This will allow us to remove the ppc32 specific checks in get_dma_ops()
that defaults to dma_direct_ops if the archdata is NULL. We really
should always have archdata set to something going forward.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: performance improvement
This fixes 'powerpc: avoid cpumask games in arch/powerpc/kernel/sysfs.c'
which talked about using smp_call_function_single, but actually used
work_on_cpu (an older version of the patch).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit e7943fbbfd broke ppc32 using
Open Firmware client interface due to using the wrong relocation
macro when accessing the variable "linux_banner".
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Grant picked up the wrong version of "Respect _PAGE_COHERENT on classic
ppc32 SW" (commit a4bd6a93c3)
It was missing the code to actually deal with the fixup of
_PAGE_COHERENT based on the CPU feature.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Add the new API pci_enable_msi_block() to allow drivers to
request multiple MSI and reimplement pci_enable_msi in terms of
pci_enable_msi_block. Ensure that the architecture back ends don't
have to know about multiple MSI.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Since we now set _PAGE_COHERENT in the Linux PTE we shouldn't be clearing
it out before we setup the SW TLB. Today all the SW TLB machines
(603/e300) that we support are non-SMP, however there are some errata on
some devices that cause us to set _PAGE_COHERENT via CPU_FTR_NEED_COHERENT.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
CONFIG_PPC_MULTIPLATFORM is a remain of the pre-powerpc days and isn't
really meaningful anymore. It was basically equivalent to PPC64 || 6xx.
This removes it along with the following changes:
- 32-bit platforms that relied on PPC32 && PPC_MULTIPLATFORM now rely
on 6xx which is what they want anyway.
- A new symbol, PPC_BOOK3S, is defined that represent compliance with
the "Server" variant of the architecture. This is set when either 6xx
or PPC64 is set and open the door for future BOOK3E 64-bit.
- 64-bit platforms that relied on PPC64 && PPC_MULTIPLATFORM now use
PPC64 && PPC_BOOK3S
- A separate and selectable CONFIG_PPC_OF_BOOT_TRAMPOLINE option is now
used to control the use of prom_init.c
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: cleanup
Convert the last remaining users.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: linuxppc-dev@ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When the console is on a serial port to be driven by serial8250, a
character can be lost from the end of the first line in the two-line
sequence
serial8250.0: ttyS0 at MMIO 0xe0004500 (irq = 42) is a 16550A
console handover: boot [udbg0] -> real [ttyS0]
This happens because udbg_puts or udbg_write stuff the last byte of
the line into the Tx FIFO and return, whereupon the serial8250
initialization code immediately empties that FIFO. The fix: udbg_puts
and udbg_write now wait for the Tx FIFO to clear before returning.
This delays the system by one additional serial frame time for each
line written by udbg, but the effect is not noticeable, a cumulative
17 milliseconds for 200 lines of early printk output at 115200 baud.
Also, the routines in udbg_16550.c now emit CRLF instead of LFCR.
Linux makes a point of emitting CRLF because, when serial output is
captured to a file, LFCR sequences can confuse text editors. See
http://lkml.org/lkml/2006/2/4/50 for some history.
Signed-off-by: Andrew Klossner <andrew@cesa.opbu.xerox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Fix typo: s/resouces/resources/ in a pr_debug
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
So at least you can see what kernel you're booting if you die
before the kernel prints it mid-way through start_kernel().
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch enables oprofile for all 3 FX variants and GX variant of the
750 processor.
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When identify_cpu() is called a second time with a logical PVR, it
only copies a subset of the cpu_spec fields so as to avoid overwriting
the performance monitor fields that were initialized based on the
real PVR.
However some of the other, non performance monitor related fields are
also not copied:
* pvr_mask
* pvr_value
* mmu_features
* machine_check
The fact that pvr_mask is not copied can result in show_cpuinfo()
showing the cpu as "unknown", if we override an unknown PVR with a
logical one - as reported by Shaggy.
So change the logic to copy all fields, and then put back the PMC
related ones in the case that we're overwriting a real PVR with a
logical one.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The for-loop body of identify_cpu() has gotten a little big, so move the
loop body logic into a separate function. No other changes.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: __per_cpu_load available on all SMP capable archs
Percpu now requires three symbols to be defined - __per_cpu_load,
__per_cpu_start and __per_cpu_end. There were three archs which
didn't have it. Update them as follows.
* powerpc: can use generic PERCPU() macro. Compile tested for
powerpc32, compile/boot tested for powerpc64.
* ia64: can use generic PERCPU_VADDR() macro. __phys_per_cpu_start is
identical to __per_cpu_load. Compile tested and symbol table looks
identical after the change except for the additional __per_cpu_load.
* arm: added explicit __per_cpu_load definition. Currently uses
unified .init output section so can't use the generic macro. Dunno
whether the unified .init ouput section is required by arch
peculiarity so I left it alone. Please break it up and use PERCPU()
if possible.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Pat Gefre <pfg@sgi.com>
Cc: Russell King <rmk@arm.linux.org.uk>
The e500mc core supports the new tlbilx instructions that do core
local invalidates and also provide us the ability to take down
all TLB entries matching a given PID.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Impact: more hardware support
This adds the back-end for the PMU on the POWER4 and POWER4+ processors
(GP and GQ). This is quite similar to the PPC970, with 8 PMCs, but has
fewer events than the PPC970.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Impact: more hardware support
This adds the back-end for the PMU on the POWER5+ processors (i.e. GS,
including GS DD3 aka POWER5++). This doesn't use the fixed-function
PMC5 and PMC6 since they don't respect the freeze conditions and don't
generate interrupts, as on POWER6.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Impact: fix oops-causing bug
This fixes a bug in the powerpc hw_perf_counter_init where the code
didn't initialize ctrs[n] before passing the ctrs array to check_excludes,
leading to possible oopses and other incorrect behaviour. This fixes it
by initializing ctrs[n] correctly.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds the back-end for the PMU on the POWER5 processor. This knows
how to use the fixed-function PMC5 and PMC6 (instructions completed and
run cycles). Unlike POWER6, PMC5/6 obey the freeze conditions and can
generate interrupts, so their use doesn't impose any extra restrictions.
POWER5+ is different and is not supported by this patch.
Signed-off-by: Paul Mackerras <paulus@samba.org>
When we introduced VSX, we changed the way FPRs are stored in the
thread_struct. Unfortunately we missed the load/store float double
alignment handler code when updating how we access FPRs in the
thread_struct.
Below fixes this and merges the little/big endian case.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently, setting hw_event.exclude_kernel does nothing on the PPC970
variants used in Apple G5 machines, because they have the HV (hypervisor)
bit in the MSR forced to 1, so as far as the PMU is concerned, the
kernel runs in hypervisor mode. Thus we have to use the MMCR0_FCHV
(freeze counters in hypervisor mode) bit rather than the MMCR0_FCS
(freeze counters in supervisor mode) bit.
This checks the MSR.HV bit at startup, and if it is set, we set the
freeze_counters_kernel variable to MMCR0_FCHV (it was initialized to
MMCR0_FCS). We then use that whenever we need to exclude kernel events.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Randomise ELF_ET_DYN_BASE, which is used when loading position independent
executables.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Randomise the lower bits of the stack address. More randomisation is good for
security but the scatter can also help with SMT threads that share an L1. A
quick test case shows this working:
int main()
{
int sp;
printf("%x\n", (unsigned long)&sp & 4095);
}
before:
80
80
80
80
80
after:
610
490
300
6b0
d80
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Move is_32bit_task into asm/thread_info.h, that allows us to test for
32/64bit tasks without an ugly CONFIG_PPC64 ifdef.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When we introduced VSX, we changed the way FPRs are stored in the
thread_struct. Unfortunately we missed the load/store float double
alignment handler code when updating how we access FPRs in the
thread_struct.
Below fixes this and merges the little/big endian case.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
lfiwzx is a new floating point load instruction in 2.06 that needs an
alignment handler for Linux.
Turns out to be the worlds easiest handler to add.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
While testing partition migration with heavy CPU load using
shared processors, it was observed that sometimes the migration
would never complete and would appear to hang. Currently, the
migration code assumes that if H_SUCCESS is returned from the H_JOIN
then the migration is complete and the processor is waking up on
the target system. If there was an outstanding PROD to the processor
when the H_JOIN is called, however, it will return H_SUCCESS on the source
system, causing the migration to hang, or in some scenarios cause
the kernel to crash on the complete call waking the caller
of rtas_percpu_suspend_me. Fix this by calling H_JOIN multiple times
if necessary during the migration.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The e500mc supports the new msgsnd/doorbell mechanisms that were added in
the Power ISA 2.05 architecture. We use the normal level doorbell for
doing SMP IPIs at this point.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Old OF variants used to create a 'dummy' parent node "multifunc-device"
for devices with more than one PCI function. Our code that matches OF
nodes to PCI devices dealt with that in one place but not in another,
this fixes it.
This has the practical effect of fixing interrupt routing of multifunction
PCI cards on some older PowerMac machines.
Signed-off-by: Tom Arbuckle <tom.d.arbuckle@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Create a new header that becomes a single location for defining PowerPC
opcodes used by code that is either generationg instructions
at runtime (fixups, debug, etc.), emulating instructions, or just
compiling instructions old assemblers don't know about.
We currently don't handle the floating point emulation or alignment decode
as both are better handled by the specific decode support they already
have.
Added support for the new dcbzl, dcbal, msgsnd, tlbilx, & wait instructions
since older assemblers don't know about them.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: clean up, remove duplicate code
When ftrace was first ported to PowerPC, there existed a
create_function_call that would create the instruction to make a call
to a given address. Unfortunately, this call expected to write to
the address it was given, and since it used the address to calculate
the offset, it could not be faked.
ftrace needed a way to create the instruction without actually writing
that instruction to the text section. So ftrace had to implement its
own code.
Now we have create_branch in the code patching library, which does
exactly what ftrace needs. This patch replaces ftrace's implementation
with the library function.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The original port of ftrace to PowerPC kept a lot of the code used
by x86. Some of this code was to handle x86's 5 byte instruction.
This was handled by using character arrays to manipulate the
code.
PowerPC has a consistent 4 byte instruction. Using unsigned ints
makes the code more efficient as well as more readable.
By converting to use unsigned ints to represent instructions,
I was able to remove the side effects that were needed for
manipulating character strings.
i.e. memcpy and memcmp
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch gets function graph tracing working with dynamic function
tracer on PowerPC32.
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch ports the function graph tracer for PowerPC, but only
for static function tracing.
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: clean up
Use a macro to save and restore the registers for PowerPC32,
since that code is duplicated.
This is similar to the work done by Cyrill Gorcunov for the
mcount code in x86_64.
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The TOCS used by modules are different than the one used by
the core kernel code. The function graph tracer must save and
restore the TOC whenever it traces a module call. But this
is an added overhead to burden the majority of core kernel
code being traced.
Benjamin Herrenschmidt suggested in testing the entry of
the call to tell if it is a core kernel function or a module.
He recommended using the REGION_ID() macro to perform this test.
This patch implements Benjamin's idea, and uses a different
return_to_handler routine dependent on if the entry is a core
kernel function or not. The module version saves the TOC, where as
the core kernel version does not.
Geoff Lavand tested on PS3.
Tested-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This is the port of the function graph tracer to PowerPC with
dynamic tracing.
Geoff Lavand tested on PS3.
Tested-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This is a port of the function graph tracer that was written by
Frederic Weisbecker for the x86.
This only works for PPC64 at the moment and only for static tracing.
PPC32 and dynamic function graph tracing support will come later.
The trace produces a visual calling of functions:
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
0) 2.224 us | }
0) ! 271.024 us | }
0) ! 320.080 us | }
0) ! 324.656 us | }
0) ! 329.136 us | }
0) | .put_prev_task_fair() {
0) | .update_curr() {
0) 2.240 us | .update_min_vruntime();
0) 6.512 us | }
0) 2.528 us | .__enqueue_entity();
0) + 15.536 us | }
0) | .pick_next_task_fair() {
0) 2.032 us | .__pick_next_entity();
0) 2.064 us | .__clear_buddies();
0) | .set_next_entity() {
0) 2.672 us | .__dequeue_entity();
0) 6.864 us | }
Geoff Lavand tested on PS3.
Tested-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Neuling reported a compile bug when dynamic ftrace was
configured in and modules were not. This was due to the ftrace
code referencing module specific structures.
Reported-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: cleanup
The PowerPC ftrace code uses a hacked up DEBUGP macro for prints.
This patch converts it to the standard pr_debug.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch adds support for 256KB pages on ppc44x-based boards.
For simplification of implementation with 256KB pages we still assume
2-level paging. As a side effect this leads to wasting extra memory space
reserved for PTE tables: only 1/4 of pages allocated for PTEs are
actually used. But this may be an acceptable trade-off to achieve the
high performance we have with big PAGE_SIZEs in some applications (e.g.
RAID).
Also with 256KB PAGE_SIZE we increase THREAD_SIZE up to 32KB to minimize
the risk of stack overflows in the cases of on-stack arrays, which size
depends on the page size (e.g. multipage BIOs, NTFS, etc.).
With 256KB PAGE_SIZE we need to decrease the PKMAP_ORDER at least down
to 9, otherwise all high memory (2 ^ 10 * PAGE_SIZE == 256MB) we'll be
occupied by PKMAP addresses leaving no place for vmalloc. We do not
separate PKMAP_ORDER for 256K from 16K/64K PAGE_SIZE here; actually that
value of 10 in support for 16K/64K had been selected rather intuitively.
Thus now for all cases of PAGE_SIZE on ppc44x (including the default, 4KB,
one) we have 512 pages for PKMAP.
Because ELF standard supports only page sizes up to 64K, then you should
use binutils later than 2.17.50.0.3 with '-zmax-page-size' set to 256K
for building applications, which are to be run with the 256KB-page sized
kernel. If using the older binutils, then you should patch them like follows:
--- binutils/bfd/elf32-ppc.c.orig
+++ binutils/bfd/elf32-ppc.c
-#define ELF_MAXPAGESIZE 0x10000
+#define ELF_MAXPAGESIZE 0x40000
One more restriction we currently have with 256KB page sizes is inability
to use shmem safely, so, for now, the 256KB is available only if you turn
the CONFIG_SHMEM option off (another variant is to use BROKEN).
Though, if you need shmem with 256KB pages, you can always remove the !SHMEM
dependency in 'config PPC_256K_PAGES', and use the workaround available here:
http://lkml.org/lkml/2008/12/19/20
Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Fix the VSX alignment handler for VSX registers > 32. 32-63 are stored
in the VMX part of the thread_struct not the FPR part.
Signed-off-by: Michael Neuling <mikey@neuling.org>
CC: stable@kernel.org (2.6.27 & .28 please)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The Power ISA 2.06 spec introduces a standard MMU programming model that
is based on the Freescale Book-E MMU programing model. The Freescale
version is pretty backwards compatiable with the ISA 2.06 definition so
we are starting to refactor some of the Freescale code so it can be
easily shared.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The Power ISA 2.06 added power of two page sizes to the embedded MMU
architecture. Its done it such a way to be code compatiable with the
existing HW. Made the minor code changes to support both power of two
and power of four page sizes. Also added some new MAS bits and macros
that are defined as part of the 2.06 ISA. Renamed some things to use
the 'Book-3e' concept to convey the new MMU that is based on the
Freescale Book-E MMU programming model.
Note, its still invalid to try and use a page size that isn't supported
by cpu.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
move the definition of hose_list next to its hotplug spinlock.
create pcibios_io_size to encapsulate ifdef in existing pci-common
function pcibios_vaddr_is_ioport
move pci_address_to_pio to pci-common, using new pcibios_io_size, and
protect this GPL exported function against concurrent hotplug removal
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: new perf_counter feature
This extends the perf_counter_hw_event struct with bits that specify
that events in user, kernel and/or hypervisor mode should not be
counted (i.e. should be excluded), and adds code to program the PMU
mode selection bits accordingly on x86 and powerpc.
For software counters, we don't currently have the infrastructure to
distinguish which mode an event occurs in, so we currently fail the
counter initialization if the setting of the hw_event.exclude_* bits
would require us to distinguish. Context switches and CPU migrations
are currently considered to occur in kernel mode.
On x86, this changes the previous policy that only root can count
kernel events. Now non-root users can count kernel events or exclude
them. Non-root users still can't use NMI events, though. On x86 we
don't appear to have any way to control whether hypervisor events are
counted or not, so hw_event.exclude_hv is ignored.
On powerpc, the selection of whether to count events in user, kernel
and/or hypervisor mode is PMU-wide, not per-counter, so this adds a
check that the hw_event.exclude_* settings are the same as other events
on the PMU. Counters being added to a group have to have the same
settings as the other hardware counters in the group. Counters and
groups can only be enabled in hw_perf_group_sched_in or power_perf_enable
if they have the same settings as any other counters already on the
PMU. If we are not running on a hypervisor, the exclude_hv setting
is ignored (by forcing it to 0) since we can't ever get any
hypervisor events.
Signed-off-by: Paul Mackerras <paulus@samba.org>
The lmb debugging can be turned on at boottime with lmb=debug on the
command line. However on powerpc that doesn't work, because we don't
necessarily call lmb_dump_all().
So always call lmb_dump_all() after lmb_analyze(), no output is
generated unless lmb=debug is found on the command line.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The new legacy_mem file in sysfs is causing problems with X on machines
that don't support legacy memory access. The way I initially implemented
it, we would fail with -ENXIO when trying to mmap it, thus exposing to
X that we do support the API but there is no legacy memory.
Unfortunately, X poor error handling is causing it to fail to start when
it gets this error.
This implements a workaround hack that instead maps anonymous memory
instead (using shmem if VM_SHARED is set, just like /dev/zero does).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: fix dynamic ftrace with large modules in PPC64
The math to calculate the offset into the TOC that is taken from reading
the trampoline is incorrect. The bottom half of the offset is a signed
extended short. The current code was using an OR to create the offset
when it should have been using an addition.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Recently, a patch left DEBUG enabled in the powerpc common PCI code,
resulting in an old bug in a pr_debug() statement to show up and cause
a NULL dereference on some machines.
This fixes the pr_debug() statement and reverts to DEBUG not being
force-enabled in that file.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We currently have a few variants of fsl-booke processors (e500v1, e500v2,
e500mc, and e200). They all have minor differences that we had previously
been handling via ifdefs.
To move towards having this support the following changes have been made:
* PID1, PID2 only exist on e500v1 & e500v2 and should not be accessed on
e500mc or e200. We use MMUCFG[NPIDS] to determine which case we are
since we only touch PID1/2 in extremely early init code.
* Not all IVORs exist on all the processors so introduce cpu_setup
functions for each variant to setup the proper IVORs that are either
unique or exist but have some variations between the processors
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
In the VIO bus code the wrappers for dma alloc_coherent and free_coherent
calls are rounding to IOMMU_PAGE_SIZE. Taking a look at the underlying
calls, the actual mapping is promoted to PAGE_SIZE. Changing the
rounding in these two functions fixes under-reporting the entitlement
used by the system. Without this change, the system could run out of
entitlement before it believes it has and incur mapping failures at the
firmware level.
Also in the VIO bus code, the wrapper for dma map_sg is not exiting in
an error path where it should. Rather than fall through to code for the
success case, this patch adds the return that is needed in the error path.
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Conflicts:
arch/x86/include/asm/pda.h
We merge tip/core/percpu into tip/perfcounters/core because of a
semantic and contextual conflict: the former eliminates the PDA,
while the latter extends it with apic_perf_irqs field.
Resolve the conflict by moving the new field to the irq_cpustat
structure on 64-bit too.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
arm, arm/mach-integrator and powerpc were missing
.data.percpu.page_aligned in their percpu output section definitions.
Add it.
Signed-off-by: Tejun Heo <tj@kernel.org>
The PAPR says that the property for specifying the number of SLBs should
be called "slb-size". We currently only look for "ibm,slb-size" because
this is what firmware actually presents.
This patch makes us look for the "slb-size" property as well and in
preference to the "ibm,slb-size". This should future proof us if
firmware changes to match PAPR.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: New perf_counter features
A pinned counter group is one that the user wants to have on the CPU
whenever possible, i.e. whenever the associated task is running, for
a per-task group, or always for a per-cpu group. If the system
cannot satisfy that, it puts the group into an error state where
it is not scheduled any more and reads from it return EOF (i.e. 0
bytes read). The group can be released from error state and made
readable again using prctl(PR_TASK_PERF_COUNTERS_ENABLE). When we
have finer-grained enable/disable controls on counters we'll be able
to reset the error state on individual groups.
An exclusive group is one that the user wants to be the only group
using the CPU performance monitor hardware whenever it is on. The
counter group scheduler will not schedule an exclusive group if there
are already other groups on the CPU and will not schedule other groups
onto the CPU if there is an exclusive group scheduled (that statement
does not apply to groups containing only software counters, which can
always go on and which do not prevent an exclusive group from going on).
With an exclusive group, we will be able to let users program PMU
registers at a low level without the concern that those settings will
perturb other measurements.
Along the way this reorganizes things a little:
- is_software_counter() is moved to perf_counter.h.
- cpuctx->active_oncpu now records the number of hardware counters on
the CPU, i.e. it now excludes software counters. Nothing was reading
cpuctx->active_oncpu before, so this change is harmless.
- A new cpuctx->exclusive field records whether we currently have an
exclusive group on the CPU.
- counter_sched_out moves higher up in perf_counter.c and gets called
from __perf_counter_remove_from_context and __perf_counter_exit_task,
where we used to have essentially the same code.
- __perf_counter_sched_in now goes through the counter list twice, doing
the pinned counters in the first loop and the non-pinned counters in
the second loop, in order to give the pinned counters the best chance
to be scheduled in.
Note that only a group leader can be exclusive or pinned, and that
attribute applies to the whole group. This avoids some awkwardness in
some corner cases (e.g. where a group leader is closed and the other
group members get added to the context list). If we want to relax that
restriction later, we can, and it is easier to relax a restriction than
to apply a new one.
This doesn't yet handle the case where a pinned counter is inherited
and goes into error state in the child - the error state is not
propagated up to the parent when the child exits, and arguably it
should.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This makes sure that we call the platform-specific ppc_md.enable_pmcs
function on each CPU before we try to use the PMU on that CPU. If the
CPU goes off-line and then on-line, we need to do the enable_pmcs call
again, so we use the hw_perf_counter_setup hook to ensure that. It gets
called as each CPU comes online, but it isn't called on the CPU that is
coming up, so this adds the CPU number as an argument to it (there were
no non-empty instances of hw_perf_counter_setup before).
This also arranges to set the pmcregs_in_use field of the lppaca (data
structure shared with the hypervisor) on each CPU when we are using the
PMU and clear it when we are not. This allows the hypervisor to optimize
partition switches by not saving/restoring the PMU registers when we
aren't using the PMU.
Signed-off-by: Paul Mackerras <paulus@samba.org>
We use Doorbell interrupts for IPIs and thus we need to make sure we aren't
interrupted in the process of processing the IPI.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Acked-by: Dave Liu <daveliu@freescale.com>
The PowerMac kernel occasionally fails to bring up the secondary CPUs on
SMP, the trigger factor seem to be fairly random and related to location
of code and data.
This appears to be due to the initial loading of the TOC value by the
secondary processor which now happens before we clear HID4:RM_CI (Real
Mode Cache Invalidate). This bit should really be cleared before we do
any load or store other than fetching code.
This fix works based on the assumption that all SMP 64-bit PowerMacs use
variants of the 970, which fortunately is true, by explicitely clearing
that bit, adding an slbia for good measure as RM_CI mode is known to
create bogus ERAT entries.
I also removed some spurrious debug output that was left enabled by
mistake while at it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The per_cpu__ prefix on DECLARE_PER_CPU'd variables is going away;
rename cache_dir to cache_dir_pcpu.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Convert arch/powerpc/ over to long long based u64:
-#ifdef __powerpc64__
-# include <asm-generic/int-l64.h>
-#else
-# include <asm-generic/int-ll64.h>
-#endif
+#include <asm-generic/int-ll64.h>
This will avoid reoccuring spurious warnings in core kernel code that
comes when people test on their own hardware. (i.e. x86 in ~98% of the
cases) This is what x86 uses and it generally helps keep 64-bit code
32-bit clean too.
[Adjusted to not impact user mode (from paulus) - sfr]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Enforce that the crash kernel region never overlaps the current kernel,
as it will be written directly on kexec load.
Also, default to the previous KDUMP_KERNELBASE if the start is 0.
Other architectures (x86, ia64) state that specifying the start address
0 (or omitting it) will result in the kernel allocating it. Before the
relocatable patch in 2.6.28, powerpc would adjust any other start value
to the hardcoded KDUMP_KERNELBASE of 32M.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We are declaring the dummy section (used to work around a binutils
bug) as PT_NOTE, but we don't have enough bytes for it to be a valid
note header, and kexec userspace complains:
Warning: Elf Note name is not null terminated
Warning: append= option is not passed. Using the first kernel root partition
Warning: Elf Note name is not null terminated
Instead of using the arbitray value 0xf177 (aka "fill"), declare a
no-name no-description note of type 0.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: cleanup, update to new cpumask API
Irq_desc.affinity and irq_desc.pending_mask are now cpumask_var_t's
so access to them should be using the new cpumask API.
Signed-off-by: Mike Travis <travis@sgi.com>
Impact: build fix
Ingo Molnar wrote:
> tip/arch/blackfin/kernel/irqchip.c: In function 'show_interrupts':
> tip/arch/blackfin/kernel/irqchip.c:85: error: 'struct kernel_stat' has no member named 'irqs'
> make[2]: *** [arch/blackfin/kernel/irqchip.o] Error 1
> make[2]: *** Waiting for unfinished jobs....
>
So could move kstat_irqs array to irq_desc struct.
(s390, m68k, sparc) are not touched yet, because they don't support genirq
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This adds the back-end for the PMU on the POWER6 processor.
Fortunately, the event selection hardware is somewhat simpler on
POWER6 than on other POWER family processors, so the constraints
fit into only 32 bits.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds the back-end for the PMU on the PPC970 family.
The PPC970 allows events from the ISU to be selected in two different
ways. Rather than use alternative event codes to express this, we
instead use a single encoding for ISU events and express the
resulting constraint (that you can't select events from all three
of FPU/IFU/VPU, ISU and IDU/STS at the same time, since they all come
in through only 2 multiplexers) using a NAND constraint field, and
work out which multiplexer is used for ISU events at compute_mmcr
time.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This provides the architecture-specific functions needed to access
PMU hardware on the 64-bit PowerPC processors. It has been designed
for the IBM POWER family (POWER 4/4+/5/5+/6 and PPC970) but will
hopefully also suit other 64-bit PowerPC machines (although probably
not Cell given how different it is in this area). This doesn't
include back-ends for any specific processors.
This implements a system which allows back-ends to express the
constraints that their hardware has on what events can be counted
simultaneously. The constraints are expressed as a 64-bit mask +
64-bit value for each event, and the encoding is capable of
expressing the constraints arising from having a set of multiplexers
feeding an event bus, with some events being available through
multiple multiplexer settings, such as we get on POWER4 and PPC970.
Furthermore, the back-end can supply alternative event codes for
each event, and the constraint checking code will try all possible
combinations of alternative event codes to try to find a combination
that will fit.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Because 64-bit powerpc uses lazy (soft) interrupt disabling, it is
possible for a performance monitor exception to come in when the
kernel thinks interrupts are disabled (i.e. when they are
soft-disabled but hard-enabled). In such a situation the performance
monitor exception handler might have some processing to do (such as
process wakeups) which can't be done in what is effectively an NMI
handler.
This provides a way to defer that work until interrupts get enabled,
either in raw_local_irq_restore() or by returning from an interrupt
handler to code that had interrupts enabled. We have a per-processor
flag that indicates that there is work pending to do when interrupts
subsequently get re-enabled. This flag is checked in the interrupt
return path and in raw_local_irq_restore(), and if it is set,
perf_counter_do_pending() is called to do the pending work.
Signed-off-by: Paul Mackerras <paulus@samba.org>
The Freescale PowerPC specific gianfar driver (gig-e) uses
cacheable_memzero for performance reasons we need to export
the symbol to allow the driver to be built as a module.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
tce_entryp is a "u64 *" not an "unsigned long *".
[Split from a large patch -sfr]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
of_get_flat_dt_prop() returns a "void *", so we don't need to cast when
assigning its result to a pointer variable.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
X has been failing to start on my quad G5 powermac since commit
1fd0f52583 ("powerpc: Fix domain numbers
in /proc on 64-bit") went in. The reason is that the change allows X
to see the PCI-PCI bridge above the video card (previously it was
obscured by the fact that there were two "00" directories in
/proc/bus/pci), and the pciconfig_iobase system call on the bridge is
failing because of a hack that we have to return information about the
AGP bus when X asks about bus 0. This machine doesn't have an AGP bus
(it has PCI Express) and so the pciconfig_iobase call is returning -1,
which ultimately causes X to fail to start.
This fixes it by checking that we have an AGP bridge before
redirecting the pciconfig_iobase call to return information about the
AGP bus. With this, X starts successfully both on a quad G5 with
PCI Express and on an older dual G5 with AGP.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The current code for providing processor cache information in sysfs
has the following deficiencies:
- several complex functions that are hard to understand
- implicit recursion (cache_desc_release -> kobject_put -> cache_desc_release)
- explicit recursion (create_cache_index_info)
- use of two per-cpu arrays when one would suffice
- duplication of work on systems where CPUs share cache
Also, when I looked at implementing support for a shared_cpu_map
attribute, it was pretty much impossible to handle hotplug without
checking every single online CPU's cache_desc list and fixing things
up... not that this is a hot path, but it would have introduced
O(n^2)-ish behavior during boot. Addressing this involved rethinking
the core data structures used, which didn't lend itself to an
incremental approach.
This implementation maintains a "forest" (potentially more than one
tree) of cache objects which reflects the system's cache topology.
Cache objects are instantiated as needed as CPUs come online. A
per-cpu array is used mainly for sysfs-related bookkeeping; the
objects in the array just point to the appropriate points in the
forest.
This maintains compatibility with the existing code and includes some
enhancements:
- Implement the shared_cpu_map attribute, which is essential for
enabling userspace to discover the system's overall cache topology.
- Use cache-block-size properties if cache-line-size is not available.
I chose to place this implementation in a new file since it would have
roughly doubled the size of sysfs.c, which is already kind of messy.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
There's a problem on some embedded platforms when we re-assign
everything on PCI, such as 44x. The generic code tries to avoid
assigning devices to addresses overlapping the low legacy
addresses such as VGA hard decoded areas using constants that
are unfortunately no good for us, as they don't take into account
the address translation we do to access PCI busses.
Thus we end up allocating things like IO BARs to 0, which is
technically legal, but will shadow hard decoded ports for use
by things like VGA cards.
This works around it by attempting to reserve legacy regions
before we try to assign addresses.
NOTE: This may have nasty side effects in cases I haven't tested
yet:
- We try to use FW mappings (ie. powermac) and the FW has allocated
a conflicting address over those legacy regions. This will typically
happen. I would expect the new code to just fail with an informative
message without harm but I haven't had a chance to test that scenario
yet.
- A device with fixed BARs overlapping those legacy addresses such
as an IDE controller in legacy mode is in the system. I don't know
for sure yet what will happen there, I have to test :-)
Ideally, we should change PCIBIOS_MIN_IO/MIN_MEM accross the board
to take a bus pointer so they can provide appropriate per-bus translated
values to the generic code but that's a more invasive patch. I will
do that in the future, but in the meantime, this fixes the problem
locally
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (98 commits)
PCI PM: Put PM callbacks in the order of execution
PCI PM: Run default PM callbacks for all devices using new framework
PCI PM: Register power state of devices during initialization
PCI PM: Call pci_fixup_device from legacy routines
PCI PM: Rearrange code in pci-driver.c
PCI PM: Avoid touching devices behind bridges in unknown state
PCI PM: Move pci_has_legacy_pm_support
PCI PM: Power-manage devices without drivers during suspend-resume
PCI PM: Add suspend counterpart of pci_reenable_device
PCI PM: Fix poweroff and restore callbacks
PCI: Use msleep instead of cpu_relax during ASPM link retraining
PCI: PCIe portdrv: Add kerneldoc comments to remining core funtions
PCI: PCIe portdrv: Rearrange code so that related things are together
PCI: PCIe portdrv: Fix suspend and resume of PCI Express port services
PCI: PCIe portdrv: Add kerneldoc comments to some core functions
x86/PCI: Do not use interrupt links for devices using MSI-X
net: sfc: Use pci_clear_master() to disable bus mastering
PCI: Add pci_clear_master() as opposite of pci_set_master()
PCI hotplug: remove redundant test in cpq hotplug
PCI: pciehp: cleanup register and field definitions
...
This is a global variable defined in fsl_booke_mmu.c with a value that gets
initialized in assembly code in head_fsl_booke.S.
It's never used.
If some code ever does want to know the number of entries in TLB1, then
"numcams = mfspr(SPRN_TLB1CFG) & 0xfff", is a whole lot simpler than a
global initialized during kernel boot from assembly.
Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Some assembly code in head_fsl_booke.S hard-coded the size of struct tlbcam
to 20 when it indexed the TLBCAM table. Anyone changing the size of struct
tlbcam would not know to expect that.
The kernel already has a system to get the size of C structures into
assembly language files, asm-offsets, so let's use it.
The definition of the struct gets moved to a header, so that asm-offsets.c
can include it.
Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (24 commits)
trivial: chack -> check typo fix in main Makefile
trivial: Add a space (and a comma) to a printk in 8250 driver
trivial: Fix misspelling of "firmware" in docs for ncr53c8xx/sym53c8xx
trivial: Fix misspelling of "firmware" in powerpc Makefile
trivial: Fix misspelling of "firmware" in usb.c
trivial: Fix misspelling of "firmware" in qla1280.c
trivial: Fix misspelling of "firmware" in a100u2w.c
trivial: Fix misspelling of "firmware" in megaraid.c
trivial: Fix misspelling of "firmware" in ql4_mbx.c
trivial: Fix misspelling of "firmware" in acpi_memhotplug.c
trivial: Fix misspelling of "firmware" in ipw2100.c
trivial: Fix misspelling of "firmware" in atmel.c
trivial: Fix misspelled firmware in Kconfig
trivial: fix an -> a typos in documentation and comments
trivial: fix then -> than typos in comments and documentation
trivial: update Jesper Juhl CREDITS entry with new email
trivial: fix singal -> signal typo
trivial: Fix incorrect use of "loose" in event.c
trivial: printk: fix indentation of new_text_line declaration
trivial: rtc-stk17ta8: fix sparse warning
...
Use the generic pci_swizzle_interrupt_pin() instead of arch-specific code.
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Add kprobe_insn_mutex for protecting kprobe_insn_pages hlist, and remove
kprobe_mutex from architecture dependent code.
This allows us to call arch_remove_kprobe() (and free_insn_slot) while
holding kprobe_mutex.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (66 commits)
x86: export vector_used_by_percpu_irq
x86: use logical apicid in x2apic_cluster's x2apic_cpu_mask_to_apicid_and()
sched: nominate preferred wakeup cpu, fix
x86: fix lguest used_vectors breakage, -v2
x86: fix warning in arch/x86/kernel/io_apic.c
sched: fix warning in kernel/sched.c
sched: move test_sd_parent() to an SMP section of sched.h
sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0
sched: activate active load balancing in new idle cpus
sched: bias task wakeups to preferred semi-idle packages
sched: nominate preferred wakeup cpu
sched: favour lower logical cpu number for sched_mc balance
sched: framework for sched_mc/smt_power_savings=N
sched: convert BALANCE_FOR_xx_POWER to inline functions
x86: use possible_cpus=NUM to extend the possible cpus allowed
x86: fix cpu_mask_to_apicid_and to include cpu_online_mask
x86: update io_apic.c to the new cpumask code
x86: Introduce topology_core_cpumask()/topology_thread_cpumask()
x86: xen: use smp_call_function_many()
x86: use work_on_cpu in x86/kernel/cpu/mcheck/mce_amd_64.c
...
Fixed up trivial conflict in kernel/time/tick-sched.c manually
* 'kvm-updates/2.6.29' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (140 commits)
KVM: MMU: handle large host sptes on invlpg/resync
KVM: Add locking to virtual i8259 interrupt controller
KVM: MMU: Don't treat a global pte as such if cr4.pge is cleared
MAINTAINERS: Maintainership changes for kvm/ia64
KVM: ia64: Fix kvm_arch_vcpu_ioctl_[gs]et_regs()
KVM: x86: Rework user space NMI injection as KVM_CAP_USER_NMI
KVM: VMX: Fix pending NMI-vs.-IRQ race for user space irqchip
KVM: fix handling of ACK from shared guest IRQ
KVM: MMU: check for present pdptr shadow page in walk_shadow
KVM: Consolidate userspace memory capability reporting into common code
KVM: Advertise the bug in memory region destruction as fixed
KVM: use cpumask_var_t for cpus_hardware_enabled
KVM: use modern cpumask primitives, no cpumask_t on stack
KVM: Extract core of kvm_flush_remote_tlbs/kvm_reload_remote_mmus
KVM: set owner of cpu and vm file operations
anon_inodes: use fops->owner for module refcount
x86: KVM guest: kvm_get_tsc_khz: return khz, not lpj
KVM: MMU: prepopulate the shadow on invlpg
KVM: MMU: skip global pgtables on sync due to cr3 switch
KVM: MMU: collapse remote TLB flushes on root sync
...
Existing KVM statistics are either just counters (kvm_stat) reported for
KVM generally or trace based aproaches like kvm_trace.
For KVM on powerpc we had the need to track the timings of the different exit
types. While this could be achieved parsing data created with a kvm_trace
extension this adds too much overhead (at least on embedded PowerPC) slowing
down the workloads we wanted to measure.
Therefore this patch adds a in-kernel exit timing statistic to the powerpc kvm
code. These statistic is available per vm&vcpu under the kvm debugfs directory.
As this statistic is low, but still some overhead it can be enabled via a
.config entry and should be off by default.
Since this patch touched all powerpc kvm_stat code anyway this code is now
merged and simplified together with the exit timing statistic code (still
working with exit timing disabled in .config).
Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Formerly, we used to maintain a per-vcpu shadow TLB and on every entry to the
guest would load this array into the hardware TLB. This consumed 1280 bytes of
memory (64 entries of 16 bytes plus a struct page pointer each), and also
required some assembly to loop over the array on every entry.
Instead of saving a copy in memory, we can just store shadow mappings directly
into the hardware TLB, accepting that the host kernel will clobber these as
part of the normal 440 TLB round robin. When we do that we need less than half
the memory, and we have decreased the exit handling time for all guest exits,
at the cost of increased number of TLB misses because the host overwrites some
guest entries.
These savings will be increased on processors with larger TLBs or which
implement intelligent flush instructions like tlbivax (which will avoid the
need to walk arrays in software).
In addition to that and to the code simplification, we have a greater chance of
leaving other host userspace mappings in the TLB, instead of forcing all
subsequent tasks to re-fault all their mappings.
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch doesn't yet move all 44x-specific data into the new structure, but
is the first step down that path. In the future we may also want to create a
struct kvm_vcpu_booke.
Based on patch from Liu Yu <yu.liu@freescale.com>.
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This will ease ports to other cores.
Also remove unused "struct kvm_tlb" while we're at it.
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The cpu time spent by the idle process actually doing something is
currently accounted as idle time. This is plain wrong, the architectures
that support VIRT_CPU_ACCOUNTING=y can do better: distinguish between the
time spent doing nothing and the time spent by idle doing work. The first
is accounted with account_idle_time and the second with account_system_time.
The architectures that use the account_xxx_time interface directly and not
the account_xxx_ticks interface now need to do the check for the idle
process in their arch code. In particular to improve the system vs true
idle time accounting the arch code needs to measure the true idle time
instead of just testing for the idle process.
To improve the tick based accounting as well we would need an architecture
primitive that can tell us if the pt_regs of the interrupted context
points to the magic instruction that halts the cpu.
In addition idle time is no more added to the stime of the idle process.
This field now contains the system time of the idle process as it should
be. On systems without VIRT_CPU_ACCOUNTING this will always be zero as
every tick that occurs while idle is running will be accounted as idle
time.
This patch contains the necessary common code changes to be able to
distinguish idle system time and true idle time. The architectures with
support for VIRT_CPU_ACCOUNTING need some changes to exploit this.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The utimescaled / stimescaled fields in the task structure and the
global cpustat should be set on all architectures. On s390 the calls
to account_user_time_scaled and account_system_time_scaled never have
been added. In addition system time that is accounted as guest time
to the user time of a process is accounted to the scaled system time
instead of the scaled user time.
To fix the bugs and to prevent future forgetfulness this patch merges
account_system_time_scaled into account_system_time and
account_user_time_scaled into account_user_time.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Michael Neuling <mikey@neuling.org>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (144 commits)
powerpc/44x: Support 16K/64K base page sizes on 44x
powerpc: Force memory size to be a multiple of PAGE_SIZE
powerpc/32: Wire up the trampoline code for kdump
powerpc/32: Add the ability for a classic ppc kernel to be loaded at 32M
powerpc/32: Allow __ioremap on RAM addresses for kdump kernel
powerpc/32: Setup OF properties for kdump
powerpc/32/kdump: Implement crash_setup_regs() using ppc_save_regs()
powerpc: Prepare xmon_save_regs for use with kdump
powerpc: Remove default kexec/crash_kernel ops assignments
powerpc: Make default kexec/crash_kernel ops implicit
powerpc: Setup OF properties for ppc32 kexec
powerpc/pseries: Fix cpu hotplug
powerpc: Fix KVM build on ppc440
powerpc/cell: add QPACE as a separate Cell platform
powerpc/cell: fix build breakage with CONFIG_SPUFS disabled
powerpc/mpc5200: fix error paths in PSC UART probe function
powerpc/mpc5200: add rts/cts handling in PSC UART driver
powerpc/mpc5200: Make PSC UART driver update serial errors counters
powerpc/mpc5200: Remove obsolete code from mpc5200 MDIO driver
powerpc/mpc5200: Add MDMA/UDMA support to MPC5200 ATA driver
...
Fix trivial conflict in drivers/char/Makefile as per Paul's directions
This adds support for 16k and 64k page sizes on PowerPC 44x processors.
The PGDIR table is much smaller than a page when using 16k or 64k
pages (512 and 32 bytes respectively) so we allocate the PGDIR with
kzalloc() instead of __get_free_pages().
One PTE table covers rather a large memory area when using 16k or 64k
pages (32MB or 512MB respectively), so we can easily put FIXMAP and
PKMAP in the area covered by one PTE table.
Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Vladimir Panfilov <pvr@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Ensure that total memory size is page-aligned, because otherwise
mark_bootmem() gets upset.
This error case was triggered by using 64 KiB pages in the kernel
while arch/powerpc/boot/4xx.c arbitrarily reduced the amount of memory
by 4096 (to work around a chip bug that affects the last 256 bytes of
physical memory).
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
arch_setup_additional_pages currently gets two arguments, the binary
format descripton and an indication if the process uses an executable
stack or not. The second argument is not used by anybody, it could
be removed without replacement.
What actually does make sense is to pass an indication if the process
uses the elf interpreter or not. The glibc code will not use anything
from the vdso if the process does not use the dynamic linker, so for
statically linked binaries the architecture backend can choose not
to map the vdso.
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Wire up the trampoline code for ppc32 to relay exceptions from the
vectors at address 0 to vectors at address 32MB, and modify Kconfig
to enable Kdump support for all classic powerpcs.
Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add the ability for a classic ppc kernel to be loaded at an address
of 32MB. This done by fixing a few places that assume we are loaded
at address 0, and by changing several uses of KERNELBASE to use
PAGE_OFFSET, instead.
Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Refactor the setting of kdump OF properties, moving the common code
from machine_kexec_64.c to machine_kexec.c where it can be used on
both ppc64 and ppc32. This will be needed for kdump to work on ppc32
platforms.
Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This replaces the dummy crash_setup_regs function with full-fledged
crash_setup_regs implementation. On PPC32 we simply use the new
ppc_save_regs function to dump the registers.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Today the arch/powerpc/xmon/setjmp.S file contains only the
xmon_save_regs function. We want to use it for kdump purposes, so
let's move the file into arch/powerpc/kernel/ and give the function a
more generic name (ppc_save_regs).
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This removes the need for each platform to specify default kexec and
crash kernel ops, thus effectively adds a working kexec support for
most 6xx/7xx/7xxx-based boards.
Platforms that can't cope with default ops will explode in some weird
way (a hang or reboot is most likely), which means that the board's
kexec support should be fixed or blacklisted via dummy _prepare
callback returning -ENOSYS.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Refactor the setting of kexec OF properties, moving the common code
from machine_kexec_64.c to machine_kexec.c where it can be used on
both ppc64 and ppc32. This is needed for kexec to work on ppc32
platforms.
Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
After discussing with chip designers, it appears that it's not
necessary to set G everywhere on 440 cores. The various core
errata related to prefetch should be sorted out by firmware by
disabling icache prefetching in CCR0. We add the workaround to
the kernel however just in case oooold firmwares don't do it.
This is valid for -all- 4xx core variants. Later ones hard wire
the absence of prefetch but it doesn't harm to clear the bits
in CCR0 (they should already be cleared anyway).
We still leave G=1 on the linear mapping for now, we need to
stop over-mapping RAM to be able to remove it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently, we never set _PAGE_COHERENT in the PTEs, we just OR it in
in the hash code based on some CPU feature bit. We also manipulate
_PAGE_NO_CACHE and _PAGE_GUARDED by hand in all sorts of places.
This changes the logic so that instead, the PTE now contains
_PAGE_COHERENT for all normal RAM pages thay have I = 0 on platforms
that need it. The hash code clears it if the feature bit is not set.
It also adds some clean accessors to setup various valid combinations
of access flags and change various bits of code to use them instead.
This should help having the PTE actually containing the bit
combinations that we really want.
I also removed _PAGE_GUARDED from _PAGE_BASE on 44x and instead
set it explicitely from the TLB miss. I will ultimately remove it
completely as it appears that it might not be needed after all
but in the meantime, having it in the TLB miss makes things a
lot easier.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This makes the MMU context code used for CPUs with no hash table
(except 603) dynamically allocate the various maps used to track
the state of contexts.
Only the main free map and CPU 0 stale map are allocated at boot
time. Other CPU maps are allocated when those CPUs are brought up
and freed if they are unplugged.
This also moves the initialization of the MMU context management
slightly later during the boot process, which should be fine as
it's really only needed when userland if first started anyways.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently, the various forms of low level TLB invalidations are all
implemented in misc_32.S for 32-bit processors, in a fairly scary
mess of #ifdef's and with interesting duplication such as a whole
bunch of code for FSL _tlbie and _tlbia which are no longer used.
This moves things around such that _tlbie is now defined in
hash_low_32.S and is only used by the 32-bit hash code, and all
nohash CPUs use the various _tlbil_* forms that are now moved to
a new file, tlb_nohash_low.S.
I moved all the definitions for that stuff out of
include/asm/tlbflush.h as they are really internal mm stuff, into
mm/mmu_decl.h
The code should have no functional changes. I kept some variants
inline for trivial forms on things like 40x and 8xx.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This commit moves the whole no-hash TLB handling out of line into a
new tlb_nohash.c file, and implements some basic SMP support using
IPIs and/or broadcast tlbivax instructions.
Note that I'm using local invalidations for D->I cache coherency.
At worst, if another processor is trying to execute the same and
has the old entry in its TLB, it will just take a fault and re-do
the TLB flush locally (it won't re-do the cache flush in any case).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We're soon running out of CPU features and I need to add some new
ones for various MMU related bits, so this patch separates the MMU
features from the CPU features. I moved over the 32-bit MMU related
ones, added base features for MMU type families, but didn't move
over any 64-bit only feature yet.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This splits the mmu_context handling between 32-bit hash based
processors, 64-bit hash based processors and everybody else. This is
preliminary work for adding SMP support for BookE processors.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds supports to the "extended" DCR addressing via the indirect
mfdcrx/mtdcrx instructions supported by some 4xx cores (440H6 and
later).
I enabled the feature for now only on AMCC 460 chips.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Using the common code means that more complete cache information will
provided in sysfs on platforms that don't use the l2-cache property
convention.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The smp code uses cache information to populate cpu_core_map; change
it to use common code for cache lookup.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We have more than one piece of code that looks up cache nodes manually
using the "l2-cache" property. Add a common helper routine which does
this and handles ePAPR's "next-level-cache" property as well as
powermac.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Since "Factor out cpu joining/unjoining the GIQ"
(b4963255ad) the WARN_ON in
xics_set_cpu_giq() is being triggered during boot on JS20 because the
GIQ indicator is not available on that platform. While the warning is
harmless and the system runs normally, it's nicer to check for the
existence of the indicator before trying to manipulate it.
Implement rtas_indicator_present(), which searches the
/rtas/rtas-indicators property for the given indicator token, and use
this function in xics_set_cpu_giq().
Also use a WARN statement in xics_set_cpu_giq to get better
information on failure.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Acked-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
smp_hw_index isn't used on 64-bit, so move it from smp.c to
setup_32.c.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The `have_of' variable is a relic from the arch/ppc time, it isn't
useful nowadays.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
An example calling sequence which we did see:
copy_user_highpage -> kmap_atomic -> flush_tlb_page -> _tlbil_va
We got interrupted after setting up the MAS registers before the
tlbwe and the interrupt handler that caused the interrupt also did
a kmap_atomic (ide code) and thus on returning from the interrupt
the MAS registers no longer contained the proper values.
Since we dont save/restore MAS registers for normal interrupts we
need to disable interrupts in _tlbil_va to ensure atomicity.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Impact: change calling convention of existing clock_event APIs
struct clock_event_timer's cpumask field gets changed to take pointer,
as does the ->broadcast function.
Another single-patch change. For safety, we BUG_ON() in
clockevents_register_device() if it's not set.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@elte.hu>
Impact: change existing irq_chip API
Not much point with gentle transition here: the struct irq_chip's
setaffinity method signature needs to change.
Fortunately, not widely used code, but hits a few architectures.
Note: In irq_select_affinity() I save a temporary in by mangling
irq_desc[irq].affinity directly. Ingo, does this break anything?
(Folded in fix from KOSAKI Motohiro)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Grant Grundler <grundler@parisc-linux.org>
Acked-by: Ingo Molnar <mingo@redhat.com>
Cc: ralf@linux-mips.org
Cc: grundler@parisc-linux.org
Cc: jeremy@xensource.com
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Impact: cleanup
Each SMP arch defines these themselves. Move them to a central
location.
Twists:
1) Some archs (m32, parisc, s390) set possible_map to all 1, so we add a
CONFIG_INIT_ALL_POSSIBLE for this rather than break them.
2) mips and sparc32 '#define cpu_possible_map phys_cpu_present_map'.
Those archs simply have phys_cpu_present_map replaced everywhere.
3) Alpha defined cpu_possible_map to cpu_present_map; this is tricky
so I just manipulate them both in sync.
4) IA64, cris and m32r have gratuitous 'extern cpumask_t cpu_possible_map'
declarations.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Grant Grundler <grundler@parisc-linux.org>
Tested-by: Tony Luck <tony.luck@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Mike Travis <travis@sgi.com>
Cc: ink@jurassic.park.msu.ru
Cc: rmk@arm.linux.org.uk
Cc: starvik@axis.com
Cc: tony.luck@intel.com
Cc: takata@linux-m32r.org
Cc: ralf@linux-mips.org
Cc: grundler@parisc-linux.org
Cc: paulus@samba.org
Cc: schwidefsky@de.ibm.com
Cc: lethal@linux-sh.org
Cc: wli@holomorphy.com
Cc: davem@davemloft.net
Cc: jdike@addtoit.com
Cc: mingo@redhat.com
The 440x5 core in the Virtex5 uses the 440A type machine check
(ie, they have MCSRR0/MCSRR1). They thus need to call the
appropriate fixup function to hook the right variant of the
exception.
Without this, all machine checks become fatal due to loss
of context when entering the exception handler.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Added 85xx specifc smp_ops structure. We use ePAPR style boot release
and the MPIC for IPIs at this point.
Additionally added routines for secondary cpu entry and initializtion.
Signed-off-by: Andy Fleming <afleming@freescale.com>
Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The initial TLB mapping for the kernel boot didn't set the memory coherent
attribute, MAS2[M], in SMP mode.
If this code supported booting a secondary processor, which it doesn't yet,
but if it did, then when a secondary processor boots, it would probably signal
the primary processor by setting a variable called something like
__secondary_hold_acknowledge. However, due to the lack of the M bit, the
primary processor would not snoop the transaction (even if a transaction were
broadcast). If primary CPU's L1 D-cache had a copy, it would not be flushed
and the CPU would never see the ack. Which would have resulted in the primary
CPU spinning for a long time, perhaps a full second before it gives up, while
it would have waited for the ack from the secondary CPU that it wouldn't have
been able to see because of the stale cache.
The value of MAS2 for the boot page TLB1 entry is a compile time constant,
so there is no need to calculate it in powerpc assembly language.
Also, from the MPC8572 manual section 6.12.5.3, "Bits that represent
offsets within a page are ignored and should be cleared." Existing code
didn't clear them, this code does.
The same when the page of KERNELBASE is found; we don't need to use asm to
mask the lower 12 bits off.
In the code that computes the address to rfi from, don't hard code the
offset to 24 bytes, but have the assembler figure that out for us.
Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
This patch add the handlers of SPE/EFP exceptions.
The code is used to emulate float point arithmetic,
when MSR(SPE) is enabled and receive EFP data interrupt or EFP round interrupt.
This patch has no conflict with or dependence on FP math-emu.
The code has been tested by TestFloat.
Now the code doesn't support SPE/EFP instructions emulation
(it won't be called when receive program interrupt),
but it could be easily added.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
ibmebus_free_irq() frees the IRQ but does not remove its mapping, which
results in stale entries in the map.
This fixes it by adding a call to irq_dispose_mapping() in
ibmebus_free_irq().
Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
As noted by Akinobu Mita in commit b1fceac2 ("x86: remove unnecessary
memset and NULL check after alloc_bootmem()"), alloc_bootmem and
related functions never return NULL and always return a zeroed region
of memory. Thus a NULL test or memset after calls to these functions
is unnecessary.
This was fixed using the following semantic patch.
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@
expression E;
statement S;
@@
E = \(alloc_bootmem\|alloc_bootmem_low\|alloc_bootmem_pages\|alloc_bootmem_low_pages\|alloc_bootmem_node\|alloc_bootmem_low_pages_node\|alloc_bootmem_pages_node\)(...)
... when != E
(
- BUG_ON (E == NULL);
|
- if (E == NULL) S
)
@@
expression E,E1;
@@
E = \(alloc_bootmem\|alloc_bootmem_low\|alloc_bootmem_pages\|alloc_bootmem_low_pages\|alloc_bootmem_node\|alloc_bootmem_low_pages_node\|alloc_bootmem_pages_node\)(...)
... when != E
- memset(E,0,E1);
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We need to swap these out once we start using swiotlb, so add
them to dma_ops. Create CONFIG_PPC_NEED_DMA_SYNC_OPS Kconfig
option; this is currently enabled automatically if we're
CONFIG_NOT_COHERENT_CACHE. In the future, this will also
be enabled for builds that need swiotlb. If PPC_NEED_DMA_SYNC_OPS
is not defined, the dma_sync_*_for_* ops compile to nothing.
Otherwise, they access the dma_ops pointers for the sync ops.
This patch also changes dma_sync_single_range_* to actually
sync the range - previously it was using a generous
dma_sync_single. dma_sync_single_* is now implemented
as a dma_sync_single_range with an offset of 0.
Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
On my screen, when something crashes, I only have space for maybe 16
functions of the stack trace before the information above it scrolls
off the screen. It's easy to hack the kernel to print out only that
much, but it's harder to remember to do it. This introduces a config
option for it so that I can keep the setting in my config.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
On PowerPC 4xx or other non cache-coherent platforms, we lost the
appropriate cache flushing in dma_map_sg() when merging the 32 and
64-bit DMA code (commit 4fc665b88a,
"powerpc: Merge 32 and 64-bit dma code"). This restores it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
attr_smt_snooze_delay is only defined for CONFIG_PPC64, so protect the
attribute removal with the same condition. This fixes this build error
on 32-bit SMP configurations:
/data/home/miltonm/next.git/arch/powerpc/kernel/sysfs.c: In function ‘unregister_cpu_online’:
/data/home/miltonm/next.git/arch/powerpc/kernel/sysfs.c:722: error: ‘attr_smt_snooze_delay’ undeclared (first use in this function)
/data/home/miltonm/next.git/arch/powerpc/kernel/sysfs.c:722: error: (Each undeclared identifier is reported only once
/data/home/miltonm/next.git/arch/powerpc/kernel/sysfs.c:722: error: for each function it appears in.)
Signed-off-by: Paul Mackerras <paulus@samba.org>
It turns out that on Cell, on a kernel with CONFIG_VIRT_CPU_ACCOUNTING
= y, if a program sets the SO (summary overflow) bit in the XER and
then does a system call, the SO bit in CR0 will be set on return
regardless of whether the system call detected an error. Since CR0.SO
is used as the error indication from the system call, this means that
all system calls appear to fail.
The reason is that the workaround for the timebase bug on Cell uses a
compare instruction. With CONFIG_VIRT_CPU_ACCOUNTING = y, the
ACCOUNT_CPU_USER_ENTRY macro reads the timebase, so we end up doing a
compare instruction, which copies XER.SO to CR0.SO. Since we were
doing this in the system call entry patch after clearing CR0.SO but
before saving the CR, this meant that the saved CR image had CR0.SO
set if XER.SO was set on entry.
This fixes it by moving the clearing of CR0.SO to after the
ACCOUNT_CPU_USER_ENTRY call in the system call entry path.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently, some PCIe devices on POWER6 machines do not get interrupts
assigned correctly. The problem is that OF doesn't create an
"interrupt" property for them. The fix is for of_irq_map_pci to fall
back to using the value in the PCI interrupt-pin register in config
space, as we do when there is no OF device-tree node for the device.
I have verified that this works fine with a pair of Squib-E SAS
adapter on a P6-570.
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Impact: fix for PowerPC 32 code
There were some early init code that was not safe for static
ftrace to boot on my PowerBook. This code must only use relative
addressing, and static mcount performs a compare of the
ftrace_trace_function pointer, and gets that with an absolute address.
In the early init boot up code, this will cause a fault.
This patch removes tracing from the files containing the offending
functions.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: clean up
Paul Mackerras pointed out that the code to determine if the branch
can reach the destination is incorrect. Michael Ellerman suggested
to pull out the code from create_branch and use that.
Simply using create_branch is probably the best.
Reported-by: Michael Ellerman <michael@ellerman.id.au>
Reported-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix to PowerPC code modification
After modifying code it is essential to flush the icache. This patch
adds the missing flush.
Reported-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: clean up and robustness addition
This patch addresses the comments made by Paul Mackerras.
It removes the type casting between unsigned int and unsigned char
pointers, and replaces them with a use of all unsigned int.
Verification that the jump is indeed made to a trampoline has also
been added.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: quicken mcount calls that are not replaced by dyn ftrace
Dynamic ftrace no longer does on the fly recording of mcount locations.
The mcount locations are now found at compile time. The mcount
function no longer needs to store registers and call a stub function.
It can now just simply return.
Since there are some functions that do not get converted to a nop
(.init sections and other code that may disappear), this patch should
help speed up that code.
Also, the stub for mcount on PowerPC 32 can not be a simple branch
link register like it is on PowerPC 64. According to the ABI specification:
"The _mcount routine is required to restore the link register from
the stack so that the profiling code can be inserted transparently,
whether or not the profiled function saves the link register itself."
This means that we must restore the link register that was used
to make the call to mcount. The minimal mcount function for PPC32
ends up being:
mcount:
mflr r0
mtctr r0
lwz r0, 4(r1)
mtlr r0
bctr
Where we move the link register used to call mcount into the
ctr register, and then restore the link register from the stack.
Then we use the ctr register to jump back to the mcount caller.
The r0 register is free for us to use.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: add ability to trace modules on 32 bit PowerPC
This patch performs the necessary trampoline calls to handle
modules with dynamic ftrace on 32 bit PowerPC.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: Allow 64 bit PowerPC to trace modules with dynamic ftrace
This adds code to handle the PPC64 module trampolines, and allows for
PPC64 to use dynamic ftrace.
Thanks to Paul Mackerras for these updates:
- fix the mod and rec->arch.mod NULL checks.
- fix to is_bl_op compare.
Thanks to Milton Miller for:
- finding the nasty race with using two nops, and recommending
instead that I use a branch 8 forward.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: use cleaner probe_kernel API over assembly
Using probe_kernel_read/write interface is a much cleaner approach
than the current assembly version.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: update to PowerPC ftrace arch API
This patch converts PowerPC to use the new dynamic ftrace arch API.
Thanks to Paul Mackennas for pointing out the mistakes of my original
test_24bit_addr function.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: fix for irq off latency tracer
When idle is called, interrupts are disabled, but the idle function
will still wake up on an interrupt. The problem is that the interrupt
disabled latency tracer will take this call to idle as a latency.
This patch disables the latency tracing when going into idle.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
With the new generic smp call function helpers, I noticed the code in
smp_message_recv was a single function call in many cases. While
getting the message number from the ipi data is easy, we can reduce
the path length by a function and data-dependent switch by registering
seperate IPI actions for these simple calls.
Originally I left the ipi action array exposed, but then I realized the
registration code should be common too.
The three users each had their own name array, so I made a fourth
to convert all users to use a common one.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Those cores use the 440A type machine check (ie, they have
MCSRR0/MCSRR1). They thus need to call the appropriate fixup
function to hook the right variant of the exception.
Without this, all machine checks become fatal due to loss
of context when entering the exception handler.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
The new context may not be 16-byte aligned, so the real address of the
mcontext structure should be read from the uc_regs pointer instead of
directly using the (unaligned) uc_mcontext field.
Signed-off-by: Andreas Schwab <schwab@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Since we started using the generic timekeeping code, we haven't had a
powerpc-specific version of do_gettimeofday, and hence there is now
nothing that reads the do_gtod variable in arch/powerpc/kernel/time.c.
This therefore removes it and the code that sets it.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently the clock_gettime implementation in the VDSO produces a
result with microsecond resolution for the cases that are handled
without a system call, i.e. CLOCK_REALTIME and CLOCK_MONOTONIC. The
nanoseconds field of the result is obtained by computing a
microseconds value and multiplying by 1000.
This changes the code in the VDSO to do the computation for
clock_gettime with nanosecond resolution. That means that the
resolution of the result will ultimately depend on the timebase
frequency.
Because the timestamp in the VDSO datapage (stamp_xsec, the real time
corresponding to the timebase count in tb_orig_stamp) is in units of
2^-20 seconds, it doesn't have sufficient resolution for computing a
result with nanosecond resolution. Therefore this adds a copy of
xtime to the VDSO datapage and updates it in update_gtod() along with
the other time-related fields.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This does a few cosmetic cleanups, moving a couple of things around
but without actually changing what the code does.
(There is a minor change in ordering of operations in
pcibios_setup_bus_devices but it should have no impact).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The pseries PCI hotplug code has a number of issues, ranging from
incorrect resource setup to crashes, depending on what is added,
when, whether it contains a bridge, etc etc....
This fixes a whole bunch of these, while actually simplifying the code
a bit, using more generic code in the process and factoring out common
code between adding of a PHB, a slot or a device.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
To properly fix PCI hotplug, it's useful to be able to make the fixup
passes on all devices whether they were just hot plugged or already
there.
However, pcibios_allocate_bus_resources() wouldn't cope well with
being called twice for a given bus. This makes it ignore resources
that have already been allocated, along with adding a bit of debug
output.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently, our PCI code uses the pcibios_fixup_bus() callback, which
is called by the generic code when probing PCI buses, for two
different things.
One is to set up things related to the bus itself, such as reading
bridge resources for P2P bridges, fixing them up, or setting up the
iommu's associated with bridges on some platforms.
The other is some setup for each individual device under that bridge,
mostly setting up DMA mappings and interrupts.
The problem is that this approach doesn't work well with PCI hotplug
when an existing bus is re-probed for new children. We fix this
problem by splitting pcibios_fixup_bus into two routines:
pcibios_setup_bus_self() is now called to setup the bus itself
pcibios_setup_bus_devices() is now called to setup devices
pcibios_fixup_bus() is then modified to call these two after reading the
bridge bases, and the OF based PCI probe is modified to avoid calling
into the first one when rescanning an existing bridge.
[paulus@samba.org - fixed eeh.h for 32-bit compile now that pci-common.c
is including it unconditionally.]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The function pcibios_do_bus_setup() was used by pcibios_fixup_bus()
to perform setup that is different between the 32-bit and 64-bit
code. This difference no longer exists, thus the function is removed
and the setup now done directly from pci-common.c.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The 32-bit and 64-bit powerpc PCI code used to set up the resource
pointers of the root bus of a given PHB in completely different
places.
This unifies this in large part, by making 32-bit use a routine very
similar to what 64-bit does when initially scanning the PCI busses.
The actual setup of the PHB resources itself is then moved to a
common function in pci-common.c.
This should cause no functional change on 64-bit. On 32-bit, the
effect is that the PHB resources are going to be setup a bit earlier,
instead of being setup from pcibios_fixup_bus().
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This removes the various DBG() macro from the powerpc PCI code and
makes it use the standard pr_debug instead.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
A new field has been added to the VPA as a method for the client OS to
communicate to firmware the number of page-ins it is performing when
running collaborative memory overcommit. The hypervisor will use this
information to better determine if a partition is experiencing memory
pressure and needs more memory allocated to it.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When no hardware method is provided to sync the timebase registers
across the machine, and the platform doesn't sync them for us, then we
use a generic software implementation. Currently, the code for that
has many printks, and they don't have log levels. Most of the printks
are only useful for debugging the code, and since we haven't had any
problems with it for years, this turns them into pr_debug.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The code to properly expose domain numbers in /proc is somewhat
bogus on ppc64 as it depends on the "buid" field being non-0,
but that field is really pseries specific.
This removes that code and makes ppc64 use the same code as 32-bit
which effectively decides whether to expose domains based on
ppc_pci_flags set by the platform, and sets the default for 64-bit
to enable domains and enable compatibility for domain 0 (which
strips the domain number for domain 0 to help with X servers).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (23 commits)
Revert "powerpc: Sync RPA note in zImage with kernel's RPA note"
powerpc: Fix compile errors with CONFIG_BUG=n
powerpc: Fix format string warning in arch/powerpc/boot/main.c
powerpc: Fix bug in kernel copy of libfdt's fdt_subnode_offset_namelen()
powerpc: Remove duplicate DMA entry from mpc8313erdb device tree
powerpc/cell/OProfile: Fix on-stack array size in activate spu profiling function
powerpc/mpic: Fix regression caused by change of default IRQ affinity
powerpc: Update remaining dma_mapping_ops to use map/unmap_page
powerpc/pci: Fix unmapping of IO space on 64-bit
powerpc/pci: Properly allocate bus resources for hotplug PHBs
OF-device: Don't overwrite numa_node in device registration
powerpc: Fix swapcontext system for VSX + old ucontext size
powerpc: Fix compiler warning for the relocatable kernel
powerpc: Work around ld bug in older binutils
powerpc/ppc64/kdump: Better flag for running relocatable
powerpc: Use is_kdump_kernel()
powerpc: Kexec exit should not use magic numbers
powerpc/44x: Update 44x defconfigs
powerpc/40x: Update 40x defconfigs
powerpc: enable heap randomization for linkstations
...
This reverts commit 91a0030295, plus
commit 0dcd440120 ("powerpc: Revert CHRP
boot wrapper to real-base = 12MB on 32-bit") which depended on it.
Commit 91a00302 was causing NVRAM corruption on some pSeries machines,
for as-yet unknown reasons, so this reverts it until the cause is
identified.
Signed-off-by: Paul Mackerras <paulus@samba.org>
After the merge of the 32 and 64bit DMA code, dma_direct_ops lost
their map/unmap_single() functions but gained map/unmap_page(). This
caused a problem for Cell because Cell's dma_iommu_fixed_ops called
the dma_direct_ops if the fixed linear mapping was to be used or the
iommu ops if the dynamic window was to be used. So in order to fix
this problem we need to update the 64bit DMA code to use
map/unmap_page.
First, we update the generic IOMMU code so that iommu_map_single()
becomes iommu_map_page() and iommu_unmap_single() becomes
iommu_unmap_page(). Then we propagate these changes up through all
the callers of these two functions and in the process update all the
dma_mapping_ops so that they have map/unmap_page rahter than
map/unmap_single. We can do this because on 64bit there is no HIGHMEM
memory so map/unmap_page ends up performing exactly the same function
as map/unmap_single, just taking different arguments.
This has no affect on drivers because the dma_map_single_attrs() just
ends up calling the map_page() function of the appropriate
dma_mapping_ops and similarly the dma_unmap_single_attrs() calls
unmap_page().
This fixes an oops on Cell blades, which oops on boot without this
because they call dma_direct_ops.map_single, which is NULL.
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
A typo/thinko made us pass the wrong argument to __flush_hash_table_range
when unplugging bridges, thus not flushing all the translations for
the IO space on unplug. The third parameter to __flush_hash_table_range
is `end', not `size'.
This causes the hypervisor to refuse unplugging slots.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Resources for PHB's that are dynamically added to a system are not
properly allocated in the resource tree.
Not having these resources allocated causes an oops when removing
the PHB when we try to release them.
The diff appears a bit messy, this is mainly due to moving everything
one tab to the left in the pcibios_allocate_bus_resources routine.
The functionality change in this routine is only that the
list_for_each_entry() loop is pulled out and moved to the necessary
calling routine.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently, the numa_node of OF-devices will be overwritten during
device_register, which simply sets the node to -1. On cell machines,
this means that devices can't find their IOMMU, which is referenced
through the device's numa node.
Set the numa node for OF devices with no parent, and use the
lower-level device_initialize and device_add functions, so that the
node is preserved.
We can remove the call to set_dev_node in of_device_alloc, as it
will be overwritten during register.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Since VSX support was added, we now have two sizes of ucontext_t;
the older, smaller size without the extra VSX state, and the new
larger size with the extra VSX state. A program using the
sys_swapcontext system call and supplying smaller ucontext_t
structures will currently get an EINVAL error if the task has
used VSX (e.g. because of calling library code that uses VSX) and
the old_ctx argument is non-NULL (i.e. the program is asking for
its current context to be saved). Thus the program will start
getting EINVAL errors on calls that previously worked.
This commit changes this behaviour so that we don't send an EINVAL in
this case. It will now return the smaller context but the VSX MSR bit
will always be cleared to indicate that the ucontext_t doesn't include
the extra VSX state, even if the task has executed VSX instructions.
Both 32 and 64 bit cases are updated.
[paulus@samba.org - also fix some access_ok() and get_user() calls]
Thanks to Ben Herrenschmidt for noticing this problem.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Fixes this warning:
arch/powerpc/kernel/setup_64.c:447:5: warning: "kernstart_addr" is not defined
which arises because PHYSICAL_START is no longer a constant when
CONFIG_RELOCATABLE=y.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Commit 549e8152de ("powerpc: Make the
64-bit kernel as a position-independent executable") added lines to
vmlinux.lds.S to add the extra sections needed to implement a
relocatable kernel. However, those lines seem to trigger a bug in
older versions of GNU ld (such as 2.16.1) when building a
non-relocatable kernel. Since ld 2.16.1 is still a popular choice for
cross-toolchains, this adds an #ifdef to vmlinux.lds.S so the added
lines are only included when building a relocatable kernel.
Signed-off-by: Paul Mackerras <paulus@samba.org>
The __kdump_flag ABI is overly constraining for future development.
As of 2.6.27, the kernel entry point has 4 constraints: Offset 0 is
the starting point for the master (boot) cpu (entered with r3 pointing
to the device tree structure), offset 0x60 is code for the slave cpus
(entered with r3 set to their device tree physical id), offset 0x20 is
used by the iseries hypervisor, and secondary cpus must be well behaved
when the first 256 bytes are copied to address 0.
Placing the __kdump_flag at 0x18 is bad because:
- It was taking the last 8 bytes before the iseries hypervisor data.
- It was 8 bytes for a boolean flag
- It had no way of identifying that the flag was present
- It does leave any room for the master to add any additional code
before branching, which hurts debug.
- It will be unnecessarily hard for 32 bit code to be common (8 bytes)
Now that we have eliminated the use of __kdump_flag in favor of
the standard is_kdump_kernel(), this flag only controls run without
relocating the kernel to PHYSICAL_START (0), so rename it __run_at_load.
Move the flag to 0x5c, 1 word before the secondary cpu entry point at
0x60. Initialize it with "run0" to say it will run at 0 unless it is
set to 1. It only exists if we are relocatable.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
linux/crash_dump.h defines is_kdump_kernel() to be used by code that
needs to know if the previous kernel crashed instead of a (clean) boot
or reboot.
This updates the just added powerpc code to use it. This is needed
for the next commit, which will remove __kdump_flag.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Commit 54622f10a6 ("powerpc: Support for
relocatable kdump kernel") added a magic flag value in a register to
tell purgatory that it should be a panic kernel. This part is wrong
and is reverted by this commit.
The kernel gets a list of memory blocks and a entry point from user space.
Its job is to copy the blocks into place and then branch to the designated
entry point (after turning "off" the mmu).
The user space tool inserts a trampoline, called purgatory, that runs
before the user supplied code. Its job is to establish the entry
environment for the new kernel or other application based on the contents
of memory. The purgatory code is compiled and embedded in the tool,
where it is later patched using the elf symbol table using elf symbols.
Since the tool knows it is creating a purgatory that will run after a
kernel crash, it should just patch purgatory (or the kernel directly)
if something needs to happen.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* 'x86/um-header' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (26 commits)
x86: canonicalize remaining header guards
x86: drop double underscores from header guards
x86: Fix ASM_X86__ header guards
x86, um: get rid of uml-config.h
x86, um: get rid of arch/um/Kconfig.arch
x86, um: get rid of arch/um/os symlink
x86, um: get rid of excessive includes of uml-config.h
x86, um: get rid of header symlinks
x86, um: merge Kconfig.i386 and Kconfig.x86_64
x86, um: get rid of sysdep symlink
x86, um: trim the junk from uml ptrace-*.h
x86, um: take vm-flags.h to sysdep
x86, um: get rid of uml asm/arch
x86, um: get rid of uml highmem.h
x86, um: get rid of uml unistd.h
x86, um: get rid of system.h -> system.h include
x86, um: uml atomic.h is not needed anymore
x86, um: untangle uml ldt.h
x86, um: get rid of more uml asm/arch uses
x86, um: remove dead header (uml module-generic.h; never used these days)
...
The entire file of ftrace.c in the arch code needs to be marked
as notrace. It is much cleaner to do this from the Makefile with
CFLAGS_REMOVE_ftrace.o.
[ powerpc already had this in its Makefile. ]
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The arch dependent function ftrace_mcount_set was only used by the daemon
start up code. This patch removes it.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
the only theoretical reason for it these days is ppc; aside of uml/ppc
being dead, do_signal() would be happier in arch/powerpc/kernel/signal.h
anyway.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
This adds relocatable kernel support for kdump. With this one can
use the same regular kernel to capture the kdump. A signature (0xfeed1234)
is passed in r6 from panic code to the next kernel through kexec_sequence
and purgatory code. The signature is used to differentiate between
kdump kernel and non-kdump kernels.
The purgatory code compares the signature and sets the __kdump_flag in
head_64.S. During the boot up, kernel code checks __kdump_flag and if it
is set, the kernel will behave as relocatable kdump kernel. This kernel
will boot at the address where it was loaded by kexec-tools ie. at the
address reserved through crashkernel boot parameter.
CONFIG_CRASH_DUMP depends on CONFIG_RELOCATABLE option to build kdump
kernel as relocatable. So the same kernel can be used as production and
kdump kernel.
This patch incorporates the changes suggested by Paul Mackerras to avoid
GOT use and to avoid two copies of the code.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Mohan Kumar M <mohan@in.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
A patch of mine was recently committed to fix up STRICT_MM_TYPECHECKS
behaviour on powerpc (f5ea64dcba).
However, something which breaks it again seems to have slipped in
afterwards. So, here's another small fix.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Remove empty/bogus #else from signal_64.c
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Most of the platforms were printing the size of the memory
in their show_cpuinfo implementations. This moves that to
the common show_cpuinfo, so that all 32-bit platforms will
now print the size of memory. I also update the code
to deal with the fact that total_memory is now a phys_addr_t.
Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
These functions should have been static, and inspection shows they
are no longer used. (We used to parse mem= but we now defer that
to early_param).
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
64 bit powerpc requires the kexec user space tools avoid overwriting
the static kernel image and translation hash table when choosing
where to put memory image data because it copies the data into place
using the kernels virtual memory system. Kexec userspace determines
these and other areas blocked by reading properties the kernel adds,
but does not filter these properties when creating the device tree
for the next kernel.
When the second kernel tries to add its values for these properties,
the export via /proc/device-tree is hidden by the pre-existing but
stale values from the flat tree. Kexec userspace reads the old
property, allocates the new kernel at the old kernel's end, and
gets rejected by the overlap check.
Search and remove these stale properties before adding the new values.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
There are two issues when we enable CONFIG_RELOCATABLE. The first is due
to the fact that phys_addr_t is now defined in linux/types.h. The second
is due to the fact that the DMA code changes expose memstart_addr to
prom_init.c
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
"unsigned int" speed cannot be negative, it's thus pointless
to test if it is.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch adds support for legacy_io and legacy_mem files in
bus class directories in sysfs for powerpc
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Due to confusion between the ftrace infrastructure and the gcc profiling
tracer "ftrace", this patch renames the config options from FTRACE to
FUNCTION_TRACER. The other two names that are offspring from FTRACE
DYNAMIC_FTRACE and FTRACE_MCOUNT_RECORD will stay the same.
This patch was generated mostly by script, and partially by hand.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
o elfcorehdr_addr is used by not only the code under CONFIG_PROC_VMCORE
but also by the code which is not inside CONFIG_PROC_VMCORE. For
example, is_kdump_kernel() is used by powerpc code to determine if
kernel is booting after a panic then use previous kernel's TCE table.
So even if CONFIG_PROC_VMCORE is not set in second kernel, one should be
able to correctly determine that we are booting after a panic and setup
calgary iommu accordingly.
o So remove the assumption that elfcorehdr_addr is under
CONFIG_PROC_VMCORE.
o Move definition of elfcorehdr_addr to arch dependent crash files.
(Unfortunately crash dump does not have an arch independent file
otherwise that would have been the best place).
o kexec.c is not the right place as one can Have CRASH_DUMP enabled in
second kernel without KEXEC being enabled.
o I don't see sh setup code parsing the command line for
elfcorehdr_addr. I am wondering how does vmcore interface work on sh.
Anyway, I am atleast defining elfcoredhr_addr so that compilation is not
broken on sh.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Simon Horman <horms@verge.net.au>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'kvm-updates/2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (134 commits)
KVM: ia64: Add intel iommu support for guests.
KVM: ia64: add directed mmio range support for kvm guests
KVM: ia64: Make pmt table be able to hold physical mmio entries.
KVM: Move irqchip_in_kernel() from ioapic.h to irq.h
KVM: Separate irq ack notification out of arch/x86/kvm/irq.c
KVM: Change is_mmio_pfn to kvm_is_mmio_pfn, and make it common for all archs
KVM: Move device assignment logic to common code
KVM: Device Assignment: Move vtd.c from arch/x86/kvm/ to virt/kvm/
KVM: VMX: enable invlpg exiting if EPT is disabled
KVM: x86: Silence various LAPIC-related host kernel messages
KVM: Device Assignment: Map mmio pages into VT-d page table
KVM: PIC: enhance IPI avoidance
KVM: MMU: add "oos_shadow" parameter to disable oos
KVM: MMU: speed up mmu_unsync_walk
KVM: MMU: out of sync shadow core
KVM: MMU: mmu_convert_notrap helper
KVM: MMU: awareness of new kvm_mmu_zap_page behaviour
KVM: MMU: mmu_parent_walk
KVM: x86: trap invlpg
KVM: MMU: sync roots on mmu reload
...
This is a preparation patch for introducing a generic iommu_num_pages function.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nothing arch specific in get/settimeofday. The details of the timeval
conversion varied a little from arch to arch, but all with the same
results.
Also add an extern declaration for sys_tz to linux/time.h because externs
in .c files are fowned upon. I'll kill the externs in various other files
in a sparate patch.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: David S. Miller <davem@davemloft.net> [ sparc bits ]
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
struct stat / compat_stat is the same on all architectures, so
cp_compat_stat should be, too.
Turns out it is, except that various architectures have slightly and some
high2lowuid/high2lowgid or the direct assignment instead of the
SET_UID/SET_GID that expands to the correct one anyway.
This patch replaces the arch-specific cp_compat_stat implementations with
a common one based on the x86-64 one.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: David S. Miller <davem@davemloft.net> [ sparc bits ]
Acked-by: Kyle McMartin <kyle@mcmartin.ca> [ parisc bits ]
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When we use TID=N userspace mappings, we must ensure that kernel mappings have
been destroyed when entering userspace. Using TID=1/TID=0 for kernel/user
mappings and running userspace with PID=0 means that userspace can't access the
kernel mappings, but the kernel can directly access userspace.
The net is that we don't need to flush the TLB on privilege switches, but we do
on guest context switches (which are far more infrequent). Guest boot time
performance improvement: about 30%.
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Track which TLB entries need to be written, instead of overwriting everything
below the high water mark. Typically only a single guest TLB entry will be
modified in a single exit.
Guest boot time performance improvement: about 15%.
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
We're saving the host TLB state to memory on every exit, but never using it.
Originally I had thought that we'd want to restore host TLB for heavyweight
exits, but that could actually hurt when context switching to an unrelated host
process (i.e. not qemu).
Since this decreases the performance penalty of all exits, this patch improves
guest boot time by about 15%.
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
prom_init was changed to take a new argument, the address
where the kernel is loaded, which is now used to copy the
SMP spin loop down before use.
However, only head_64.S was adapted to pass this new value,
not head_32.S, thus breaking SMP boot on 32-bit SMP CHRP
machines.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The new merged DMA code will try to access isa_bridge_pcidev when
trying to DMA to/from legacy devices. This is however only defined
on 64-bit. Fixes this for now by adding the variable, even if it
stays NULL. In the long run, we'll make isa-bridge.c common to
32 and 64-bit.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When the powerpc PCI layer is not configured to re-assign everything,
it currently fails to detect that a PCI to PCI bridge has been left
unassigned by the firmware and tries to allocate resource for the
default window values in the bridge (0...X) (with the notable exception
of a hack we have in there that detects some Apple firmware unassigned
bridge resources).
This results in resource allocation failures, which are generally
fixed up later on but it causes scary warnings in the logs and we
have seen the fixup code fall over in some circumstances (a different
issue to fix as well).
This code improves that by providing a more complete & useful function
to intuit that a bridge was left unassigned by the firmware, and thus
force a full re-allocation by the PCI code without trying to allocate
the existing useless resources first.
The algorithm we use basically considers unassigned a window that
starts at 0 (PCI address) if the corresponding address space enable
bit is not set. In addition, for memory space, it considers such a
resource unassigned also if the host bridge isn't configured to
forward cycles to address 0 (ie, the resource basically overlaps
main memory).
This fixes a range of problems with things like Bare-Metal support
on pSeries machines, or attempt to use partial firmware PCI setup.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The "phys" argument to machine_init() isn't used and isn't likely to
ever be so let's remove it.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
After Becky's work we can almost have different DMA offsets
between on-chip devices and PCI. Almost because there's a
problem with the non-coherent DMA code that basically ignores
the programmed offset to use the global one for everything.
This fixes it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
b38fd42ff4 added false dependencys
to order the load of upper and lower halfs of the pte, but only
adjusted whitespace instead of deleting the old load in the iside
handler, letting the hardware see the non-dependent load.
This patch removes the extra load.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The class of the MPC5121 pci host bridge is PCI_CLASS_BRIDGE_OTHER
while other freescale host bridges have class set to
PCI_CLASS_PROCESSOR_POWERPC.
This patch makes fixup_hide_host_resource_fsl match
PCI_CLASS_BRIDGE_OTHER in addition to PCI_CLASS_PROCESSOR_POWERPC.
Signed-off-by: John Rigby <jrigby@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The comment in the code was asking "Do we have to do this?", and according
to x86 and s390 the answer is no, the scheduler will do it before calling
the arch hook.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit 9b09c6d909 ("powerpc: Change the
default link address for pSeries zImage kernels") changed the
real-base value in the CHRP note added by the addnote program from
12MB to 32MB to give more space for Open Firmware to load the zImage.
(The real-base value says where we want OF to position itself in
memory.) However, this change was ineffective on most pSeries
machines, because the RPA note added by addnote has the "ignore me"
flag set to 1. This was intended to tell OF to ignore just the RPA
note, but has the side effect of also making OF ignore the CHRP note
(at least on most pSeries machines).
To solve this we have to set the "ignore me" flag to 0 in the RPA
note. (We can't just omit the RPA note because that is equivalent to
having an RPA note with default values, and the default values are not
what we want.) However, then we have to make sure the values in the
zImage's RPA note match up with the values that the kernel supplies
later in prom_init.c with either the ibm,client-architecture-support
call or the process-elf-header call in prom_send_capabilities().
So this sets the "ignore me" flag in the RPA note in addnote to 0, and
adjusts the RPA note values in addnote.c and in prom_init.c to be
consistent with each other and with the values in ibm_architecture_vec.
However, since the wrapper is independent of the kernel, this doesn't
ensure that the notes will stay consistent. To ensure that, this adds
code to addnote.c so that it can extract the kernel's RPA note from
the kernel binary and put that in the zImage. To that end, we put the
kernel's fake ELF header (which contains the kernel's RPA note) into
its own section, and arrange for wrapper to pull out that section with
objcopy and pass it to addnote, which then extracts the RPA note from
it and transfers it to the zImage.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The powerpc 32-bit and 64-bit kernel_thread functions don't properly
propagate errors being returned by the clone syscall. (In the case of
error, the syscall exit code returns a positive errno in r3 and sets
the CR0[SO] bit.)
This patch fixes that by negating r3 if CR0[SO] is set after the syscall.
Signed-off-by: Josh Poimboeuf <jpoimboe@us.ibm.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When manipulating 64-bit PCI addresses, the code would lose the
top 32-bit in a couple of places when shifting a pfn due to missing
type casting from the 32-bit pfn to a 64-bit resource before the
shift.
This breaks using newer X servers for example on 440 machines
with the PCI bus above 32-bit.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
A bug in my initial 64-bit hibernation code breaks it when using
page sizes that aren't 4K.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Add a .gitignore in arch/powerpc/kernel to ignore the generated
vmlinux.lds.
Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc: Fix failure to shutdown with CPU hotplug
powerpc: Fix PCI in Holly device tree
I tracked down the shutdown regression to CPUs not dying
when being shut down during power-off. This turns out to
be due to the system_state being SYSTEM_POWER_OFF, which
this code doesn't take as a valid state for shutting off
CPUs in.
This has never made sense to me, but when I added hotplug
code to implement hibernate I only "made it work" and did
not question the need to check the system_state. Thomas
Gleixner helped me dig, but the only thing we found is
that it was added with the original commit that added CPU
hotplug support.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Joel Schopp <jschopp@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On the x86 arch, user space single step exceptions should be ignored
if they occur in the kernel space, such as ptrace stepping through a
system call.
First check if it is kgdb that is executing a single step, then ensure
it is not an accidental traversal into the user space, while in kgdb,
any other time the TIF_SINGLESTEP is set, kgdb should ignore the
exception.
On x86, arm, mips and powerpc, the kgdb_contthread usage was
inconsistent with the way single stepping is implemented in the kgdb
core. The arch specific stub should always set the
kgdb_cpu_doing_single_step correctly if it is single stepping. This
allows kgdb to correctly process an instruction steps if ptrace
happens to be requesting an instruction step over a system call.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
This rearranges a bit of code, and adds support for
36-bit physical addressing for configs that use a
hashed page table. The 36b physical support is not
enabled by default on any config - it must be
explicitly enabled via the config system.
This patch *only* expands the page table code to accomodate
large physical addresses on 32-bit systems and enables the
PHYS_64BIT config option for 86xx. It does *not*
allow you to boot a board with more than about 3.5GB of
RAM - for that, SWIOTLB support is also required (and
coming soon).
Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Introduced a new set of low level tlb invalidate functions that do not
broadcast invalidates on the bus:
_tlbil_all - invalidate all
_tlbil_pid - invalidate based on process id (or mm context)
_tlbil_va - invalidate based on virtual address (ea + pid)
On non-SMP configs _tlbil_all should be functionally equivalent to _tlbia and
_tlbil_va should be functionally equivalent to _tlbie.
The intent of this change is to handle SMP based invalidates via IPIs instead
of broadcasts as the mechanism scales better for larger number of cores.
On e500 (fsl-booke mmu) based cores move to using MMUCSR for invalidate alls
and tlbsx/tlbwe for invalidate virtual address.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
We essentially adopt the 64-bit dma code, with some changes to support
32-bit systems, including HIGHMEM. dma functions on 32-bit are now
invoked via accessor functions which call the correct op for a device based
on archdata dma_ops. If there is no archdata dma_ops, this defaults
to dma_direct_ops.
In addition, the dma_map/unmap_page functions are added to dma_ops
because we can't just fall back on map/unmap_single when HIGHMEM is
enabled. In the case of dma_direct_*, we stop using map/unmap_single
and just use the page version - this saves a lot of ugly
ifdeffing. We leave map/unmap_single in the dma_ops definition,
though, because they are needed by the iommu code, which does not
implement map/unmap_page. Ideally, going forward, we will completely
eliminate map/unmap_single and just have map/unmap_page, if it's
workable for 64-bit.
Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Use the struct device's numa_node instead; use accessor functions
to get/set numa_node.
Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
32-bit platforms are about to start using dma.c; move the iommu
dma ops into their own file to make this a bit cleaner.
Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
This is in preparation for the merge of the 32 and 64-bit
dma code in arch/powerpc.
Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
We need to create a false data dependency to ensure the loads of
the pte are done in the right order.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
arch/powerpc/kernel/sysfs.c:197:7: warning: "CONFIG_6xx" is not defined
arch/powerpc/kernel/sysfs.c:141: warning: 'run_on_cpu' defined but not used
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Commit deac93df26 ("lib: Correct printk
%pF to work on all architectures") broke the non modular builds by
moving an essential function into modules.c. Fix this by moving it
out again and into asm/sections.h as an inline. To do this, the
definition of struct ppc64_opd_entry has been lifted out of modules.c
and put in asm/elf.h where it belongs.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Some 74xx cores by Freescale are using the configuration field instead
of the major revision field for their revision number. This corrects
the wrong behaviour for those ppc cores including my one.
There is a reference document at Freecale. It describes the PVR
register. This is based on that pdf. You can find the document at:
http://www.freescale.com/files/archives/doc/support_info/PPCPVR.pdf
Signed-off-by: Martin Langer <martin-langer@gmx.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The radix trees used by interrupt controllers for their irq reverse
mapping (currently only the XICS found on pSeries) have a complex
locking scheme dating back to before the advent of the lockless radix
tree.
This takes advantage of the lockless radix tree and of the fact that
the items of the tree are pointers to a static array (irq_map)
elements which can never go under us to simplify the locking.
Concurrency between readers and writers is handled by the intrinsic
properties of the lockless radix tree. Concurrency between writers is
handled with a global mutex.
Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
irq_radix_revmap() currently serves 2 purposes, irq mapping lookup
and insertion which happen in interrupt and process context respectively.
Separate the function into its 2 components, one for lookup only and one
for insertion only.
Fix the only user of the revmap tree (XICS) to use the new functions.
Also, move the insertion into the radix tree of those irqs that were
requested before it was initialized at said tree initialization.
Mutual exclusion between the tree initialization and readers/writers is
handled via a state variable (revmap_trees_allocated) set to 1 when the tree
has been initialized and set to 2 after the already requested irqs have been
inserted in the tree by the init path. This state is checked before any reader
or writer access just like we used to check for tree.gfp_mask != 0 before.
Finally, now that we're not any longer inserting nodes into the radix-tree
in interrupt context, turn the GFP_ATOMIC allocations into GFP_KERNEL ones.
Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
sys32_pause is a useless copy of the generic sys_pause.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This implements CONFIG_RELOCATABLE for 64-bit by making the kernel as
a position-independent executable (PIE) when it is set. This involves
processing the dynamic relocations in the image in the early stages of
booting, even if the kernel is being run at the address it is linked at,
since the linker does not necessarily fill in words in the image for
which there are dynamic relocations. (In fact the linker does fill in
such words for 64-bit executables, though not for 32-bit executables,
so in principle we could avoid calling relocate() entirely when we're
running a 64-bit kernel at the linked address.)
The dynamic relocations are processed by a new function relocate(addr),
where the addr parameter is the virtual address where the image will be
run. In fact we call it twice; once before calling prom_init, and again
when starting the main kernel. This means that reloc_offset() returns
0 in prom_init (since it has been relocated to the address it is running
at), which necessitated a few adjustments.
This also changes __va and __pa to use an equivalent definition that is
simpler. With the relocatable kernel, PAGE_OFFSET and MEMORY_START are
constants (for 64-bit) whereas PHYSICAL_START is a variable (and
KERNELBASE ideally should be too, but isn't yet).
With this, relocatable kernels still copy themselves down to physical
address 0 and run there.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Using LOAD_REG_IMMEDIATE to get the address of kernel symbols
generates 5 instructions where LOAD_REG_ADDR can do it in one,
and will generate R_PPC64_ADDR16_* relocations in the output when
we get to making the kernel as a position-independent executable,
which we'd rather not have to handle. This changes various bits
of assembly code to use LOAD_REG_ADDR when we need to get the
address of a symbol, or to use suitable position-independent code
for cases where we can't access the TOC for various reasons, or
if we're not running at the address we were linked at.
It also cleans up a few minor things; there's no reason to save and
restore SRR0/1 around RTAS calls, __mmu_off can get the return
address from LR more conveniently than the caller can supply it in
R4 (and we already assume elsewhere that EA == RA if the MMU is on
in early boot), and enable_64b_mode was using 5 instructions where
2 would do.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This changes the way that the exception prologs transfer control to
the handlers in 64-bit kernels with the aim of making it possible to
have the prologs separate from the main body of the kernel. Now,
instead of computing the address of the handler by taking the top
32 bits of the paca address (to get the 0xc0000000........ part) and
ORing in something in the bottom 16 bits, we get the base address of
the kernel by doing a load from the paca and add an offset.
This also replaces an mfmsr and an ori to compute the MSR value for
the handler with a load from the paca. That makes it unnecessary to
have a separate version of EXCEPTION_PROLOG_PSERIES that forces 64-bit
mode.
We can no longer use a direct branches in the exception prolog code,
which means that the SLB miss handlers can't branch directly to
.slb_miss_realmode any more. Instead we have to compute the address
and do an indirect branch. This is conditional on CONFIG_RELOCATABLE;
for non-relocatable kernels we use a direct branch as before. (A later
change will allow CONFIG_RELOCATABLE to be set on 64-bit powerpc.)
Since the secondary CPUs on pSeries start execution in the first 0x100
bytes of real memory and then have to get to wherever the kernel is,
we can't use a direct branch to get there. Instead this changes
__secondary_hold_spinloop from a flag to a function pointer. When it
is set to a non-NULL value, the secondary CPUs jump to the function
pointed to by that value.
Finally this eliminates one code difference between 32-bit and 64-bit
by making __secondary_hold be the text address of the secondary CPU
spinloop rather than a function descriptor for it.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This rearranges head_64.S so that we have all the first-level exception
prologs together starting at 0x100, followed by all the second-level
handlers that are invoked from the first-level prologs, followed by
other code. This doesn't make any functional change but will make
following changes for relocatable kernel support easier.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Kdump kernel needs to use only those memory regions that it is allowed
to use (crashkernel, rtas, tce, etc.). Each of these regions have
their own sizes and are currently added under 'linux,usable-memory'
property under each memory@xxx node of the device tree.
The ibm,dynamic-memory property of ibm,dynamic-reconfiguration-memory
node (on POWER6) now stores in it the representation for most of the
logical memory blocks with the size of each memory block being a
constant (lmb_size). If one or more or part of the above mentioned
regions lie under one of the lmb from ibm,dynamic-memory property,
there is a need to identify those regions within the given lmb.
This makes the kernel recognize a new 'linux,drconf-usable-memory'
property added by kexec-tools. Each entry in this property is of the
form of a count followed by that many (base, size) pairs for the above
mentioned regions. The number of cells in the count value is given by
the #size-cells property of the root node.
Signed-off-by: Chandru Siddalingappa <chandru@in.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
It was introduced by "vsprintf: add support for '%pS' and '%pF' pointer
formats" in commit 0fe1ef24f7. However,
the current way its coded doesn't work on parisc64. For two reasons: 1)
parisc isn't in the #ifdef and 2) parisc has a different format for
function descriptors
Make dereference_function_descriptor() more accommodating by allowing
architecture overrides. I put the three overrides (for parisc64, ppc64
and ia64) in arch/kernel/module.c because that's where the kernel
internal linker which knows how to deal with function descriptors sits.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Tony Luck <tony.luck@intel.com>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Right now, there is no notifier that is called on a new cpu, before the new
cpu begins processing interrupts/softirqs.
Various kernel function would need that notification, e.g. kvm works around
by calling smp_call_function_single(), rcu polls cpu_online_map.
The patch adds a CPU_STARTING notification. It also adds a helper function
that sends the message to all cpu_chain handlers.
Tested on x86-64.
All other archs are untested. Especially on sparc, I'm not sure if I got
it right.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch also includes the required removal of (unused) inclusion of
<asm/a.out.h> <linux/a.out.h>'s in the arch/ code for these
architectures.
[dwmw2: updated for 2.6.27-rc]
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The calculation to get TI_CPU based off of SPRG3 was just plain wrong,
meaning that we were getting garbage for the CPU number on 6xx/G3/G4
based SMP boxes in this code.
Just offset off the stack pointer (to get to thread_info) like all the
other references to TI_CPU do.
This was pointed out by Chen Gong <G.Chen@freescale.com>
[paulus@samba.org - use rlwinm r12,r11,... instead of
rlwinm r12,r1,...; tophys()]
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This bug is causing random crashes
(http://bugzilla.kernel.org/show_bug.cgi?id=11414).
-fno-omit-frame-pointer is only needed on powerpc when -pg is also
supplied, and there is a gcc bug that causes incorrect code generation
on 32-bit powerpc when -fno-omit-frame-pointer is used---it uses stack
locations below the stack pointer, which is not allowed by the ABI
because those locations can and sometimes do get corrupted by an
interrupt.
This ensures that CONFIG_FRAME_POINTER is only selected by ftrace.
When CONFIG_FTRACE is enabled we also pass -mno-sched-epilog to work
around the gcc codegen bug.
Patch based on work by:
Andreas Schwab <schwab@suse.de>
Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This makes core_kernel_text() (and therefore kernel_text_address())
return the correct result. Currently all the __devinit routines (at
least) will not be considered to be kernel text.
This is just a quick fix for 2.6.27 - hopefully we will be able to fix
this better in 2.6.28.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This fixes an uninitialised variable in the VSX alignment code. It can
cause warnings from GCC (noticed with gcc-4.1.1). Gcc is actually
correct in this instance, and this bug could cause the alignment
interrupt handler to send a SIGSEGV to the process on a legitimate
access.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The file arch/powerpc/kernel/sysfs.c is currently only compiled for
64-bit kernels. It contain code to register CPU sysdevs in sysfs and
add various properties such as cache topology and raw access by root
to performance monitor counters (PMCs). A lot of that can be re-used
as is on 32-bits.
This makes the file be built for both, with appropriate ifdef'ing
for the few bits that are really 64-bit specific, and adds some
support for the raw PMCs for 75x and 74xx processors.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When removing a directory, the sysfs core takes care of removing files
in the directory (see sysfs_remove_dir()). So when we are about to
delete a kobject (and thus cause its sysfs directory to be removed),
we don't have to explicitly remove the files attached to it, although
it's harmless to do so.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
__FUNCTION__ is gcc-specific, use __func__
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
[akpm@linux-foundation.org: exclude prom_init.c]
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
There is a small passage of code in ret_from_except_lite which is
only required on iSeries. For a multi-platform kernel on non-iSeries
machines this means we end up executing ~15 nops in ret_from_except_lite.
It would be nicer if non-iSeries could skip the code entirely, and on
iSeries we can jump out of line to execute the code.
I have no performance numbers to justify this, other than the assertion
that executing 15 nops takes longer than executing 0.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When CMO is enabled and booted on a non CMO system and the VIO
device's probe function fails, an oops can result since
vio_cmo_bus_remove is called when it should not. This fixes it by
avoiding the vio_cmo_bus_remove call on platforms that don't implement
CMO.
cpu 0x0: Vector: 300 (Data Access) at [c00000000e13b3d0]
pc: c000000000020d34: .vio_cmo_bus_remove+0xc0/0x1f4
lr: c000000000020ca4: .vio_cmo_bus_remove+0x30/0x1f4
sp: c00000000e13b650
msr: 8000000000009032
dar: 0
dsisr: 40000000
current = 0xc00000000e0566c0
paca = 0xc0000000006f9b80
pid = 2428, comm = modprobe
enter ? for help
[c00000000e13b6e0] c000000000021d94 .vio_bus_probe+0x2f8/0x33c
[c00000000e13b7a0] c00000000029fc88 .driver_probe_device+0x13c/0x200
[c00000000e13b830] c00000000029fdac .__driver_attach+0x60/0xa4
[c00000000e13b8c0] c00000000029f050 .bus_for_each_dev+0x80/0xd8
[c00000000e13b980] c00000000029f9ec .driver_attach+0x28/0x40
[c00000000e13ba00] c00000000029f630 .bus_add_driver+0xd4/0x284
[c00000000e13baa0] c0000000002a01bc .driver_register+0xc4/0x198
[c00000000e13bb50] c00000000002168c .vio_register_driver+0x40/0x5c
[c00000000e13bbe0] d0000000003b3f1c .ibmvfc_module_init+0x70/0x109c [ibmvfc]
[c00000000e13bc70] c0000000000acf08 .sys_init_module+0x184c/0x1a10
[c00000000e13be30] c000000000008748 syscall_exit+0x0/0x40
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Recent of_platform changes made of_bus_type_init() overwrite the bus
type's .dev_attrs list, meaning that the "name" attribute that ibmebus
devices previously had is no longer present. This is a user-visible
regression which breaks the userspace eHCA support, since the eHCA
userspace driver relies on the name attribute to check for valid
adapters.
This fixes it by providing the "name" attribute in the generic OF
device code instead. Tested on POWER.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
A change to __ioremap() broke reading /dev/oldmem because we're no
longer able to ioremap pfn 0 (d177c207, "[PATCH] powerpc: IOMMU: don't
ioremap null addresses").
We actually don't need to ioremap for anything that's part of the linear
mapping, so just read it directly.
Also make sure we're only reading one page or less at a time.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Use the generic compat_sys_old_readdir instead of the powerpc one which
is almost the same except for the almost complete lack of error
handling.
Note that we can't just use SYSCALL() in systbl.h because the native
syscall is named old_readdir, not sys_old_readdir.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Commit 163f6876f5 missed one, resulting in
the following compile error:
AS arch/powerpc/kernel/misc_32.o
arch/powerpc/kernel/misc_32.S: Assembler messages:
arch/powerpc/kernel/misc_32.S:902: Error: unsupported relocation against KEXEC_CONTROL_CODE_SIZE
make[2]: *** [arch/powerpc/kernel/misc_32.o] Error 1
make[1]: *** [arch/powerpc/kernel] Error 2
make: *** [vmlinux] Error 2
I grepped arch/ and found no further instances.
Signed-off-by: Paul Collins <paul@ondioline.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Doing some various "make randconfig", I came across an error when
CONFIG_BUG was not set:
arch/powerpc/kernel/module.c: In function 'module_find_bug':
arch/powerpc/kernel/module.c:111: error: increment of pointer to unknown structure
arch/powerpc/kernel/module.c:111: error: arithmetic on pointer to an incomplete type
arch/powerpc/kernel/module.c:112: error: dereferencing pointer to incomplete type
Looking further into this, I found that module_find_bug, defined in
powerpc arch code, is not called anywhere, so this just removes it.
There is a static module_find_bug in lib/bug.c but that is a separate issue.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add a field in lparcfg output to indicate whether the kernel is
running on a dedicated or shared memory lpar. Added fields to show
the paging space pool IDs and the CMO page size.
Submitted-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The intent of "flush_tlbs" is to invalidate all TLB entries by doing a
TLB invalidate instruction for all pages in the address range 0 to
0x00400000. A loop counter is set up at the high value and
decremented by page size. However, the loop is only done once as the
sense of the conditional branch at the loop end does not match the
setup/decrement. This fixes it to do the whole range by correcting
the branch condition.
Signed-off-by: Rocky Craig <rocky.craig@hp.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Rename KEXEC_CONTROL_CODE_SIZE to KEXEC_CONTROL_PAGE_SIZE, because control
page is used for not only code on some platform. For example in kexec
jump, it is used for data and stack too.
[akpm@linux-foundation.org: unbreak powerpc and arm, finish conversion]
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When we have an ISA memory hole (ie, a PCI window that allows us to
generate PCI memory cycles at low PCI address) mixed with other
resources using a different CPU <=> PCI mapping, we must not keep
the ISA hole in the bridge resource list.
If we do, things might start trying to allocate device resources
in there and will get the PCI addresses wrong.
This fixes it by arranging to remove the ISA memory hole resource in
this case. This fixes various cases of PCMCIA breakage on PowerBooks
using the MPC106 "grackle" bridge.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The kernel copy of the rtas args struct contains the return
value(s) for the specified rtas call. These are copied back
to user space with the assumption that every value has been
set by the rtas call, which turns out to be not always true.
Thus userspace can see random values and think the call failed
when in fact it succeeded, but for some reason didn't set one
of the return values.
This fixes the problem by zeroing out the return value fields
of the rtas args struct before processing the rtas call.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Now that arch/ppc is gone and CONFIG_PPC_MERGE is always set, remove
the dead code associated with !CONFIG_PPC_MERGE from arch/powerpc
and include/asm-powerpc.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
In PTRACE_GET/SETVSRREGS, we should be using the thread we are
ptracing rather than current.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Fix cut-and-paste error in the size setting for ptrace buffers for VSX.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Fix bug where PTRACE_GET/SETVSRREGS are not connected for 32 bit processes.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The code to handle writes to /proc/ppc64/lparcfg incorrectly
assumes that the return code from the helper routines to update
processor or memory entitlement return a hcall return value. It
then assumes any non-hcall return value is bad and sets the return
code for the write to be -EIO.
The update_[mp]pp routines can return values other than a hcall
return value. This patch removes the automatic setting of any
return code that is not an hcall return value from these routines
to -EIO.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This removes the non-working code in legacy_serial that tried to handle
the powermac SCC ports, and instead add a (now working) function to the
powermac platform code to find the default serial console if any.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Collect cache information from the OF device tree and display it in
the cpu hierarchy in sysfs. This is intended to be compatible at the
userspace level with x86's implementation[1], hence some of the funny
attribute names. The arrangement of cache info is not immediately
intuitive, but (again) it's for compatibility's sake.
The cache attributes exposed are:
type (Data, Instruction, or Unified)
level (1, 2, 3...)
size
coherency_line_size
number_of_sets
ways_of_associativity
All of these can be derived on platforms that follow the OF PowerPC
Processor binding. The code "publishes" only those attributes for
which it is able to determine values; attributes for values which
cannot be determined are not created at all.
[1] arch/x86/kernel/cpu/intel_cacheinfo.c
BenH: Turned some printk's into pr_debug, added better NULL checking
in a couple of places.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Existing Open Firmware practice is to report each processor core as a
separate node in the device tree. Report the value of the "reg" OF
property corresponding to a logical CPU's device node as the core_id
attribute in /sys/devices/system/cpu/cpu*/topology/core_id.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Implement the notion of "core siblings" for powerpc. This makes
/sys/devices/system/cpu/cpu*/topology/core_siblings present sensible
values, indicating online CPUs which share an L2 cache.
BenH: Made cpu_to_l2cache() use of_find_node_by_phandle() instead
of IBM-specific open coded search
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
arch/powerpc/kernel/vio.c:533: error: too few arguments to function 'dma_mapping_error'
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds TIF_NOTIFY_RESUME support for powerpc. When set,
we call tracehook_notify_resume() on the way to user mode.
This overloads do_signal() to do the work, but changes its
arguments to it has the TIF_* bits handy in a register and
drops the useless first argument that was always zero.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This changes powerpc syscall tracing to use the new tracehook.h entry
points. There is no change, only cleanup.
In addition, the assembly changes allow do_syscall_trace_enter() to
abort the syscall without losing the information about the original
r0 value.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This makes the powerpc signal handling code call tracehook_signal_handler()
after a handler is set up. This means that using PTRACE_SINGLESTEP to
enter a signal handler will report to ptrace on the first instruction of
the handler, instead of the second. This is consistent with what x86 and
other machines do, and what users and debuggers want.
BenH: Fixed up the test for the trap value.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Rather doing one initialization pass over all the per-cpu
cpu_sibling_maps at boot, update the maps at cpu online/offline time.
This is a behavior change -- the thread_siblings attribute now
reflects only online siblings, whereas it would display offline
siblings before. The new behavior matches that of x86, and is
arguably more useful.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
It is called only in cpu online paths.
(caught by CONFIG_DEBUG_SECTION_MISMATCH=y)
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This piece of code is broken for >2 threads, and possibly in some
other subtle ways (such as comparing a value obtained from an
"ibm,ppc-interrupt-server#s" property to a value obtained from a
"reg" property) and doesn't seem to have any useful purpose in the
first place other than a dubious warning in case NR_CPUS is too
small, which probably isn't the right place to do so.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
arch/powerpc/kernel/vio.c:1034: warning: function declaration isnât a prototype
arch/powerpc/kernel/vio.c:1035: warning: function declaration isnât a prototype
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* CONFIG_BOOKE is selected by CONFIG_44x so we dont need both
* Fixed a few comments
* Go back to only using DBCR0_IDM to determine if we are using
debug resources.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Removed duplicated include file <linux/module.h> in
arch/powerpc/kernel/stacktrace.c.
Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Kmem cache passed to constructor is only needed for constructors that are
themselves multiplexeres. Nobody uses this "feature", nor does anybody uses
passed kmem cache in non-trivial way, so pass only pointer to object.
Non-trivial places are:
arch/powerpc/mm/init_64.c
arch/powerpc/mm/hugetlbpage.c
This is flag day, yes.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Jon Tollefson <kniht@linux.vnet.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Matt Mackall <mpm@selenic.com>
[akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
[akpm@linux-foundation.org: fix mm/slab.c]
[akpm@linux-foundation.org: fix ubifs]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch provides an enhancement to kexec/kdump. It implements the
following features:
- Backup/restore memory used by the original kernel before/after
kexec.
- Save/restore CPU state before/after kexec.
The features of this patch can be used as a general method to call program in
physical mode (paging turning off). This can be used to call BIOS code under
Linux.
kexec-tools needs to be patched to support kexec jump. The patches and
the precompiled kexec can be download from the following URL:
source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2
patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2
binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10
Usage example of calling some physical mode code and return:
1. Compile and install patched kernel with following options selected:
CONFIG_X86_32=y
CONFIG_KEXEC=y
CONFIG_PM=y
CONFIG_KEXEC_JUMP=y
2. Build patched kexec-tool or download the pre-built one.
3. Build some physical mode executable named such as "phy_mode"
4. Boot kernel compiled in step 1.
5. Load physical mode executable with /sbin/kexec. The shell command
line can be as follow:
/sbin/kexec --load-preserve-context --args-none phy_mode
6. Call physical mode executable with following shell command line:
/sbin/kexec -e
Implementation point:
To support jumping without reserving memory. One shadow backup page (source
page) is allocated for each page used by kexeced code image (destination
page). When do kexec_load, the image of kexeced code is loaded into source
pages, and before executing, the destination pages and the source pages are
swapped, so the contents of destination pages are backupped. Before jumping
to the kexeced code image and after jumping back to the original kernel, the
destination pages and the source pages are swapped too.
C ABI (calling convention) is used as communication protocol between
kernel and called code.
A flag named KEXEC_PRESERVE_CONTEXT for sys_kexec_load is added to
indicate that the loaded kernel image is used for jumping back.
Now, only the i386 architecture is supported.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 9115d13453 ("powerpc: Enable
AT_BASE_PLATFORM aux vector") broke boot on 32-bit powerpc systems; we
have to use PTRRELOC to initialize powerpc_base_platform this early in
boot.
Bug reported by Jon Smirl.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (34 commits)
powerpc: Wireup new syscalls
Move update_mmu_cache() declaration from tlbflush.h to pgtable.h
powerpc/pseries: Remove kmalloc call in handling writes to lparcfg
powerpc/pseries: Update arch vector to indicate support for CMO
ibmvfc: Add support for collaborative memory overcommit
ibmvscsi: driver enablement for CMO
ibmveth: enable driver for CMO
ibmveth: Automatically enable larger rx buffer pools for larger mtu
powerpc/pseries: Verify CMO memory entitlement updates with virtual I/O
powerpc/pseries: vio bus support for CMO
powerpc/pseries: iommu enablement for CMO
powerpc/pseries: Add CMO paging statistics
powerpc/pseries: Add collaborative memory manager
powerpc/pseries: Utilities to set firmware page state
powerpc/pseries: Enable CMO feature during platform setup
powerpc/pseries: Split retrieval of processor entitlement data into a helper routine
powerpc/pseries: Add memory entitlement capabilities to /proc/ppc64/lparcfg
powerpc/pseries: Split processor entitlement retrieval and gathering to helper routines
powerpc/pseries: Remove extraneous error reporting for hcall failures in lparcfg
powerpc: Fix compile error with binutils 2.15
...
Fixed up conflict in arch/powerpc/platforms/52xx/Kconfig manually.
Currently list of kretprobe instances are stored in kretprobe object (as
used_instances,free_instances) and in kretprobe hash table. We have one
global kretprobe lock to serialise the access to these lists. This causes
only one kretprobe handler to execute at a time. Hence affects system
performance, particularly on SMP systems and when return probe is set on
lot of functions (like on all systemcalls).
Solution proposed here gives fine-grain locks that performs better on SMP
system compared to present kretprobe implementation.
Solution:
1) Instead of having one global lock to protect kretprobe instances
present in kretprobe object and kretprobe hash table. We will have
two locks, one lock for protecting kretprobe hash table and another
lock for kretporbe object.
2) We hold lock present in kretprobe object while we modify kretprobe
instance in kretprobe object and we hold per-hash-list lock while
modifying kretprobe instances present in that hash list. To prevent
deadlock, we never grab a per-hash-list lock while holding a kretprobe
lock.
3) We can remove used_instances from struct kretprobe, as we can
track used instances of kretprobe instances using kretprobe hash
table.
Time duration for kernel compilation ("make -j 8") on a 8-way ppc64 system
with return probes set on all systemcalls looks like this.
cacheline non-cacheline Un-patched kernel
aligned patch aligned patch
===============================================================================
real 9m46.784s 9m54.412s 10m2.450s
user 40m5.715s 40m7.142s 40m4.273s
sys 2m57.754s 2m58.583s 3m17.430s
===========================================================
Time duration for kernel compilation ("make -j 8) on the same system, when
kernel is not probed.
=========================
real 9m26.389s
user 40m8.775s
sys 2m7.283s
=========================
Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com>
Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are only 4 valid name=value pairs for writes to
/proc/ppc64/lparcfg. Current code allocates a buffer to copy
this information in from the user. Since the longest name=value
pair will easily fit into a buffer of 64 characters, simply
put the buffer on the stack instead of allocating the buffer.
Signed-off-by: Nathan Fotenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Update the architecture vector to indicate that Cooperative Memory
Overcommitment is supported if CONFIG_PPC_SMLPAR is set.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Verify memory entitlement updates can be handled by vio.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This is a large patch but the normal code path is not affected. For
non-pSeries platforms the code is ifdef'ed out and for non-CMO enabled
pSeries systems this does not affect the normal code path. Devices that
do not perform DMA operations do not need modification with this patch.
The function get_desired_dma was renamed from get_io_entitlement for
clarity.
Overview
Cooperative Memory Overcommitment (CMO) allows for a set of OS partitions
to be run with less RAM than the aggregate needs of the group of
partitions. The firmware will balance memory between the partitions
and page in/out memory as needed. Based on the number and type of IO
adpaters preset each partition is allocated an amount of memory for
DMA operations and this allocation will be guaranteed to the partition;
this is referred to as the partition's 'entitlement'.
Partitions running in a CMO environment can only have virtual IO devices
present. The VIO bus layer will manage the IO entitlement for the system.
Accounting, at a system and per-device level, is tracked in the VIO bus
code and exposed via sysfs. A set of dma_ops functions are added to
the bus to allow for this accounting.
Bus initialization
At initialization, the bus will calculate the minimum needs of the system
based on providing each device present with a standard minimum entitlement
along with a spare allocation for the bus to handle hotplug events.
If the minimum needs can not be met the system boot will be halted.
Device changes
The significant changes for devices while running under CMO are that the
devices must specify how much dedicated IO entitlement they desire and
must also handle DMA mapping errors that can occur due to constrained
IO memory. The virtual IO drivers are modified to silence errors when
DMA mappings fail for CMO and handle these failures gracefully.
Each devices will be guaranteed a minimum entitlement that can always
be mapped. Devices will specify how much entitlement they desire and
the VIO bus will attempt to provide for this. Devices can change their
desired entitlement level at any point in time to address particular needs
(via vio_cmo_set_dev_desired()), not just at device probe time.
VIO bus changes
The system will have a particular entitlement level available from which
it can provide memory to the devices. The bus defines two pools of memory
within this entitlement, the reserved and excess pools. Each device is
provided with it's own entitlement no less than a system defined minimum
entitlement and no greater than what the device has specified as it's
desired entitlement. The entitlement provided to devices comes from the
reserve pool. The reserve pool can also contain a spare allocation as
large as the system defined minimum entitlement which is used for device
hotplug events. Any entitlement not needed to fulfill the needs of a
reserve pool is placed in the excess pool. Each device is guaranteed
that it can map up to it's entitled level; additional mapping are possible
as long as there is unmapped memory in the excess pool.
Bus probe
As the system starts, each device is given an entitlement equal only
to the system defined minimum entitlement. The reserve pool is equal
to the sum of these entitlements, plus a spare allocation. The VIO bus
also tracks the aggregate desired entitlement of all the devices. If the
system desired entitlement is greater than the size of the reserve pool,
when devices unmap IO memory it will be reserved and a balance operation
will be scheduled for some time in the future.
Entitlement balancing
The balance function tries to fairly distribute entitlement between the
devices in the system with the goal of providing each device with it's
desired amount of entitlement. Devices using more than what would be
ideal will have their entitled set-point adjusted; this will effectively
set a goal for lower IO memory usage as future mappings can fail and
deallocations will trigger a balance operation to distribute the newly
unmapped memory. A fair distribution of entitlement can take several
balance operations to achieve. Entitlement changes and device DLPAR
events will alter the state of CMO and will trigger balance operations.
Hotplug events
The VIO bus allows for changes in system entitlement at run-time via
'vio_cmo_entitlement_update()'. When devices are added the hotplug
device event will be preceded by a system entitlement increase and this
is reversed when devices are removed.
The following changes are made that the VIO bus layer for CMO:
* add IO memory accounting per device structure.
* add IO memory entitlement query function to driver structure.
* during vio bus probe, if CMO is enabled, check that driver has
memory entitlement query function defined. Fail if function not defined.
* fail to register driver if io entitlement function not defined.
* create set of dma_ops at vio level for CMO that will track allocations
and return DMA failures once entitlement is reached. Entitlement will
limited by overall system entitlement. Devices will have a reserved
quantity of memory that is guaranteed, the rest can be used as available.
* expose entitlement, current allocation, desired allocation, and the
allocation error counter for devices to the user through sysfs
* provide mechanism for changing a device's desired entitlement at run time
for devices as an exported function and sysfs tunable
* track any DMA failures for entitled IO memory for each vio device.
* check entitlement against available system entitlement on device add
* track entitlement metrics (high water mark, current usage)
* provide function to reset high water mark
* provide minimum and desired entitlement numbers at a bus level
* provide drivers with a minimum guaranteed entitlement
* balance available entitlement between devices to satisfy their needs
* handle system entitlement changes and device hotplug
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To support Cooperative Memory Overcommitment (CMO), we need to check
for failure from some of the tce hcalls.
These changes for the pseries platform affect the powerpc architecture;
patches for the other affected platforms are included in this patch.
pSeries platform IOMMU code changes:
* platform TCE functions must handle H_NOT_ENOUGH_RESOURCES errors and
return an error.
Architecture IOMMU code changes:
* Calls to ppc_md.tce_build need to check return values and return
DMA_MAPPING_ERROR for transient errors.
Architecture changes:
* struct machdep_calls for tce_build*_pSeriesLP functions need to change
to indicate failure.
* all other platforms will need updates to iommu functions to match the new
calling semantics; they will return 0 on success. The other platforms
default configs have been built, but no further testing was performed.
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
With the addition of Cooperative Memory Overcommitment (CMO) support
for IBM Power Systems, two fields have been added to the VPA to report
paging statistics. Add support in lparcfg to report them to userspace.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Split the retrieval of processor entitlement data returned in the H_GET_PPP
hcall into its own helper routine.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Update /proc/ppc64/lparcfg to display Cooperative Memory
Overcommitment statistics as reported by the H_GET_MPP hcall. This
also updates the lparcfg interface to allow setting memory entitlement
and weight.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Split the retrieval and setting of processor entitlement and weight into
helper routines. This also removes the printing of the raw values
returned from h_get_ppp, the values are already parsed and printed.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Remove the extraneous error reporting used when a hcall made from lparcfg fails.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
My previous patch to fix compilation with binutils-2.17 causes
a "file truncated" build error from ld with binutils 2.15 (and
possibly older), and a warning with 2.16 and 2.17.
This fixes it.
Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org>
Acked-by: Chuck Meade <chuckmeade@mindspring.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch implements support for HW based watchpoint via the
DBSR_DAC (Data Address Compare) facility of the BookE processors.
It does so by interfacing with the existing DABR breakpoint code
and adding the necessary bits and pieces for the new bits to
be properly set or cleared
Signed-off-by: Luis Machado <luisgpm@br.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
A struct sysdev_attribute * parameter was added to the show routine by
commit 4a0b2b4dbe "sysdev: Pass the
attribute to the low level sysdev show/store function".
This eliminates a warning:
arch/powerpc/kernel/sysfs.c:538: warning: initialization from incompatible pointer type
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Stash the first platform string matched by identify_cpu() in
powerpc_base_platform, and supply that to the ELF loader for the value
of AT_BASE_PLATFORM.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
nohz: adjust tick_nohz_stop_sched_tick() call of s390 as well
nohz: prevent tick stop outside of the idle loop
On 32-bit architectures PAGE_ALIGN() truncates 64-bit values to the 32-bit
boundary. For example:
u64 val = PAGE_ALIGN(size);
always returns a value < 4GB even if size is greater than 4GB.
The problem resides in PAGE_MASK definition (from include/asm-x86/page.h for
example):
#define PAGE_SHIFT 12
#define PAGE_SIZE (_AC(1,UL) << PAGE_SHIFT)
#define PAGE_MASK (~(PAGE_SIZE-1))
...
#define PAGE_ALIGN(addr) (((addr)+PAGE_SIZE-1)&PAGE_MASK)
The "~" is performed on a 32-bit value, so everything in "and" with
PAGE_MASK greater than 4GB will be truncated to the 32-bit boundary.
Using the ALIGN() macro seems to be the right way, because it uses
typeof(addr) for the mask.
Also move the PAGE_ALIGN() definitions out of include/asm-*/page.h in
include/linux/mm.h.
See also lkml discussion: http://lkml.org/lkml/2008/6/11/237
[akpm@linux-foundation.org: fix drivers/media/video/uvc/uvc_queue.c]
[akpm@linux-foundation.org: fix v850]
[akpm@linux-foundation.org: fix powerpc]
[akpm@linux-foundation.org: fix arm]
[akpm@linux-foundation.org: fix mips]
[akpm@linux-foundation.org: fix drivers/media/video/pvrusb2/pvrusb2-dvb.c]
[akpm@linux-foundation.org: fix drivers/mtd/maps/uclinux.c]
[akpm@linux-foundation.org: fix powerpc]
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch removes the old kgdb reminants from ARCH=powerpc and
implements the new style arch specific stub for the common kgdb core
interface.
It is possible to have xmon and kgdb in the same kernel, but you
cannot use both at the same time because there is only one set of
debug hooks.
The arch specific kgdb implementation saves the previous state of the
debug hooks and restores them if you unconfigure the kgdb I/O driver.
Kgdb should have no impact on a kernel that has no kgdb I/O driver
configured.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (49 commits)
powerpc: Fix build bug with binutils < 2.18 and GCC < 4.2
powerpc/eeh: Don't panic when EEH_MAX_FAILS is exceeded
fbdev: Teaches offb about palette on radeon r5xx/r6xx
powerpc/cell/edac: Log a syndrome code in case of correctable error
powerpc/cell: Add DMA_ATTR_WEAK_ORDERING dma attribute and use in Cell IOMMU code
powerpc: Indicate which oprofile counters to use while in compat mode
powerpc/boot: Change spaces to tabs
powerpc: Remove duplicate 6xx option in Kconfig
powerpc: Use PPC_LONG and PPC_LONG_ALIGN in lib/string.S
powerpc: Use PPC_LONG_ALIGN in uaccess.h
powerpc: Add a #define for aligning to a long-sized boundary
powerpc: Fix OF parsing of 64 bits PCI addresses
powerpc: Use WARN_ON(1) instead of __WARN()
powerpc: Fix support for latencytop
powerpc/ps3: Update ps3_defconfig
powerpc/ps3: Add a sub-match id to ps3_system_bus
powerpc: Add a 6xx defconfig
powerpc/dma: Use the struct dma_attrs in iommu code
powerpc/cell: Add support for power button of future IBM cell blades
powerpc/cell: Cleanup sysreset_hack for IBM cell blades
...
This allow to dynamically generate attributes and share show/store
functions between attributes. Right now most attributes are generated
by special macros and lots of duplicated code. With the attribute
passed it's instead possible to attach some data to the attribute
and then use that in shared low level functions to do different things.
I need this for the dynamically generated bank attributes in the x86
machine check code, but it'll allow some further cleanups.
I converted all users in tree to the new show/store prototype. It's a single
huge patch to avoid unbisectable sections.
Runtime tested: x86-32, x86-64
Compiled only: ia64, powerpc
Not compile tested/only grep converted: sh, arm, avr32
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
binutils < 2.18 has a bug that makes it misbehave when taking an
ELF file with all segments at load address 0 as input. This
happens when running "strip" on vmlinux, because of the AT() magic
in this linker script. People using GCC >= 4.2 won't run into
this problem, because the "build-id" support will put some data
into the "notes" segment (at a non-zero load address).
To work around this, we force some data into both the "dummy"
segment and the kernel segment, so the dummy segment will get a
non-zero load address. It's not enough to always create the
"notes" segment, since if nothing gets assigned to it, its load
address will be zero.
Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org>
Tested-By: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
While running on a system with new hardware and a kernel where the
cpu_specs[] table does not recognize the new hardware, the identify_cpu()
routine will select the default case as it searches through cpu_specs[]
in an attempt to match the real PVR. Once the default case is selected,
non of the oprofile counters and/or fields have been set up or defined.
When identify_cpu() is called once more with the logical PVR, some of
the cpu specific fields are replaced with the exception of the oprofile
related ones. However, in the case where we have actually taken the
default case while searching for the real PVR, we need to tell
oprofile that we are now running in compatibility mode so it can pick up
the correct counters. We do this by setting the oprofile_cpu_type field
to be that taken from the cpu_specs[] for the cpu we are now emulating.
This change will detect that we are now altering the real PVR and determine
if we also need to update the oprofile_cpu_type field.
Signed-off-by: Torez Smith <lnxtorez@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The OF parsing code for PCI addresses isn't always treating properly
the address space indication 0b11 (ie. 0x3) as meaning 64 bits
memory space.
This means that it fails to parse addresses for PCI BARs that have
this encoding set by the firmware, which happens on some SLOF
versions and breaks offb palette handling on Powerstation.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
We need to pass the kernel stack pointer instead of the user space
stack pointer in save_stack_trace_tsk().
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Update iommu_alloc() to take the struct dma_attrs and pass them on to
tce_build(). This change propagates down to the tce_build functions of
all the platforms.
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Jack Ren and Eric Miao tracked down the following long standing
problem in the NOHZ code:
scheduler switch to idle task
enable interrupts
Window starts here
----> interrupt happens (does not set NEED_RESCHED)
irq_exit() stops the tick
----> interrupt happens (does set NEED_RESCHED)
return from schedule()
cpu_idle(): preempt_disable();
Window ends here
The interrupts can happen at any point inside the race window. The
first interrupt stops the tick, the second one causes the scheduler to
rerun and switch away from idle again and we end up with the tick
disabled.
The fact that it needs two interrupts where the first one does not set
NEED_RESCHED and the second one does made the bug obscure and extremly
hard to reproduce and analyse. Kudos to Jack and Eric.
Solution: Limit the NOHZ functionality to the idle loop to make sure
that we can not run into such a situation ever again.
cpu_idle()
{
preempt_disable();
while(1) {
tick_nohz_stop_sched_tick(1); <- tell NOHZ code that we
are in the idle loop
while (!need_resched())
halt();
tick_nohz_restart_sched_tick(); <- disables NOHZ mode
preempt_enable_no_resched();
schedule();
preempt_disable();
}
}
In hindsight we should have done this forever, but ...
/me grabs a large brown paperbag.
Debugged-by: Jack Ren <jack.ren@marvell.com>,
Debugged-by: eric miao <eric.y.miao@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This converts the FSL Book-E PTE access and TLB miss handling to match
with the recent changes to 44x that introduce support for non-atomic PTE
operations in pgtable-ppc32.h and removes write back to the PTE from
the TLB miss handlers. In addition, the DSI interrupt code no longer
tries to fixup write permission, this is left to generic code, and
_PAGE_HWWRITE is gone.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Push the sync below the secondary smp init hold loop and comment its purpose.
This should speed up boot by reducing global traffic during the single-threaded
portion of boot.
Signed-off-by: Sonny Rao <sonnyrao@us.ibm.com>
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
VSX loads and stores will take an alignment exception when the address
is not on a 4 byte boundary.
This add support for these alignment exceptions and will emulate the
requested load or store.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
giveup_vsx didn't save the FPU and VMX regsiters. Change it to be
like giveup_fpr/altivec which save these registers.
Also update call sites where FPU and VMX are already saved to use the
original giveup_vsx (renamed to __giveup_vsx).
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Implement save_stack_trace_tsk on powerpc, so that we can run with
latencytop.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Background from Maynard Johnson:
As of POWER6, a set of 32 common events is defined that must be
supported on all future POWER processors. The main impetus for this
compat set is the need to support partition migration, especially from
processor P(n) to processor P(n+1), where performance software that's
running in the new partition may not be knowledgeable about processor
P(n+1). If a performance tool determines it does not support the
physical processor, but is told (via the
PPC_FEATURE_PSERIES_PERFMON_COMPAT bit) that the processor supports
the notion of the PMU compat set, then the performance tool can
surface just those events to the user of the tool.
PPC_FEATURE_PSERIES_PERFMON_COMPAT indicates that the PMU supports at
least this basic subset of events which is compatible across POWER
processor lines.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Today's linux-next build (powerpc allmodconfig) failed like this:
ERROR: ".save_stack_trace" [tests/backtracetest.ko] undefined!
But save_stack_trace is exported in arch/powerpc/kernel/stacktrace.c
I couldn't figure it out until I noticed these earlier warnings:
arch/powerpc/kernel/stacktrace.c:47: warning: data definition has no type or storage class
arch/powerpc/kernel/stacktrace.c:47: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL'
arch/powerpc/kernel/stacktrace.c:47: warning: parameter names (without types) in function declaration
I applied the patch below.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: <linuxppc-dev@ozlabs.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
For some reason long ago I decided that we should zero out the time base
when we calibrate the decrementer. The problem is that this can be
harmful in SMP systems where the firmware has already synchronized the
time bases on the various cores.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
This is some preliminary work to improve TLB management on SW loaded
TLB powerpc platforms. This introduce support for non-atomic PTE
operations in pgtable-ppc32.h and removes write back to the PTE from
the TLB miss handlers. In addition, the DSI interrupt code no longer
tries to fixup write permission, this is left to generic code, and
_PAGE_HWWRITE is gone.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
regs is not used in emulate_fp_pair so remove it.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When the ucontext changed to add the VSX context, this broke backwards
compatibly on swapcontext. swapcontext only compares the ucontext size
passed in from the user to the new kernel ucontext size.
This adds a check against the old ucontext size (with VMX but without
VSX). It also adds some sanity check for ucontexts without VSX, but
where VSX is used according the MSR. Fixes for both 32 and 64bit
processes on 64bit kernels
Kudos to Paulus for noticing.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Choose a more meaningful name for better System.map readability and
autopsy value etc.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Allow an application to enable Strong Access Ordering on specific pages of
memory on Power 7 hardware. Currently, power has a weaker memory model than
x86. Implementing a stronger memory model allows an emulator to more
efficiently translate x86 code into power code, resulting in faster code
execution.
On Power 7 hardware, storing 0b1110 in the WIMG bits of the hpte enables
strong access ordering mode for the memory page. This patchset allows a
user to specify which pages are thus enabled by passing a new protection
bit through mmap() and mprotect(). I have defined PROT_SAO to be 0x10.
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This changes the oops and backtrace code to use the new %pS
printk extension to print out symbols rather than manually
calling print_symbol.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Move device_to_mask() to dma-mapping.h because we need to use it from
outside dma_64.c in a later patch.
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Update powerpc to use the new dma_*map*_attrs() interfaces. In doing so
update struct dma_mapping_ops to accept a struct dma_attrs and propagate
these changes through to all users of the code (generic IOMMU and the
64bit DMA code, and the iseries and ps3 platform code).
The old dma_*map_*() interfaces are reimplemented as calls to the
corresponding new interfaces.
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Make iommu_map_sg take a struct iommu_table. It did so before commit
740c3ce667 (iommu sg merging: ppc: make
iommu respect the segment size limits).
This stops the function looking in the archdata.dma_data for the iommu
table because in the future it will be called with a device that has
no table there.
This also has the nice side effect of making iommu_map_sg() match the
other map functions.
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
There is dma_mask in of_device upon of_platform_device_create()
but we don't actually set coherent_dma_mask. This may cause weird
behavior of USB subsystem using of_device USB host drivers.
Signed-off-by: Vitaly Bordug <vitb@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A recent patch to legacy_serial.c factored out some code by
using the of_match_node() facility to match a node against
an array of possible matches. However, the patch didn't properly
terminate the array causing potential crashes in cases where no
match is found. In addition, the name of the array was poorly
chosen for a static symbol making debugging harder.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Updates the cputable to include the 440 processor found in the
Xilinx Virtex5 FXT FPGA.
Signed-off-by: John Linn <john.linn@xilinx.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Today's linux-next build (powerpc ppc64_defconfig) failed like this:
arch/powerpc/mm/tlb_64.c: In function 'pgtable_free_now':
arch/powerpc/mm/tlb_64.c:66: error: too many arguments to function 'smp_call_function'
arch/powerpc/kernel/machine_kexec_64.c: In function 'kexec_prepare_cpus':
arch/powerpc/kernel/machine_kexec_64.c:175: error: too many arguments to function 'smp_call_function'
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: <linuxppc-dev@ozlabs.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Andrew Morton reported this against linux-next:
ERROR: ".save_stack_trace" [tests/backtracetest.ko] undefined!
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Since Roland's ptrace cleanup starting with commit
f65255e8d5 ("[POWERPC] Use user_regset
accessors for FP regs"), the dump_task_* functions are no longer being
used.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This merges and cleans up some of the ugly copy/to from user code
which is required for the new fpr and vsx layout in the thread_struct.
Also fixes some hard coded buffer sizes and removes a redundant
fpr_flush_to_thread.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
To allow for a single kernel image on e500 v1/v2/mc we need to fixup lwsync
at runtime. On e500v1/v2 lwsync causes an illop so we need to patch up
the code. We default to 'sync' since that is always safe and if the cpu
is capable we will replace 'sync' with 'lwsync'.
We introduce CPU_FTR_LWSYNC as a way to determine at runtime if this is
needed. This flag could be moved elsewhere since we dont really use it
for the normal CPU_FTR purpose.
Finally we only store the relative offset in the fixup section to keep it
as small as possible rather than using a full fixup_entry.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The legacy serial driver does not work with an 8250 type UART that is
described in the device tree with the reg-offset and reg-shift
properties. This change makes legacy_serial ignore these devices.
Signed-off-by: John Linn <john.linn@xilinx.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
This correctly hooks the VSX dump into Roland McGrath core file
infrastructure. It adds the VSX dump information as an additional elf
note in the core file (after talking more to the tool chain/gdb guys).
This also ensures the formats are consistent between signals, ptrace
and core files.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Fix compile error when CONFIG_VSX is enabled.
arch/powerpc/kernel/signal_64.c: In function 'restore_sigcontext':
arch/powerpc/kernel/signal_64.c:241: error: 'i' undeclared (first use in this function)
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Gcc 4.3 produced this warning:
arch/powerpc/kernel/signal_64.c: In function 'restore_sigcontext':
arch/powerpc/kernel/signal_64.c:161: warning: array subscript is above array bounds
This is caused by us copying to aliases of elements of the pt_regs
structure. Make those explicit.
This adds one extra __get_user and unrolls a loop.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch extends the floating point save and restore code to use the
VSX load/stores when VSX is available. This will make FP context
save/restore marginally slower on FP only code, when VSX is available,
as it has to load/store 128bits rather than just 64bits.
Mixing FP, VMX and VSX code will get constant architected state.
The signals interface is extended to enable access to VSR 0-31
doubleword 1 after discussions with tool chain maintainers. Backward
compatibility is maintained.
The ptrace interface is also extended to allow access to VSR 0-31 full
registers.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds the macros for the VSX load/store instruction as most
binutils are not going to support this for a while.
Also add VSX register save/restore macros and vsr[0-63] register definitions.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add a VSX CPU feature. Also add code to detect if VSX is available
from the device tree.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The layout of the new VSR registers and how they overlap on top of the
legacy FPR and VR registers is:
VSR doubleword 0 VSR doubleword 1
----------------------------------------------------------------
VSR[0] | FPR[0] | |
----------------------------------------------------------------
VSR[1] | FPR[1] | |
----------------------------------------------------------------
| ... | |
| ... | |
----------------------------------------------------------------
VSR[30] | FPR[30] | |
----------------------------------------------------------------
VSR[31] | FPR[31] | |
----------------------------------------------------------------
VSR[32] | VR[0] |
----------------------------------------------------------------
VSR[33] | VR[1] |
----------------------------------------------------------------
| ... |
| ... |
----------------------------------------------------------------
VSR[62] | VR[30] |
----------------------------------------------------------------
VSR[63] | VR[31] |
----------------------------------------------------------------
VSX has 64 128bit registers. The first 32 regs overlap with the FP
registers and hence extend them with and additional 64 bits. The
second 32 regs overlap with the VMX registers.
This commit introduces the thread_struct changes required to reflect
this register layout. Ptrace and signals code is updated so that the
floating point registers are correctly accessed from the thread_struct
when CONFIG_VSX is enabled.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Make load_up_fpu and load_up_altivec callable so they can be reused by
the VSX code.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Move the altivec_unavailable code, to make room at 0xf40 where the
vsx_unavailable exception will be.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We are going to change where the floating point registers are stored
in the thread_struct, so in preparation add some macros to access the
floating point registers. Update all code to use these new macros.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
If we set the SPE MSR bit in save_user_regs we can blow away the VEC
bit. This doesn't matter in reality as they are in fact the same bit
but looks bad.
Also, when we add VSX in a later patch, we need to be able to set two
separate MSR bits here.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Use an alternative feature section in _switch. There are three cases
handled here, either we don't have an SLB, in which case we jump over the
entire code section, or if we do we either do or don't have 1TB segments.
Boot tested on Power3, Power5 and Power5+.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The current feature section logic only supports nop'ing out code, this means
if you want to choose at runtime between instruction sequences, one or both
cases will have to execute the nop'ed out contents of the other section, eg:
BEGIN_FTR_SECTION
or 1,1,1
END_FTR_SECTION_IFSET(FOO)
BEGIN_FTR_SECTION
or 2,2,2
END_FTR_SECTION_IFCLR(FOO)
and the resulting code will be either,
or 1,1,1
nop
or,
nop
or 2,2,2
For small code segments this is fine, but for larger code blocks and in
performance criticial code segments, it would be nice to avoid the nops.
This commit starts to implement logic to allow the following:
BEGIN_FTR_SECTION
or 1,1,1
FTR_SECTION_ELSE
or 2,2,2
ALT_FTR_SECTION_END_IFSET(FOO)
and the resulting code will be:
or 1,1,1
or,
or 2,2,2
We achieve this by extending the existing FTR macros. The current feature
section semantic just becomes a special case, ie. if the else case is empty
we nop out the default case.
The key limitation is that the size of the else case must be less than or
equal to the size of the default case. If the else case is smaller the
remainder of the section is nop'ed.
We let the linker put the else case code in with the rest of the text,
so that relative branches from the else case are more likley to link,
this has the disadvantage that we can't free the unused else cases.
This commit introduces the required macro and linker script changes, but
does not enable the patching of the alternative sections.
We also need to update two hand-made section entries in reg.h and timex.h
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The logic to patch CPU feature sections lives in cputable.c, but these days
it's used for CPU features as well as firmware features. Move it into
it's own file for neatness and as preparation for some additions.
While we're moving the code, we pull the loop body logic into a separate
routine, and remove a comment which doesn't apply anymore.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
A bunch of code has hard-coded the value for a "nop" instruction, it
would be nice to have a #define for it.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently create_branch() creates a branch instruction for you, and
patches it into the call site. In some circumstances it would be nice
to be able to create the instruction and patch it later, and also some
code might want to check for errors in the branch creation before
doing the patching. A future commit will change create_branch() to
check for errors.
For callers that don't care, replace create_branch() with
patch_branch(), which just creates the branch and patches it directly.
While we're touching all the callers, change to using unsigned int *,
as this seems to match usage better. That allows (and requires) us to
remove the volatile in the definition of vector in powermac/smp.c and
mpc86xx_smp.c, that's correct because now that we're passing vector as
an unsigned int * the compiler knows that it's value might change
across the patch_branch() call.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Acked-by: Jon Loeliger <jdl@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We currently have a few routines for patching code in asm/system.h, because
they didn't fit anywhere else. I'd like to clean them up a little and add
some more, so first move them into a dedicated C file - they don't need to
be inlined.
While we're moving the code, drop create_function_call(), it's intended
caller never got merged and will be replaced in future with something
different.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Refactor common code between ppc32 and ppc64 module handling into a
shared filed.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add the bits to the architecture-vec so that ibm,client-architecture
lets the firmware know we support the 2.06 architecture.
Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add an entry for Power7 architected mode and add "(raw)" to Power7 raw
mode to distinguish it more clearly.
Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add a cputable entry for the POWER7 processor.
Also tell firmware that we know about POWER7.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
There are now two potential callers of machine_crash_shutdown,
so increase the limit accordingly.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
It's not even passed on to smp_call_function() anymore, since that
was removed. So kill it.
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
It's never used and the comments refer to nonatomic and retry
interchangably. So get rid of it.
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This converts ppc to use the new helpers for smp_call_function() and
friends, and adds support for smp_call_function_single().
ppc loses the timeout functionality of smp_call_function_mask() with
this change, as the generic code does not provide that.
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch is based on work done by Madhvesh. R. Sulibhavi back in
March 2007.
We refactor some of the single step handling since it differs between
"classic" and "booke" powerpc cores.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
* Mark __flush_icache_range as a function that can't be probed since its
used by the kprobe code.
* Fix an issue with single stepping and async exceptions. We need to
ensure that we dont get an async exception (external, decrementer, etc)
while we are attempting to single step the probe point.
Added a check to ensure we only handle a single step if its really
intended for the instruction in question.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
If we have an L2CSR register (e500mc) we need to flush the L2 before going
to nap. We use the HW flush mechanism provided in that register.
The code reuses the CPU_FTR_604_PERF_MON bit as it is no longer used by
any code in the kernel. Additionally we didn't reuse the exist L2CR
feature bit as this is intended for the 7xxx L2CR register and L2CSR
is part of the new Freescale "Book-E" registers.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The e500 core enter DOZE/NAP power-saving modes when the core go to
cpu_idle routine.
The power management default running mode is DOZE, If the user
echo 1 > /proc/sys/kernel/powersave-nap
the system will change to NAP running mode.
Signed-off-by: Dave Liu <daveliu@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Record the address of the mcount call-site. Currently all archs except sparc64
record the address of the instruction following the mcount call-site. Some
general cleanups are entailed. Storing mcount addresses in rec->ip enables
looking them up in the kprobe hash table later on to check if they're kprobe'd.
Signed-off-by: Abhishek Sagar <sagar.abhishek@gmail.com>
Cc: davem@davemloft.net
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
KAMEZAWA Hiroyuki and Oleg Nesterov point out that since the commit
557ed1fa26 ("remove ZERO_PAGE") removed
the ZERO_PAGE from the VM mappings, any users of get_user_pages() will
generally now populate the VM with real empty pages needlessly.
We used to get the ZERO_PAGE when we did the "handle_mm_fault()", but
since fault handling no longer uses ZERO_PAGE for new anonymous pages,
we now need to handle that special case in follow_page() instead.
In particular, the removal of ZERO_PAGE effectively removed the core
file writing optimization where we would skip writing pages that had not
been populated at all, and increased memory pressure a lot by allocating
all those useless newly zeroed pages.
This reinstates the optimization by making the unmapped PTE case the
same as for a non-existent page table, which already did this correctly.
While at it, this also fixes the XIP case for follow_page(), where the
caller could not differentiate between the case of a page that simply
could not be used (because it had no "struct page" associated with it)
and a page that just wasn't mapped.
We do that by simply returning an error pointer for pages that could not
be turned into a "struct page *". The error is arbitrarily picked to be
EFAULT, since that was what get_user_pages() already used for the
equivalent IO-mapped page case.
[ Also removed an impossible test for pte_offset_map_lock() failing:
that's not how that function works ]
Acked-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Nick Piggin <npiggin@suse.de>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The new e500mc core from Freescale is based on the e500v2 but with the
following changes:
* Supports only the Enhanced Debug Architecture (DSRR0/1, etc)
* Floating Point
* No SPE
* Supports lwsync
* Doorbell Exceptions
* Hypervisor
* Cache line size is now 64-bytes (e500v1/v2 have a 32-byte cache line)
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
A recent commit added support for the new 440x6 and 464 cores that have the
added WL1, IL1I, IL1D, IL2I, and ILD2 bits for the caching attributes in the
TLBs. The new bits were cleared in the finish_tlb_load function, however a
similar bit of code was missed in the DataStorage interrupt vector.
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
There are no in-tree uses of the export any more and in linux-next there
is a change that exports it globally which causes warnings:
WARNING: vmlinux: 'console_drivers' exported twice. Previous export was in vmlinux
and in one case (mpc85xx_defconfig) a build error:
kernel/built-in.o: In function `__crc_console_drivers':
(*ABS*+0x1eb0e6f5): multiple definition of `__crc_console_drivers'
So remove the export now. Also, there is no longer any need to include
linux/console.h.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
GCC 4.4.x looks to be adding support for generating out-of-line register
saves/restores based on:
http://gcc.gnu.org/ml/gcc-patches/2008-04/msg01678.html
This breaks the kernel if we enable CONFIG_CC_OPTIMIZE_FOR_SIZE. To fix
this we add the use the save/restore code from gcc and simplified it down
for our needs (integer only).
Additionally, we have to link this code into each module. The other
solution was to add EXPORT_SYMBOL() which meant going through the
trampoline which seemed nonsensical for these out-of-line routines.
Finally, we add some checks to prom_init_check.sh to ignore the
out-of-line save/restore functions.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
debugfs_create_file() returns a non-NULL (non-zero) value in case of
success, not a NULL value.
This fixes this non-critical boot-time debugging error message:
[ 1.316386] calling irq_debugfs_init+0x0/0x50
[ 1.316399] initcall irq_debugfs_init+0x0/0x50 returned -12 after 0 msecs
[ 1.316411] initcall irq_debugfs_init+0x0/0x50 returned with error code -12
Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Acked-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This reverts commit acb0142bf0.
AMCC has indicated that the PPC 460GT does have FPU support. This
revert enables the FPU for those chips again.
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
When I changed irq_alloc_host() to take an of_node
(52964f87c6: "Add an optional
device_node pointer to the irq_host"), I botched the reference
counting semantics.
Stephen pointed out that it's irq_alloc_host()'s business if
it needs to take an additional reference to the device_node,
the caller shouldn't need to care.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This eliminates this minor boot-time debugging error message:
[ 1.316451] calling add_pcspkr+0x0/0x84
[ 1.316478] initcall add_pcspkr+0x0/0x84 returned -19 after 0 msecs
Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
During the next merge window, pci_name()'s return value will become
const, so use the new dev_set_name() instead to avoid the warning (from
linux-next):
arch/powerpc/kernel/pci_64.c: In function 'of_create_pci_dev':
arch/powerpc/kernel/pci_64.c:193: warning: passing argument 1 of 'sprintf' discards qualifiers from pointer target type
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When building a signal or a ucontext, we can incorrectly set the MSR_VEC
bit of the kernel pt_regs->msr before returning to userspace if the task
-ever- used VMX.
This can lead to funny result if that stack used it in the past, then
"lost" it (ie. it wasn't enabled after a context switch for example)
and then called get_context. It can end up with VMX enabled and the
registers containing values from some other task.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
On machines with more than one exception level any system register that
might be modified by the "normal" exception level needs to be saved and
restored on taking a higher level exception. We already are saving
and restoring ESR and DEAR.
For critical level add SRR0/1.
For debug level add CSRR0/1 and SRR0/1.
For machine check level add DSRR0/1, CSRR0/1, and SRR0/1.
On FSL Book-E parts we always save/restore the MAS registers for critical,
debug, and machine check level exceptions. On 44x we always save/restore
the MMUCR.
Additionally, we save and restore the ksp_limit since we have to adjust it
for each exception level.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Acked-by: Paul Mackerras <paulus@samba.org>
* Cleanup the code a bit my allocating an INT_FRAME on our exception
stack there by make references go from GPR11-INT_FRAME_SIZE(r8) to
just GPR11(r8)
* simplify {lvl}_transfer_to_handler code by moving the copying of the
temp registers we use if we come from user space into the PROLOG
* If the exception came from kernel mode copy thread_info flags,
preempt, and task pointer from the process thread_info.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Acked-by: Paul Mackerras <paulus@samba.org>
For the additonal exception levels (critical, debug, machine check) on
40x/book-e we were using "static" allocations of the stack in the
associated head.S.
Move to a runtime allocation to make the code a bit easier to read as
we mimic how we handle IRQ stacks. Its also a bit easier to setup the
stack with a "dummy" thread_info in C code.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Acked-by: Paul Mackerras <paulus@samba.org>
This patch cleans up the ftrace code in PowerPC based on the comments from
Michael Ellerman.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: proski@gnu.org
Cc: a.p.zijlstra@chello.nl
Cc: Pekka Paalanen <pq@iki.fi>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: linuxppc-dev@ozlabs.org
Cc: Soeren Sandmann Pedersen <sandmann@redhat.com>
Cc: paulus@samba.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch adds full support for ftrace for PowerPC (both 64 and 32 bit).
This includes dynamic tracing and function filtering.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Since commit "85xx: Add support for relocatable kernel (and
booting at non-zero)" (37dd2badcf),
PHYSICAL_START is #defined as kernstart_addr if RELOCATABLE
and FLATMEM is enabled.
PHYSICAL_START is used in prom_init.c and so kernstart_addr
needs to be added to the list of allowed symbols that
prom_init.c can access.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
func_descr_t->entry is already an unsigned long. Mea culpa.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This works around bugs in older binutils' objcopy.
The placement of these sections does not really matter,
but it confused the buggy old BFD libraries.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Commit 140b932f8c ("Create modalias file
in sysfs for of_platform bus") needs this to avoid breaking the sparc
builds.
Just move the code and add whitespace around some binary operators.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This provides a way to defer processing of an interrupt that wakes the
processor out of sleep mode. On 32-bit platforms that use an
interrupt to wake the processor, we have to have interrupts enabled in
hardware at the point where we go to sleep, otherwise the processor
will never wake up. However, because interrupts are logically
disabled at this point, we don't want to process the interrupt
straight away.
This is handled by setting the _TLF_SLEEPING flag. When we get an
interrupt and _TLF_SLEEPING is set, we firstly clear the MSR_EE
(external interrupt enable) bit in the saved MSR value, and secondly
we then return to the address in the link register, like we do for
_TLF_NAPPING, but without actually handling the interrupt.
Note that this is handled somewhat differently on powerbooks, so this
new code will only be used on non-Apple machines.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Make a few things static in lparcfg.c
Make init and exit routines static in rtas_flash.c
Make things static in rtas_pci.c
Make some functions static in rtas.c
Make fops static in rtas-proc.c
Remove unneeded extern for do_gtod in smp.c
Make clocksource_init() static in time.c
Make last_tick_len and ticklen_to_xs static in time.c
Move the declaration of the pvr per-cpu into smp.h
Make kexec_smp_down() and kexec_stack static in machine_kexec_64.c
Don't return void in arch_teardown_msi_irqs() in msi.c
Move declaration of GregorianDay()into asm/time.h
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We use the low bits of regs->trap as flag bits. We already indicate
critical and machine check level exceptions via this mechanism. Extend it
to indicate debug level exceptions.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Replace TIF_RESTORE_SIGMASK with TLF_RESTORE_SIGMASK and define
our own set_restore_sigmask() function. This saves the costly
SMP-safe set_bit operation, which we do not need for the sigmask
flag since TIF_SIGPENDING always has to be set too.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Commit 76bc080ef5 ("POWERPC] Make default
cputable entries reflect selected CPU family") added default entries
for the e200 and e500 families, but missed a closing brace on those
entries, as pointed out by David Gibson. This adds the closing braces.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This printk() appears twice in the same function. Only the latter one
in the inval_range: section appears to be legitimate.
Signed-off-by: Nate Case <ncase@xes-inc.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Remove duplicate #include of <asm/prom.h> in
arch/powerpc/kernel/btext.c.
Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves lockdep_init() to before udbg_early_init() as the later
can call things that acquire spinlocks etc... This also makes printk
safer to use earlier.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When debugging early boot problems, it's common to sprinkle printk's
all over the place. However, on 64-bit powerpc, this can lead to
memory corruption if done too early due to the PACA pointer and
lockdep core not being initialized.
This adds some comments to early_setup() that document when it is
safe to do so in order to save time for whoever has to debug that
stuff next.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When doing lockdep, I had two patches to initialize paca->_current
early, one bogus, and one correct. Unfortunately both got merged
as the bad one ended up being part of the main lockdep patch by
mistake. This causes memory corruption at boot. This removes
the offending code.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Changes the cputable so that various CPU families that have an exclusive
CONFIG_ option have a more sensible default entry to use if the specific
processor hasn't been identified.
This makes the kernel more generally useful when booted on an unknown
PVR for things like new 4xx variants.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The new 440x6 core used on AMCC 460EX/GT introduces new storage attibure
fields to the TLB2 word. Those are:
Bit 11 12 13 14 15
WL1 IL1I IL1D IL2I IL2D
With these bits the cache (L1 and L2) can be configured in a more flexible
way, instruction- and data-cache independently now. The "old" I and W bits
are still available and setting these old bits will automically set these
new bits too (for backward compatibilty).
The current code does not clear these fields resulting in disabling the cache
by chance. This patch now makes sure that these new bits are cleared when
the TLB2 word is written.
Signed-off-by: Stefan Roese <sr@denx.de>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
This replaces the duplicated arch-specific versions of "sys_pipe()" with
one unified implementation. This removes almost 250 lines of duplicated
code.
It's marked __weak, so that *if* an architecture wants to override the
default implementation it can do so by simply having its own replacement
version, since many architectures use alternate calling conventions for
the 'pipe()' system call for legacy reasons (ie traditional UNIX
implementations often return the two file descriptors in registers)
I still haven't changed the cris version even though Linus says the BKL
isn't needed. The arch maintainer can easily do it if there are really
no obstacles.
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
[POWERPC] Bolt in SLB entry for kernel stack on secondary cpus
[POWERPC] PS3: Update ps3_defconfig
[POWERPC] PS3: Remove unsupported wakeup sources
[POWERPC] PS3: Make ps3_virq_setup and ps3_virq_destroy static
[POWERPC] PS3: Add time include to lpm
[POWERPC] Fix slb.c compile warnings
[POWERPC] Xilinx: Fix compile warnings
[POWERPC] Squash build warning for print of resource_size_t in fsl_soc.c
[RAPIDIO] fix current kernel-doc notation
[POWERPC] 86xx: mpc8610_hpcd: add support for PCI Express x8 slot
Fix a potential issue in mpc52xx uart driver
[POWERPC] mpc5200: Allow for fixed speed MII configurations
[POWERPC] 86xx: Fix the wrong serial1 interrupt for 8610 board
This fixes a regression reported by Kamalesh Bulabel where a POWER4
machine would crash because of an SLB miss at a point where the SLB
miss exception was unrecoverable. This regression is tracked at:
http://bugzilla.kernel.org/show_bug.cgi?id=10082
SLB misses at such points shouldn't happen because the kernel stack is
the only memory accessed other than things in the first segment of the
linear mapping (which is mapped at all times by entry 0 of the SLB).
The context switch code ensures that SLB entry 2 covers the kernel
stack, if it is not already covered by entry 0. None of entries 0
to 2 are ever replaced by the SLB miss handler.
Where this went wrong is that the context switch code assumes it
doesn't have to write to SLB entry 2 if the new kernel stack is in the
same segment as the old kernel stack, since entry 2 should already be
correct. However, when we start up a secondary cpu, it calls
slb_initialize, which doesn't set up entry 2. This is correct for
the boot cpu, where we will be using a stack in the kernel BSS at this
point (i.e. init_thread_union), but not necessarily for secondary
cpus, whose initial stack can be allocated anywhere. This doesn't
cause any immediate problem since the SLB miss handler will just
create an SLB entry somewhere else to cover the initial stack.
In fact it's possible for the cpu to go quite a long time without SLB
entry 2 being valid. Eventually, though, the entry created by the SLB
miss handler will get overwritten by some other entry, and if the next
access to the stack is at an unrecoverable point, we get the crash.
This fixes the problem by making slb_initialize create a suitable
entry for the kernel stack, if we are on a secondary cpu and the stack
isn't covered by SLB entry 0. This requires initializing the
get_paca()->kstack field earlier, so I do that in smp_create_idle
where the current field is initialized. This also abstracts a bit of
the computation that mk_esid_data in slb.c does so that it can be used
in slb_initialize.
Signed-off-by: Paul Mackerras <paulus@samba.org>
As TICK_LENGTH_SHIFT is used for more than just the tick length, the name
isn't quite approriate anymore, so this renames it to NTP_SCALE_SHIFT.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This changes time_freq to a 64bit value and makes it static (the only outside
user had no real need to modify it). Intermediate values were already 64bit,
so the change isn't that big, but it saves a little in shifts by replacing
SHIFT_NSEC with TICK_LENGTH_SHIFT. PPM_SCALE is then used to convert between
user space and kernel space representation.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds a minimalistic braille screen reader support. This is meant to
be used by blind people e.g. on boot failures or when / cannot be mounted
etc and thus the userland screen readers can not work.
[akpm@linux-foundation.org: fix exports]
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Cc: Jiri Kosina <jikos@jikos.cz>
Cc: Dmitry Torokhov <dtor@mail.ru>
Acked-by: Alan Cox <alan@redhat.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit edd8ce6743 (Use extended crashkernel
command line on ppc64), changed the logic in reserve_crashkernel()
which deals with the crashkernel= command line option.
This introduced a bug in the case when there is no crashkernel= option,
or it is incorrect. We would fall through and calculate the crash_size
based on the existing values in crashk_res. If both start and end are 0,
the default, we calculate the crash_size as 1 byte - which is wrong.
Rework the logic so that we use crashk_res, regardless of whether it's
set by the command line or via the device tree (see prom.c). Then check
if we have an empty range (end == start), and if so make sure to set
both end and start to zero (this is checked in machine_kexec_64.c). Then
we calculate the crash_size once we know we have a non-zero range.
Finally we always want to warn the user if they specify a base != 32MB,
so remove the special case for that in the command line parsing case.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The current_thread_info() macro, used by preempt_count(), assumes the
base address and size of the stack are THREAD_SIZE aligned.
The emergency stack currently isn't either of these things, which
could potentially cause problems anytime we're running on the
emergency stack. That includes when we detect a bad kernel stack
pointer, and also during early_setup_secondary().
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
[RAPIDIO] Change RapidIO doorbell source and target ID field to 16-bit
[RAPIDIO] Add RapidIO connection info print out and re-training for broken connections
[RAPIDIO] Add serial RapidIO controller support, which includes MPC8548, MPC8641
[RAPIDIO] Add RapidIO node probing into MPC86xx_HPCN board id table
[RAPIDIO] Add RapidIO node into MPC8641HPCN dts file
[RAPIDIO] Auto-probe the RapidIO system size
[RAPIDIO] Add OF-tree support to RapidIO controller driver
[RAPIDIO] Add RapidIO multi mport support
[RAPIDIO] Move include/asm-ppc/rio.h to asm-powerpc
[RAPIDIO] Add RapidIO option to kernel configuration
[RAPIDIO] Change RIO function mpc85xx_ to fsl_
[POWERPC] Provide walk_memory_resource() for powerpc
[POWERPC] Update lmb data structures for hotplug memory add/remove
[POWERPC] Hotplug memory remove notifications for powerpc
[POWERPC] windfarm: Add PowerMac 12,1 support
[POWERPC] Fix building of pmac32 when CONFIG_NVRAM=m
[POWERPC] Add IRQSTACKS support on ppc32
[POWERPC] Use __always_inline for xchg* and cmpxchg*
[POWERPC] Add fast little-endian switch system call
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use proc_create()/proc_create_data() to make sure that ->proc_fops and ->data
be setup before gluing PDE to main tree.
Add correct ->owner to proc_fops to fix reading/module unloading race.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This initializes the RapidIO controller driver using addresses and
interrupt numbers obtained from the firmware device tree, rather than
using hardcoded constants.
Signed-off-by: Zhang Wei <wei.zhang@freescale.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This makes it possible to use separate stacks for hard and soft IRQs
on 32-bit powerpc as well as on 64-bit. The code for 32-bit is just
the 32-bit analog of the 64-bit code.
* Added allocation and initialization of the irq stacks. We limit the
stacks to be in lowmem for ppc32.
* Implemented ppc32 versions of call_do_softirq() and call_handle_irq()
to switch the stack pointers
* Reworked how we do stack overflow detection. We now keep around the
limit of the stack in the thread_struct and compare against the limit
to see if we've overflowed. We can now use this on ppc64 if desired.
[ paulus@samba.org: Fixed bug on 6xx where we need to reload r9 with the
thread_info pointer. ]
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds a system call on 64-bit platforms for switching between
little-endian and big-endian modes that is much faster than doing a
prctl call. This system call is handled as a special case right at
the start of the system call entry code, and because it is a special
case, it uses a system call number which is out of the range of
normal system calls, namely 0x1ebe.
Measurements with lmbench on a 4.2GHz POWER6 showed no measurable
change in the speed of normal system calls with this patch.
Switching endianness with this new system call takes around 60ns on a
4.2GHz POWER6, compared with around 300ns to switch endian mode with a
prctl. This can provide a significant performance advantage for
emulators for little-endian architectures that want to switch between
big-endian and little-endian mode frequently, e.g. because they are
generating instructions sequences on the fly and they want to run
those sequences in little-endian mode.
The other thing about this system call is that it doesn't clobber as
many registers as a normal system call. It only clobbers r12.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This functionality is definitely experimental, but is capable of running
unmodified PowerPC 440 Linux kernels as guests on a PowerPC 440 host. (Only
tested with 440EP "Bamboo" guests so far, but with appropriate userspace
support other SoC/board combinations should work.)
See Documentation/powerpc/kvm_440.txt for technical details.
[stephen: build fix]
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The AMCC 460GT doesn't have an FPU so let's not enable support for it.
Signed-off-by: Stefan Roese <sr@denx.de>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
This splits cell io-workaround code into spider-pci dependent code and
a generic part, and also moves io-workarounds initialization into
cell_setup_phb.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The udbg console should be safe to call basically at any time after boot.
It does not need any per-cpu resources or for the cpu to be online, as
long as there is a udbg_putc routine hooked up it should work. So mark it
as CON_ANYTIME.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Because the udbg_console has CON_ENABLED set, it's possible that when we
register it with the console code the index won't be set. This leads to
slightly confusing boot messages like:
[ 0.000000] console [udbg-1] enabled
We could remove CON_ENABLED, but we don't want to do that, we always
want the udbg console to be activated, even if the user specified some
other console on the command line.
The simplest fix seems to be just to set the index to 0 by hand. There
is no issue with duplicate udbg consoles, as we guard against registering
multiple times in register_early_udbg_console().
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds the required functionality to fill in all pacas at runtime.
With NR_CPUS=1024
text data bss dec hex filename
137 1704032 0 1704169 1a00e9 arch/powerpc/kernel/paca.o :Before
121 1179744 524288 1704153 1a00d9 arch/powerpc/kernel/paca.o :After
Also remove unneeded #includes from arch/powerpc/kernel/paca.c
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* Removed get_msr(), get_srr0(), and get_srr1() - not used anywhere
* Use STACK_FRAME_OVERHEAD instead of magic number
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
As BenH said the other day, it is an "accident" that prom_init.o is
linked with the rest of the kernel. The truth is a little more
subtle, prom_init isn't truly bootloader, it does access kernel data
in a few places.
What we can do is discourage people from adding new code that accesses
data outside of prom_init. And hence this patch; from the script:
# This script checks prom_init.o to see what external symbols it
# is using, if it finds symbols not in the whitelist it returns
# an error. The point of this is to discourage people from
# intentionally or accidentally adding new code to prom_init.c
# which has side effects on other parts of the kernel.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* Removed TI_EXECDOMAIN define as its not used anywhere
* Use STACK_INT_FRAME_SIZE to allow common define of INT_FRAME_SIZE
* Define TI_CPU on both ppc32 & ppc64 (removes an ifdef).
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Use (31-THREAD_SHIFT) to get to thread_info from stack pointer. This makes
the code a bit easier to read and more robust if we ever change THREAD_SHIFT.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Remove the inclusion of asm-offsets.h from stacktrace.c. It isn't
supposed to be included in C code and it causes problems with multiple
definitions of things.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Added support to allow an 85xx kernel to be run from a non-zero physical
address (useful for cooperative asymmetric multiprocessing situations and
kdump). The support can be configured at compile time by setting
CONFIG_PAGE_OFFSET, CONFIG_KERNEL_START, and CONFIG_PHYSICAL_START as
desired.
Alternatively, the kernel build can set CONFIG_RELOCATABLE. Setting this
config option causes the kernel to determine at runtime the physical
addresses of CONFIG_PAGE_OFFSET and CONFIG_KERNEL_START. If
CONFIG_RELOCATABLE is set, then CONFIG_PHYSICAL_START has no meaning.
However, CONFIG_PHYSICAL_START will always be used to set the LOAD program
header physical address field in the resulting ELF image.
Currently we are limited to running at a physical address that is a
multiple of 256M. This is due to how we map TLBs to cover
lowmem. This should be fixed to allow 64M or maybe even 16M alignment
in the future. It is considered an error to try and run a kernel at a
non-aligned physical address.
All the magic for this support is accomplished by proper initialization
of the kernel memory subsystem and use of ARCH_PFN_OFFSET.
The use of ARCH_PFN_OFFSET only affects normal memory and not IO mappings.
ioremap uses map_page and isn't affected by ARCH_PFN_OFFSET.
/dev/mem continues to allow access to any physical address in the system
regardless of how CONFIG_PHYSICAL_START is set.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The powerpc kernel stacks need to be naturally aligned, as they
contain the thread info at the bottom, which is obtained by
clearing the low bits of the stack pointer.
However, when using 64K pages, the stack is smaller than a page,
so we use kmalloc to allocate it, but that doesn't provide the
alignment guarantee we need.
It appeared to work so far... until one enables SLUB debugging
which then returns unaligned pointers. Ooops...
This fixes it by using a slab cache with enforced alignment. It
relies on my previous patch that adds a thread_info_cache_init()
callback.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This reverts commit e4cc58944c, as
requested by Roland McGrath, because compat_ptrace_request (added in
commit e16b278164, "ptrace:
compat_ptrace_request siginfo") now handles this case.
Signed-off-by: Paul Mackerras <paulus@samba.org>
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (202 commits)
[POWERPC] Fix compile breakage for 64-bit UP configs
[POWERPC] Define copy_siginfo_from_user32
[POWERPC] Add compat handler for PTRACE_GETSIGINFO
[POWERPC] i2c: Fix build breakage introduced by OF helpers
[POWERPC] Optimize fls64() on 64-bit processors
[POWERPC] irqtrace support for 64-bit powerpc
[POWERPC] Stacktrace support for lockdep
[POWERPC] Move stackframe definitions to common header
[POWERPC] Fix device-tree locking vs. interrupts
[POWERPC] Make pci_bus_to_host()'s struct pci_bus * argument const
[POWERPC] Remove unused __max_memory variable
[POWERPC] Simplify xics direct/lpar irq_host setup
[POWERPC] Use pseries_setup_i8259_cascade() in pseries_mpic_init_IRQ()
[POWERPC] Turn xics_setup_8259_cascade() into a generic pseries_setup_i8259_cascade()
[POWERPC] Move xics_setup_8259_cascade() into platforms/pseries/setup.c
[POWERPC] Use asm-generic/bitops/find.h in bitops.h
[POWERPC] 83xx: mpc8315 - fix USB UTMI Host setup
[POWERPC] 85xx: Fix the size of qe muram for MPC8568E
[POWERPC] 86xx: mpc86xx_hpcn - Temporarily accept old dts node identifier.
[POWERPC] 86xx: mark functions static, other minor cleanups
...
603 CPUs have the same issue that some 750 CPUs have in that they can crash
in funny ways if a store from an FPU register instruction is executed on a
register that has never been initialized since power on. This patch fixes
it by making sure all FP registers have been properly initialized at kernel
boot.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Use the generic pci_enable_resources() instead of the arch-specific code.
The generic version is functionally equivalent, but uses dev_printk.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Define the copy_siginfo_from_user32 entry point for powerpc, so
that generic CONFIG_COMPAT code can call it. We already had the
code rolled into compat_sys_rt_sigqueueinfo, this just moves it
out into the canonical function that other arch's define.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Current versions of gdb require a working implementation of
PTRACE_GETSIGINFO for proper watchpoint support. Since struct siginfo
contains pointers it must be converted when passed to a 32-bit debugger.
Signed-off-by: Andreas Schwab <schwab@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
None of these files use any of the functionality promised by
asm/semaphore.h. It's possible that they rely on it dragging in some
unrelated header file, but I can't build all these files, so we'll have
fix any build failures as they come up.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
This adds the low level irq tracing hooks to the powerpc architecture
needed to enable full lockdep functionality.
This is partly based on Johannes Berg's initial version. I removed
the asm trampoline that isn't needed (thus improving performance) and
modified all sorts of bits and pieces, reworking most of the assembly,
etc...
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds stacktrace support for powerpc, which will be needed for
lockdep.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves various definitions used all over the place to parse stack
frames to ptrace.h so only one definition is needed.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Lockdep found out that we can occasionally take the device-tree
lock for reading from softirq time (from rtas_token called
by the rtas real time clock code called by the NTP code),
while we take it occasionally for writing without masking
interrupts. The combination of those two can thus deadlock.
While some of those cases of interrupt read lock could be fixed
(such as caching the RTAS tokens) I figured that taking the
lock for writing is so rare (device-tree modification) that we
may as well penalize that case and allow reading from interrupts.
Thus, this turns all the writers to take the lock with irqs
masked to avoid the situation.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* Add special cases for pplus and prep to ide_default_{irq,io_base}()
(+ FIXMEs about the need to use IDE platform host driver instead).
* Remove no longer needed ppc_ide_md and struct ide_machdep_calls.
* Then remove <linux/ide.h> include from:
- arch/powerpc/kernel/setup_32.c
- arch/ppc/kernel/ppc_ksyms.c
- arch/ppc/kernel/setup.c
- arch/ppc/platforms/pplus.c
- arch/ppc/platforms/prep_setup.c
There should be no functional changes caused by this patch.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Semaphores are no longer performance-critical, so a generic C
implementation is better for maintainability, debuggability and
extensibility. Thanks to Peter Zijlstra for fixing the lockdep
warning. Thanks to Harvey Harrison for pointing out that the
unlikely() was unnecessary.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
global_dbcr0 needs to be a per cpu set of save areas instead of a single
global on all processors.
Also, we switch to using DBCR0_IDM to determine if the user space app is
being debugged as its a more consistent way. In the future we should
support features like hardware breakpoint and watchpoints which will
have DBCR0_IDM set but not necessarily DBCR0_IC (single step).
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The architecture allows for "Book-E" style debug interrupts to either go
to critial interrupts of their own debug interrupt level. To allow for
a dynamic kernel to support machines of either type we want to be able to
compile in the interrupt handling code for both exception levels.
Towards this goal we renamed the debug handling macros to specify the
interrupt level in their name (DEBUG_CRIT_EXCEPTION/DebugCrit and
DEBUG_DEBUG_EXCEPTION/DebugDebug).
Additionally, on the Freescale Book-e parts we expanded the exception
stacks to cover the maximum case of needing three exception stacks (normal,
machine check and debug).
There is some kernel text space optimization to be gained if a kernel is
configured for a specific Freescale implementation but we aren't handling
that now to allow for the single kernel image support.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
This changes the way we calculate how much space to reserve for the
pHyp dump. Currently we reserve 256MB only. With this change, the
code first checks to see if an amount has been specified on the boot
command line with the "phyp_dump_reserve_size" option, and if so, uses
that much.
Otherwise it computes 5% of total ram and rounds it down to a multiple
of 256MB, and uses the larger of that or 256MB.
This is for large systems with a lot of memory (10GB or more). The
aim is to have more space available for the kernel on reboot on
machines with more resources. Although the dump will be collected
pretty fast and the memory released really early on allowing the
machine to have the full memory available, this alleviates any issues
that can be caused by having way too little memory on very very large
systems during those few minutes.
Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We can set LOAD_OFFSET and use the AT attribute on sections and the
linker will properly set the physical address of the LOAD program
header for us.
This allows us to know how the PHYSICAL_START the user configured a
kernel with by just looking at the resulting vmlinux ELF.
This is pretty much stolen from how x86 does things in their linker
scripts.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* PAGE_OFFSET is not always the start of code, use _stext instead.
* grab PAGE_SIZE and KERNELBASE from asm/page.h like ppc64 does. Makes the
code a bit more common and provide a single place to manipulate the
defines for things like kdump.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* Determine the RPN we are running the kernel at runtime rather
than using compile time constant for initial TLB
* Cleanup adjust_total_lowmem() to respect memstart_addr and
be a bit more clear on variables that are sizes vs addresses.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Fedora 9 works on Efika without the separate 'device-tree supplement',
thanks to the kernel's own fixups. With one exception -- because 'CHRP'
still appears on the 'machine:' line in /proc/cpuinfo, the installer
misdetects the platform and misconfigures yaboot, putting it into a PReP
boot partition instead of in the /boot filesystem where the Efika's
firmware could find it.
The kernel's fixups for Efika already correct one instance of 'chrp', in
the 'device_type' property. This fixes it in the 'CODEGEN,description'
property too, since that's what's exposed to userspace in /proc/cpuinfo.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This fixes the handling of the preempt count when switching
interrupt stacks so that HW interrupt properly get the softirq
mask copied over from the previous stack.
It also initializes the softirq stack preempt_count to 0 instead
of SOFTIRQ_OFFSET, like x86, as __do_softirq() does the increment,
and we hit some lockdep checks if we have it twice.
That means we do run for a little while off the softirq stack
with the preempt-count set to 0, which could be deadly if we
try to take a softirq at that point, however we do so with
interrupts disabled, so I think we are ok.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently, we initialize the "current" pointer in the PACA (which
is used by the "current" macro in the kernel) before calling
setup_system(). That means that early_setup() is called with
current still "NULL" which is -not- a good idea. It happens to
work so far but breaks with lockdep when early code calls printk.
This changes it so that all PACAs are statically initialized with
__current pointing to the init task. For non-0 CPUs, this is fixed
up before use.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Now that we have the alpaca, the reg_save_ptr is no longer needed in the
paca. Eradicate all global uses of it and make it static in the iSeries
lpardata.c
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The iSeries HV only needs the first two fields of the paca statically
initialised, so create an alternate paca that contains only those and
switch to our real paca immediately after boot.
This is in order to make the 1024 cpu patches easier since they will no
longer have to statically initialise the pacas for iSeries.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
If an SLB miss interrupt happens while the RI bit of MSR is zero, we
can't just return, because RI being zero indicates that SRR0/SRR1
potentially had live values in them, and the process of taking an
interrupt overwrites them.
This should never happen, but if it does, we try to print a nice oops
message. That doesn't work, however, because the code at unrecov_slb
assumes that the MMU has been turned on, but we call it with the MMU
off (and have done so since the SLB miss handler was rewritten to run
without turning the MMU on) -- except on iSeries, where everything runs
with the MMU on.
This fixes it by adding the necessary code to turn the MMU on if
necessary.
Signed-off-by: Paul Mackerras <paulus@samba.org>
A couple of places are duplicating the function of
of_device_is_available; convert them to use it.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We have an assembly version of strncmp for the bootwrapper, but not
for the kernel, so we end up using the C version in the kernel. This
takes the strncmp code from the bootup and copies it to the kernel
proper, adding two instructions so it copes correctly with len==0.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Handling of the proc_dir_entry->count was changed in 2.6.24-rc5.
After this change, the default value for pde->count is 1 and not 0 as
before. Therefore, if we want to check whether our procfs file is
already opened (already in use), we have to check if pde->count is
greater than 2 rather than 1.
Signed-off-by: Maxim Shchetynin <maxim@de.ibm.com>
Signed-off-by: Jens Osterkamp <jens@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
A subtle bug sneaked into iSeries recently. On this platform, we must
not normally clear MSR:EE (the hardware external interrupt enable)
except for short periods of time. Taking an interrupt while
soft-disabled doesn't cause us to clear it for example.
The iSeries kernel expects to mostly run with MSR:EE enabled at all
times except in a few exception entry/exit code paths. Thus
local_irq_enable() doesn't check if it needs to hard-enable as it
expects this to be unnecessary on iSeries.
However, hard_irq_disable() _does_ cause MSR:EE to be cleared,
including on iSeries. A call to it was recently added to the
context switch code, thus causing interrupts to become disabled
for a long periods of time, causing the iSeries watchdog to kick
in under some circumstances and other nasty things.
This patch fixes it by making local_irq_enable() properly re-enable
MSR:EE on iSeries. It basically removes a return statement here
to make iSeries use the same code path as everybody else. That does
mean that we might occasionally get spurious decrementer interrupts
but I don't think that matters.
Another option would have been to make hard_irq_disable() a nop
on iSeries but I didn't like it much, in case we have good reasons
to hard-disable.
Part of the patch is fixes to make sure the hard_enabled PACA field
is properly set on iSeries as it used not to be before, since it
was mostly unused.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Old-world powermacs don't set L2CR or L3CR on processor upgrade cards.
This simple patch allows the setting of L3CR via a kernel parameter
(like the existing kernel parameter to set L2CR).
Signed-off-by: Robert Brose <bob@qbjnet.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
WARNING: vmlinux.o(.text+0x1e4c0): Section mismatch in reference from the
function .move_device_tree() to the function .init.text:.lmb_alloc_base()
The function .move_device_tree() references
the function __init .lmb_alloc_base().
This is often because .move_device_tree lacks a __init
annotation or the annotation of .lmb_alloc_base is wrong.
move_device_tree() is called from early_init_devtree() only, which is __init
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
There is a bug in the powerpc DABR (data access breakpoint) handling,
which can result in us missing breakpoints if several threads are trying
to break on the same address.
The circumstances are that do_page_fault() calls do_dabr(), this clears
the DABR (sets it to 0) and sets up the signal which will report to
userspace that the DABR was hit. The do_signal() code will restore the DABR
value on the way out to userspace.
If we reschedule before calling do_signal(), __switch_to() will check the
cached DABR value and compare it to the new thread's value, if they match
we don't set the DABR in hardware.
So if two threads have the same DABR value, and we schedule from one to
the other after taking the interrupt for the first thread hitting the DABR,
the second thread will run without the DABR set in hardware.
The cleanest fix is to move the cache update into set_dabr(), that way we
can't forget to do it.
Reported-by: Jan Kratochvil <jan.kratochvil@redhat.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch adds basic support for the AMCC 460EX/460GT PPC's to arch/powerpc.
Currently those PPC's are still based on a 440 core and *not* a 460 core.
Here some basic features of those SoC's:
460EX:
- Up to 1.2GHz, 32kB L1 I-cache and D-cache, 256kB L2-cache, FPU
- 1 * PCI (max 66MHz), 2 * PCIe (one 4-lane, one 1-lane)
- 2 * GBit Ethernet with TCP/IP acceleration
- USB 2.0 Host/Device OTG and Host interface
- SATA controller
- Optional security feature
460GT (only changes to 460EX):
- 4 * GBit Ethernet with TCP/IP acceleration
- RapidIO
- No SATA
- No USB
Signed-off-by: Stefan Roese <sr@denx.de>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
This adds a kernel command line option "phyp_dump", which takes a 0/1
value for disabling/ enabling phyp_dump at boot time. Kdump can use
this on cmdline (phyp_dump=0) to disable phyp-dump during boot when
enabling itself. This will ensure only one dumping mechanism is active
at any given time.
Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Initial patch for reserving memory in early boot, and freeing it
later. If the previous boot had ended with a crash, the reserved
memory would contain a copy of the crashed kernel data.
Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
Signed-off-by: Linas Vepstas <linasvepstas@gmail.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
These items in asm-offsets.c are not used anywhere. This removes them.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The PT_DTRACE flag is meaningless and obsolete.
Don't touch it.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
It was being protected by CONFIG_PPC32, but we want to export it on
64-bit also. This moves it out of the ifdef.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Print out 'model' property of '/' node as a machine name
in generic show_cpuinfo() routine.
Signed-off-by: Marian Balakowicz <m8@semihalf.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Since the PMU is an NMI now, it can come at any time we are only soft
disabled. We must hard disable around the two places we allow the kernel
stack SLB and r1 to go out of sync. Otherwise the PMU exception can
force a kernel stack SLB into another slot, which can lead to it
getting evicted, which can lead to a nasty unrecoverable SLB miss
in the exception entry code.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The PTRACE_SETREGS request was only recently added on powerpc,
and gdb does not use it. So it slipped through without getting
all the testing it should have had.
The user_regset changes had a simple bug in storing to all of
the 32-bit general registers block on 64-bit kernels. This bug
only comes up with PTRACE_SETREGS, not PPC_PTRACE_SETREGS.
It causes a BUG_ON to hit, so this fix needs to go in ASAP.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Once again, this time with feeling....
- Ted
>From c91cfaabc17f8a53807a2f31f067a732e34a1550 Mon Sep 17 00:00:00 2001
From: Theodore Ts'o <tytso@mit.edu>
Date: Wed, 12 Mar 2008 11:50:39 -0400
Subject: Export empty_zero_page
The empty_zero_page symbol is exported by most other architectures
(s390, ia64, x86, um), and an upcoming ext4 patch needs it because
ZERO_PAGE() references empty_zero_page, and we need it to zero out an
unitialized extents in ext4 files.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Paul Mackerras <paulus@samba.org>
A bogus test for unassigned resources that came from our 32-bit
PCI code ended up being "merged" by my previous patch series,
breaking some 64-bit setups where devices have legal resources
ending at 0xffffffff.
This fixes it by completely changing the test. We now test for
res->start == 0, as the generic code expects, and we also only
do so on platforms that don't have the PPC_PCI_PROBE_ONLY flag
set, as there are cases of pSeries and iSeries where it could
be a valid value and those can't reassign devices.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Some drivers (such as V4L2) have code that causes gcc to generate
calls to __ucmpdi2 when compiling for 32-bit powerpc, which results
in either a link-time error or a module that can't be loaded, as
we don't currently have a __ucmpdi2. This adds one so these drivers
can be used.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This makes swap routines operate correctly on the ppc_8xx based machines.
Code has been revalidated on mpc885ads (8M sdram) with recent kernel. Based
on patch from Yuri Tikhonov <yur@emcraft.com> to do the same on arch/ppc
instance.
Recent kernel's size makes swap feature very important on low-memory platforms,
those are actually non-operable without it.
Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Vitaly Bordug <vitb@kernel.crashing.org>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
This code isn't referenced anywhere, so remove it.
Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
dt_mem_next_cell() currently does of_read_ulong(). This does not
allow for the case where #size-cells and/or #address-cells = 2 on a
32-bit system, as it will end up reading 32 bits instead of the
expected 64. Change it to use of_read_number instead and always
return a u64.
Signed-off-by: Becky Bruce <becky.bruce at freescale.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Fix sparse warnings in powerpc kprobes:
CHECK arch/powerpc/kernel/kprobes.c
arch/powerpc/kernel/kprobes.c:277:6: warning: symbol 'kretprobe_trampoline_holder' was not declared. Should it be static?
arch/powerpc/kernel/kprobes.c:287:15: warning: symbol 'trampoline_probe_handler' was not declared. Should it be static?
arch/powerpc/kernel/kprobes.c:525:16: warning: symbol 'jprobe_return_end' was not declared. Should it be static?
Fix along the same lines as http://lkml.org/lkml/2008/2/13/642
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This fixes the following section mismatches:
<-- snip -->
...
WARNING: vmlinux.o(.text+0xe49c): Section mismatch in reference from the function .vdso_do_func_patch64() to the function .init.text:.find_symbol64()
WARNING: vmlinux.o(.text+0xe4d0): Section mismatch in reference from the function .vdso_do_func_patch64() to the function .init.text:.find_symbol64()
WARNING: vmlinux.o(.text+0xe56c): Section mismatch in reference from the function .vdso_do_func_patch32() to the function .init.text:.find_symbol32()
WARNING: vmlinux.o(.text+0xe5a0): Section mismatch in reference from the function .vdso_do_func_patch32() to the function .init.text:.find_symbol32()
...
<-- snip -->
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The powerpc show_regs prints CPU using smp_processor_id: change that to
raw_smp_processor_id, so that when it's showing a WARN_ON backtrace without
preemption disabled, DEBUG_PREEMPT doesn't mess up that warning with its own.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Commit ad7f71674a ("[POWERPC] Use a
sensible default for clock_getres() in the VDSO") corrected the clock
resolution reported by the VDSO clock_getres() but introduced another
problem in that older versions of gcc (gcc-4.0 and earlier) fail to
compile the new code in arch/powerpc/kernel/asm-offsets.c.
This fixes it by introducing a new MONOTONIC_RES_NSEC define in the
generic code which is equivalent to KTIME_MONOTONIC_RES but is just an
integer constant, not a ktime union.
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This makes the SPE register data appear in ELF core dumps, using the
new n_type value NT_PPC_SPE (0x101). This new note type is not used
by any consumers of core files yet, but support can be added. I don't
even have any hardware with SPE capabilities, so I've never seen such
a note. But this demonstrates how simple it is to export register
information in core dumps when the user_regset style is used for the
low-level code.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This cleans up the 32-bit ptrace syscall support to use user_regset calls
to get at the register data for PTRACE_*REGS* calls.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This replaces powerpc's compat_sys_ptrace with a compat_arch_ptrace and
enables the new generic definition of compat_sys_ptrace instead.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This removes some duplicated code by calling the new generic
compat_ptrace_request from powerpc's compat_sys_ptrace.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Now that ptrace_request handles these, we can drop some more boilerplate.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This replaces all the code for powerpc PTRACE_*REGS* requests with
simple calls to copy_regset_from_user and copy_regset_to_user. All
the ptrace formats are either the whole corresponding user_regset
format (core dump format) or a leading subset of it, so we can get
rid of all the remaining embedded knowledge of both those layouts
and of the internal data structures they correspond to. Only the
user_regset accessors need to implement that.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This switches the CONFIG_PPC64 support for 32-bit ELF to use the
generic fs/compat_binfmt_elf.c implementation instead of our own
binfmt_elf32.c. Since so much is the same between 32/64, there is
only one macro we have to define to make the generic support work out
of the box.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This extends task_user_regset_view CONFIG_PPC64 with support for the
32-bit view of register state, compatible with what a CONFIG_PPC32
kernel provides. This will enable generic machine-independent code to
access user-mode threads' registers for debugging and dumping.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This provides the task_user_regset_view entry point and support for
all the native-mode (64 on CONFIG_PPC64, 32 on CONFIG_PPC32) thread
register state. This will enable generic machine-independent code to
access user-mode threads' registers for debugging and dumping.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This implements user_regset-style accessors for the powerpc general
registers. In the future these functions will be the only place that
needs to understand the user_regset layout (core dump format) and how
it maps to the internal representation of user thread state.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This isolates the ptrace code for the special-case registers msr and trap
from the ptrace-layout dispatch code. This should inline away completely.
It cleanly separates the low-level machine magic that has to be done for
deep reasons, from the superficial details of the ptrace interface.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This implements user_regset-style accessors for the powerpc SPE data,
and rewrites the existing ptrace code in terms of those calls.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This implements user_regset-style accessors for the powerpc Altivec data,
and rewrites the existing ptrace code in terms of those calls.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This implements user_regset-style accessors for the powerpc FPU data,
and rewrites the existing ptrace code in terms of those calls.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Looks like "[POWERPC] kdump shutdown hook support" broke builds when
CONFIG_DEBUGGER=n and CONFIG_KEXEC=y, such as in g5_defconfig:
arch/powerpc/kernel/crash.c: In function 'default_machine_crash_shutdown':
arch/powerpc/kernel/crash.c:388: error: '__debugger_fault_handler' undeclared (first use in this function)
arch/powerpc/kernel/crash.c:388: error: (Each undeclared identifier is reported only once
arch/powerpc/kernel/crash.c:388: error: for each function it appears in.)
Move the debugger hooks to under CONFIG_DEBUGGER || CONFIG_KEXEC, since
that's when the crash code is enabled.
(I should have caught this with my build-script pre-merge, my bad. :( )
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves the ability to scale cputime into generic code. This allows us
to fix the issue in kernel/timer.c (noticed by Balbir) where we could only
add an unscaled value to the scaled utime/stime.
This adds a cputime_to_scaled function. As before, the POWERPC version
does the scaling based on the last SPURR/PURR ratio calculated. The
generic and s390 (only other arch to implement asm/cputime.h) versions are
both NOPs.
Also moves the SPURR and PURR snapshots closer.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: Jay Lan <jlan@engr.sgi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The e300 c3 and c4 variants support hardware performance monitor counters
which are identical to those found in the e500.
Signed-off-by: Andy Fleming <afleming@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Some of the more recent e300 cores have the same performance monitor
implementation as the e500. e300 isn't book-e, so the name isn't
really appropriate. In preparation for e300 support, rename a bunch
of fsl_booke things to say fsl_emb (Freescale Embedded Performance Monitors).
Signed-off-by: Andy Fleming <afleming@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The patch to legacy_serial.c (1a7507c7da,
Reduce code duplication in legacy_serial, add UART parent types) changed
the semantics for opb ports from type = "opb" || compatible = "ibm,opb"
to type = "opb" && compatible = "ibm,opb".
The result is serial ports on our QS21s (Cell blades) don't get found,
and for some reason the machine doesn't boot at all - possibly it's
panicking due to lack of a console?
The fix is to add two entries to the of_device_id table, one that looks
for type = "opb" and the other compatible = "ibm,opb".
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This ensures that the syscall and the (fast) vdso versions of
clock_getres() will return the same resolution.
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
WARNING: vmlinux.o(.text+0x3017c): Section mismatch in reference from the function .vio_create_viodasd() to the function .devinit.text:.vio_register_device_node()
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Previously, during initialization of the IOMMU tables, the last entry
at each 4GB boundary is marked as used since there are many adapters
which cannot handle DMAing across any 4GB boundary.
The IOMMU doesn't allocate a memory area spanning LLD's segment
boundary anymore. The segment boundary of devices are set to 4GB by
default. So we can remove 4GB boundary protection now.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch converts PPC's IOMMU to use the IOMMU helper functions. The IOMMU
doesn't allocate a memory area spanning LLD's segment boundary anymore.
iseries_hv_alloc and iseries_hv_map don't have proper device
struct. 4GB boundary is used for them.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6:
PPC: Fix powerpc vio_find_name to not use devices_subsys
Driver core: add bus_find_device_by_name function
Module: check to see if we have a built in module with the same name
x86: fix runtime error in arch/x86/kernel/cpu/mcheck/mce_amd_64.c
Driver core: Fix up build when CONFIG_BLOCK=N
This removes the handling for PTRACE_CONT et al from the powerpc
ptrace code, so it uses the new generic code via ptrace_request.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This defines the new standard arch_has_single_step macro. It makes the
existing set_single_step and clear_single_step entry points global, and
renames them to the new standard names user_enable_single_step and
user_disable_single_step, respectively.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Remove the deprecated __attribute_used__.
[Introduce __section in a few places to silence checkpatch /sam]
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
This patch consolidate all definitions of .init.text, .init.data
and .exit.text, .exit.data section definitions in
the generic vmlinux.lds.h.
This is a preparational patch - alone it does not buy
us much good.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
All current 85xx/e500 implementations only have two TLB
arrays. We are wasting cycles by invalidating TLB2 and TLB3.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The legacy_serial was treating each UART parent in a separate code block.
Rather than continue this trend for the new parent IDs, this condenses
all (soc, tsi, opb, plus two more new types) into one of_device_id array.
The new types are wrs,epld-localbus for the Wind River sbc8560, and a
more generic "simple-bus" as requested by Scott Wood.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
This fixes vio_find_name() in arch/powerpc/kernel/vio.c, which is
currently broken because it tries to use devices_subsys. That is bad
for two reasons: (1) it's doing (or trying to do) a scan of all
devices when it should only be scanning those on the vio bus, and
(2) devices_subsys was an internal symbol of the device system code
which was never meant for external use and has now gone away, and
thus the kernel fails to compile on pSeries.
The new version uses bus_find_device_by_name() on the vio bus
(vio_bus_type).
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Prune back Efika fixups to only include changes that are actually required
to get a working system. Most of the drivers can accept the compatible
properties, even if they don't match the what is recommented in the generic
names recommended practice document.
This patch also adds extra checks so that fixups are not performed blindly.
Instead, the code first verifies that the device tree is faulty before
making any changes. This way, if the Efika firmware is updated to fix
these issues, then the fixups will no longer get applied.
At this point; here is the list of fixups needed for the efika:
1. If the device_type property on the root node is 'chrp', then Linux won't
boot. Change device_type to 'efika' to avoid this condition
2. Add full interrupt list to the bestcomm node. In actual fact, the
bestcomm interrupts property is technically correct, it just doesn't
expose the same granularity as the device driver expects. All other
5200 device trees provide a separate irq number for each bestcomm
channel. Rather than hack the driver, it's simpler to fix it up
3. /builtin/sound node is missing an interrupts property
4. /builtin/ethernet node is missing a phy-handle property and the
device driver doesn't know what to do without one.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
This adds the 440EP revision C PVR to the CPU table. The chip has an
FPU on it, so we also match the logical PVR
Signed-off-by: Sean MacLennan <smaclennan@pikatech.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
This patch adds the 405EXr to the powerpc cuptable. Basically the 405EXr
is a 405EX with only one EMAC and only one PCIe interface.
Signed-off-by: Stefan Roese <sr@denx.de>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Since commit c80d9133e9 (Make direct DMA use
node local allocations) went in this comment makes no sense.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We no longer need the global dma_direct_offset, update the comment to
reflect the new reality.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Now that all platforms using dma_direct_offset setup the
archdata.dma_data correctly, we can change the dma_direct_ops to
retrieve the offset from the dma_data, rather than directly from the
global.
While we're here, change the way the offset is used - instead of
or'ing it into the value, add it. This should have no effect on
current implementations where the offset is far larger than memory,
however in future we may want to use smaller offsets.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We must always hookup the pci_bus resource 0 to the PHB io_resource
even if the latter is empty (the bus has no IO support). Otherwise,
some other code will end up hooking it up to something bogus and the
resource tree will end up being broken.
This fixes boot on QS20 Cell blades where the IDE driver failed to
allocate the IO resources due to breakage of the resource tree.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds hooks into the default_machine_crash_shutdown so drivers can
register a function to be run in the first kernel before we hand off
to the second kernel. This should only be used in exceptional
circumstances, like where the device can't be reset in the second
kernel alone (as is the case with eHEA). To emphasize this, the
number of handles allowed to be registered is currently #def to 1.
This uses the setjmp/longjmp code around the call out to the
registered hooks, so any bogus exceptions we encounter will hopefully
be recoverable.
Tested with bogus data and instruction exceptions.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This makes the setjmp/longjmp code used by xmon, generically available
to other code. It also removes the requirement for debugger hooks to
be only called on 0x300 (data storage) exception.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Export copy_page() on 32-bit powerpc; unionfs needs it.
Unionfs already builds as a module on 64bit powerpc, so the export is
placed within an existing CONFIG_PPC32 #ifdef.
Signed-off-by: Joseph Fannin <jfannin@gmail.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
smp_send_stop() will send an IPI to all other cpus to shut them down.
However, for the case of xmon-based reboots (as well as potentially some
panics), the other cpus are (or might be) spinning with interrupts off,
and won't take the IPI.
Current code will drop us into the debugger when the IPI fails, which
means we're in an infinite loop that we can't get out of without an
external reset of some sort.
Instead, make the smp_send_stop() IPI call path just print the warning
about being unable to send IPIs, but make it return so the rest of the
shutdown sequence can continue. It's not perfect, but the lesser of
two evils.
Also move the call_lock handling outside of smp_call_function_map so we
can avoid deadlocks in smp_send_stop().
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
smp_call_function_map should be static, and for consistency prepend it
with __ like other local helper functions in the same file.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Do just enough to move the RapidIO support code for 85xx over from arch/ppc
into arch/powerpc and make it still build.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The e500 MMU init code previously assumed KERNELBASE always equaled
PAGE_OFFSET and PHYSICAL_START was 0. This is useful for kdump
support as well as asymetric multicore.
For the initial kdump support the secondary kernel will run at 32M
but need access to all of memory so we bump the initial TLB up to
64M. This also matches with the forth coming ePAPR spec.
Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
For transparent P2P bridges the first 3 resources may get set from based on
BAR registers and need to get fixed up. Where as the remainder come from the
parent bus and have already been fixed up.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The fixup code that handles the case for PowerMac's that leave bridge
windows open over an inaccessible region should only be applied to
memory resources (IORESOURCE_MEM). If not we can get it trying to fixup
IORESOURCE_IO on some systems since the other conditions that are used to
detect the case can easily match for IORESOURCE_IO.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Using 64k pages on 64-bit PowerPC systems makes life difficult for
emulators that are trying to emulate an ISA, such as x86, which use a
smaller page size, since the emulator can no longer use the MMU and
the normal system calls for controlling page protections. Of course,
the emulator can emulate the MMU by checking and possibly remapping
the address for each memory access in software, but that is pretty
slow.
This provides a facility for such programs to control the access
permissions on individual 4k sub-pages of 64k pages. The idea is
that the emulator supplies an array of protection masks to apply to a
specified range of virtual addresses. These masks are applied at the
level where hardware PTEs are inserted into the hardware page table
based on the Linux PTEs, so the Linux PTEs are not affected. Note
that this new mechanism does not allow any access that would otherwise
be prohibited; it can only prohibit accesses that would otherwise be
allowed. This new facility is only available on 64-bit PowerPC and
only when the kernel is configured for 64k pages.
The masks are supplied using a new subpage_prot system call, which
takes a starting virtual address and length, and a pointer to an array
of protection masks in memory. The array has a 32-bit word per 64k
page to be protected; each 32-bit word consists of 16 2-bit fields,
for which 0 allows any access (that is otherwise allowed), 1 prevents
write accesses, and 2 or 3 prevent any access.
Implicit in this is that the regions of the address space that are
protected are switched to use 4k hardware pages rather than 64k
hardware pages (on machines with hardware 64k page support). In fact
the whole process is switched to use 4k hardware pages when the
subpage_prot system call is used, but this could be improved in future
to switch only the affected segments.
The subpage protection bits are stored in a 3 level tree akin to the
page table tree. The top level of this tree is stored in a structure
that is appended to the top level of the page table tree, i.e., the
pgd array. Since it will often only be 32-bit addresses (below 4GB)
that are protected, the pointers to the first four bottom level pages
are also stored in this structure (each bottom level page contains the
protection bits for 1GB of address space), so the protection bits for
addresses below 4GB can be accessed with one fewer loads than those
for higher addresses.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Also check that __NR_syscalls has been updated appropriately.
Hopefully this will catch any out of order additions to the
table in the future.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
xlate_iomm_address() really wants the ds_addr to pass to the HV, so store
that value (instead of the BAR number) when we allocate the device bars.
This is not a fast path, so we can look up the device_node property
there instead of using the bussubno field of the pci_dn.
The other user of iseries_ds_addr() was already scanning the device tree,
so looking up a property will not slow it down any more.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Similar to of_find_compatible_node(), of_find_matching_node() and
for_each_matching_node() allow you to iterate over the device tree
looking for specific nodes, except that they take of_device_id
tables instead of strings.
This also moves of_match_node() from driver/of/device.c to
driver/of/base.c to colocate it with the of_find_matching_node which
depends on it.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Commit 5d2efba64b changed our iommu code
so that it always uses an iommu page size of 4kB. That means with our
current code, drivers may do a dma_map_sg() of a 64kB page and obtain
a dma_addr_t that is only 4k aligned.
This works fine in most cases except for some infiniband HW it seems,
where they tell the HW about the page size and it ignores the low bits
of the DMA address.
This works around it by making our IOMMU code enforce a PAGE_SIZE alignment
for mappings of objects that are page aligned in the first place and whose
size is larger or equal to a page.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The new network driver fec_mpc52xx will not work on efika because the
firmware does not provide all required properties.
http://www.powerdeveloper.org/asset/by-id/46 has a Forth script to
create more properties. But only the phy stuff is required to get a
working network.
This should go into the kernel because its appearently
impossible to boot the script via tftp and then load the real boot
binary (yaboot or zimage).
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This reverts commit 553aa7659b at Ben H's
request, because it confused IORESOURCE_* flags with command register
bits.
Requested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The e200 and e500 platforms are separated in various parts of the kernel with
ifdefs, most notably reg_booke.h and traps.c. The new machine_check rework
requires them to be similarly separated in cputable.c to avoid compile errors.
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Renaming the CPU nodes with generic names put the CPU model in
the "model" property and thus broke the PowerPC 440EP(x)/440GR(x)
identical PVR workaround. The updates it to use the new model property
for CPU identification.
Signed-off-by: Valentine Barshak <vbarshak@ru.mvista.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
The mechanism to do the setup for 440A cores changed recently. This fixes
the 440grx setup function to call __fixup_440A_mcheck.
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
This adds some basic real mode based early udbg support for 40x
in order to debug things more easily
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
This adds a cputable function pointer for the CPU-side machine
check handling. The semantic is still the same as the old one,
the one in ppc_md. overrides the one in cputable, though
ultimately we'll want to change that so the CPU gets first.
This removes CONFIG_440A which was a problem for multiplatform
kernels and instead fixes up the IVOR at runtime from a setup_cpu
function. The "A" version of the machine check also tweaks the
regs->trap value to differenciate the 2 versions at the C level.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
This will allow us to declare const all the statically declared arrrays
of these.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The 32-bit PCI code tests if "bus" is non-NULL after calling
pci_scan_bus_parented() in one place but not another before
dereferencing it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
These hooks ensure that a decrementer interrupt is not pending when
suspending; otherwise, problems may occur on 6xx/7xx/7xxx-based
systems (except for powermacs, which use a separate suspend path).
For example, with deep sleep on the 831x, a pending decrementer will
cause a system freeze because the SoC thinks the decrementer interrupt
would have woken the system, but the core must have interrupts
disabled due to the setup required for deep sleep.
Changed via-pmu.c to use the new ppc_md hooks, and made the arch_*
functions call the generic_* functions unconditionally. -- paulus
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When a module has relocation sections with tens of thousands of
entries, counting the distinct/unique entries only (i.e. no
duplicates) at load time can take tens of seconds and up to minutes.
The sore point is the count_relocs() function which is called as part
of the architecture specific module loading processing path:
-> load_module() generic
-> module_frob_arch_sections() arch specific
-> get_plt_size() 32-bit
-> get_stubs_size() 64-bit
-> count_relocs()
Here count_relocs is being called to find out how many distinct
targets of R_PPC_REL24 relocations there are, since each distinct
target needs a PLT entry or a stub created for it.
The previous counting algorithm has O(n^2) complexity. Basically two
solutions were proposed on the e-mail list: a hash based approach and
a sort based approach.
The hash based approach is the fastest (O(n)) but the has it needs
additional memory and for certain corner cases it could take lots of
memory due to the degeneration of the hash. One such proposal was
submitted here:
http://ozlabs.org/pipermail/linuxppc-dev/2007-June/037641.html
The sort based approach is slower (O(n * log n + n)) but if the
sorting is done "in place" it doesn't need additional memory.
This has O(n + n * log n) complexity with no additional memory
requirements.
This commit implements the in-place sort option.
Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Lucas Woods <woodzy@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We were using -mno-minimal-toc on everything in arch/powerpc/kernel,
which means that all the functions in there were putting all their
TOC entries in the top-level TOC, and it was overflowing on an
allyesconfig build. For various reasons, prom_init.c does need
-mno-minimal-toc, but the other .c files in there can use sub-TOCs
quite happily. This change is sufficient for now to stop the TOC
overflowing; other directories under arch/powerpc also use
-mno-minimal-toc and could also be changed later if necessary.
Lmbench runs with and without this patch showed no significant speed
differences.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The PCI IRQ code has a fallback when the device-tree parsing fails, that
tries to map the interrupt indicated by PCI_INTERRUPT_LINE if the firmware
set something in there. This is a bit fragile but has proven useful in some
cases so far. However, it's causing us to incorrectly try to map interrupt 0
on various setups, so let's prevent that case, as none of the cases where
the fallback is legit should have an IRQ 0.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch changes the PowerPC PCI code to disable IO and/or Memory
decoding on a PCI device when a resource of that type failed to be
allocated. This is done to avoid having unallocated dangling BARs
enabled that might try to decode on top of other devices.
If a proper resource is assigned later on, then pci_enable_device()
will take care of re-enabling decoding.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Apple firmware has a strange way to "close" bridge resources by setting
them to some bogus values that overlap RAM (strangely, I haven't seen it
conflicting with DMA so far...). This explicitely closes them to avoid
problems. Previously, they would be closed as a consequence of failing
to be allocated, but this makes it more explicit, and thus the log
message is more explicit too.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Our implementation of pcibios_enable_device() has a couple of problems.
One is that it should not check IORESOURCE_UNSET, as this might be
left dangling after resource assignment (shouldn't but there are
bugs), but instead, we make it check resource->parent which should
be a reliable indication that the resource has been successfully
claimed (it's in the resource tree).
Then, we also need to skip ROM resources that haven't been enabled
as x86 does.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This merge the two implementations, based on the previously
fixed up 32 bits one. The pcibios_enable_device_hook in ppc_md
is now available for ppc64 use. Also remove the new unused
"initial" parameter from it and fixup users.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Our implementation of pcibios_enable_device() incorrectly ignores
the mask argument and always checks that all resources have been
allocated, which isn't the right thing to do anymore.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The way iSeries manages PCI IO and Memory resources is a bit strange
and is based on overriding the content of those resources with home
cooked ones afterward.
This changes it a bit to better integrate with the new resource handling
so that the "virtual" tokens that iSeries replaces resources with are
done from the proper per-device fixup hook, and bridge resources are
set to enclose that token space. This fixes various things such as
the output of /proc/iomem & ioports, among others. This also fixes up
various boot messages as well.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The 32 bits PCI code now uses the generic code for assigning unassigned
resources and an algorithm similar to x86 for claiming existing ones.
This works far better than the 64 bits code which basically can only
claim existing ones (pci_probe_only=1) or would fall apart completely.
This merges them so that the new 32 bits implementation is used for both.
64 bits now gets the new PCI flags for controlling the behaviour, though
the old pci_probe_only global is still there for now to be cleared if you
want to.
I kept a pcibios_claim_one_bus() function mostly based on the old 64
bits code for use by the DLPAR hotplug. This will have to be cleaned
up, thought I hope it will work in the meantime.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The PCI code in 32 and 64 bits fixes up resources differently.
32 bits uses a header quirk plus handles bridges in pcibios_fixup_bus()
while 64 bits does things in various places depending on whether you
are using OF probing, using PCI hotplug, etc...
This merges those by basically using the 32 bits approach for both,
with various tweaks to make 64 bits work with the new approach.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This merges the PowerPC 32 and 64 bits version of pcibios_resource_to_bus
and pcibios_bus_to_resource().
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds flags the platforms can use to enable domain numbers
in /proc/bus/pci.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The 32 bits PCI code carries an old hack that was only useful for G5
machines. Nowdays, the 32 bits kernel doesn't support any of those
machines anymore so the hack is basically never used, so remove it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds to the 32 bits PCI code some flags, replacing the old
pci_assign_all_busses global, that allow us to control various
aspects of the PCI probing, such as whether to re-assign all
resources or not, or to not try to assign anything at all.
This also adds the flag x86 already has to avoid ISA alignment
on bridges that don't have ISA forwarding enabled (no legacy
devices on the top level bus) and sets it for PowerMacs.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The 32 bits PowerPC PCI code has a hack for use by some PowerMacs
to try to re-open PCI<->PCI bridge IO resources that were closed
by the firmware. This is no longer necessary as the generic code
will now do that for us.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This makes the 32 bits PowerPC PCI code use the generic code to assign
resources to devices that had unassigned or conflicting resources.
This allow us to remove the local implementation that was incomplete and
could not assign for example a PCI<->PCI bridge from scratch, which is
needed on various embedded platforms.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
There's a stale & bogus piece of code in 32 bits PCI code that
complains about ISA related alignment issues. Just remove it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
PowerPC currently doesn't implement pci_set_dma_mask(), which means drivers
calling it will get the generic version in drivers/pci/pci.c.
The powerpc dma mapping ops include a dma_set_mask() hook, which luckily is
not implemented by anyone - so there is no bug in the fact that the hook
is currently never called.
However in future we'll add implementation(s) of dma_set_mask(), and so we
need pci_set_dma_mask() to call the hook.
To save adding a hook to the dma mapping ops, pci-set_consistent_dma_mask()
simply calls the dma_set_mask() hook and then copies the new mask into
dev.coherenet_dma_mask.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We have multiple calls to has_feature being inlined, but gcc can't
be sure that the store via get_paca() doesn't alias the path to
cur_cpu_spec->feature.
Reorder to put the calls to read_purr and read_spurr adjacent to each
other. To add a sense of consistency, reorder the remaining lines to
perform parallel steps on purr and scaled purr of each line instead of
calculating and then using one value before going on to the next.
In addition, we can tell gcc that no SPURR means no PURR. The test is
completely hidden in the PURR case, and in the !PURR case the second test
is eliminated resulting in the simple register copy in the out-of-line
branch.
Further, gcc sees get_paca()->system_time referenced several times and
allocates a register to address it (shadowing r13) instead of caching its
value. Reading into a local varable saves the shadow of r13 and removes
a potentially duplicate load (between the nested if and its parent).
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
If CPU_FTR_PURR is not set, we will never set cpu_purr_data->initialized.
Checking via __get_cpu_var on 64 bit avoids one dependent load compared
to cpu_has_feature in the not-present case, and is always required when
it is present. The code is under CONFIG_VIRT_CPU_ACCOUNTING so 32 bit
will not be affected.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
timer_interrupt() was calculating per_cpu_offset several times, having to
start from the toc because of potential aliasing issues.
Placing both decrementer per_cpu varables in a struct and calculating
the address once with __get_cpu_var results in better code on both 32
and 64 bit.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Use __get_cpu_var(x) instead of per_cpu(x, smp_processor_id()), as it
is optimized on ppc64 to access the current cpu's per-cpu offset directly;
it's local_paca.offset instead of TOC->paca[local_paca->processor_id].offset.
This is the trivial portion, two functions with one use each.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
as its only called from time_init, which is __init.
Also remove unneeded forward declaration.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Remove exports of __res and cpm_install_handler/cpm_free_handler. Remove
cpm_install_handler/cpm_free_handler from the commproc.h as well. Both
were used for ARCH=ppc and aren't defined for ARCH=powerpc.
CC arch/powerpc/kernel/ppc_ksyms.o
arch/powerpc/kernel/ppc_ksyms.c:180: error: '__res' undeclared here (not in a function)
arch/powerpc/kernel/ppc_ksyms.c:180: warning: type defaults to 'int' in declaration of '__res'
make[1]: *** [arch/powerpc/kernel/ppc_ksyms.o] Error 1
make: *** [arch/powerpc/kernel] Error 2
LD .tmp_vmlinux1
arch/powerpc/kernel/built-in.o:(__ksymtab+0x198): undefined reference to `cpm_free_handler'
arch/powerpc/kernel/built-in.o:(__ksymtab+0x1a0): undefined reference to `cpm_install_handler'
make: *** [.tmp_vmlinux1] Error 1
Signed-off-by: Jochen Friedrich <jochen@scram.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Vitaly Bordug <vitb@kernel.crashing.org>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
isel (Integer Select) is a new user space instruction in the
PowerISA 2.04 spec. Not all processors implement it so lets emulate
to ensure code built with isel will run everywhere.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
This makes the early debug option force the console loglevel
to the max. The early debug option is meant to catch messages very
early in the kernel boot process, in many cases, before the kernel
has a chance to parse the "debug" command line argument. Thus it
makes sense when CONFIG_PPC_EARLY_DEBUG is set, to force the console
log level to the max at boot time.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds a variant of of_translate_address that uses the dma-ranges
property instead of "ranges", it's to be used by PCI code in parsing
the dma-ranges property.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This removes "volatile" from the MMIO pointer udbg_comport
in udbg_16550.c driver, it's useless and makes checkpatch.pl
complain when adding things to this file.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The 32 bits PCI code will display a rather scary error message
PCI: Cannot allocate resource region N of device XXX
at boot when the existing setup of a device as left by the
firmware doesn't match the kernel needs and the device needs
to be moved. This is often not an error at all, as the kernel
will generally easily reallocate the device elsewhere.
This changes the message to something less scary and lowers
its level from error to warning.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The 32-bit powerpc resource fixup code uses unsigned longs to do the
offsetting of resources which overflows on platforms such as 4xx where
resources can be 64 bits.
This fixes it by using resource_size_t instead.
However, the IO stuff does rely on some 32 bits arithmetic, so we hack
by cropping the result of the fixups for IO resources with a 32 bits
mask.
This isn't the prettiest but should work for now until we change the
32 bits PCI code to do IO mappings like 64 bits does, within a reserved
are of the kernel address space.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This merges the 32-bit and 64-bit implementations of
pci_process_bridge_OF_ranges(). The new function is cleaner than both
the old ones, and supports 64 bits ranges on ppc32 which is necessary
for the 4xx port.
It also adds some better (hopefully) output to the kernel log which
should help diagnose problems and makes better use of existing OF
parsing helpers (avoiding a few bugs of both implementations along
the way).
There are still a few unfortunate ifdef's but there is no way around
these for now at least not until some other bits of the PCI code are
made common.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This defines isa_mem_base on both 32 and 64 bits (it used to be 32 bits
only). This avoids a few ifdef's in later patches and potentially can
allow support for VGA text mode on 64 bits powerpc.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently we hardwire the number of SLBs to 64, but PAPR says we
should use the ibm,slb-size property to obtain the number of SLB
entries. This uses this property instead of assuming 64. If no
property is found, we assume 64 entries as before.
This soft patches the SLB handler, so it shouldn't change performance
at all.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
and it becomes clear that we should use zalloc_maybe_bootmem.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
It only needs the iommu_table address. It also makes use of the node
name to print error messages. So just pass it the things it needs.
This reduces the places that know about the pci_dn by one.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The 'data' member of proc_ppc64_lparcfg is unused, but the lparcfg
module's init routine allocates 4K for it.
Remove the code which allocates and frees this buffer.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Also use of_unregister_driver to implement of_unregister_platform_driver.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The size of swapper_pg_dir is 8k instead of 4k when using 64-bit PTEs
(CONFIG_PTE_64BIT).
This was reported by Cedric Hombourger <chombourger@gmail.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The commit fa13a5a1f2 (sched: restore
deterministic CPU accounting on powerpc), unconditionally calls
update_process_tick() in system context. In the deterministic
accounting case this is the correct thing to do. However, in the
non-deterministic accounting case we need to not do this, since doing
this results in the time accounted as hardware irq time being
artificially elevated.
Also this collapses 2 consecutive '#ifdef CONFIG_VIRT_CPU_ACCOUNTING'
checks in time.h into one for neatness.
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
It is already declared in ppc-pci.h which is included.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
since it's not used outside of arch/powerpc/kernel/pci-common.c.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This cleans up the SMT thread handling, removing some hard coded
assumptions and providing a set of helpers to convert between linux
cpu numbers, thread numbers and cores.
This implementation requires the number of threads per core to be a
power of 2 and identical on all cores in the system, but it's an
implementation detail, not an API requirement and so this limitation
can be lifted in the future if anybody ever needs it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This reverts commit a2b51812a4.
It turns out that this change caused some machines to fail to come
back up when being rebooted, and generated an error in the hypervisor
error log on some machines. The platform architecture (PAPR) is a
little unclear on exactly when the RTAS ibm,os-term function should be
called. Until that is clarified I'm reverting this commit.
Signed-off-by: Paul Mackerras <paulus@samba.org>
If we get no user time and no system time allocated since the last
account_system_vtime, the system to user time ratio estimate can end
up dividing by zero.
This was causing a problem noticed by Balbir Singh.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The rtas_os_term() routine was being called at the wrong time.
The actual rtas call "os-term" will not ever return, and so
calling it from the panic notifier is too early. Instead,
call it from the machine_reset() call.
This splits the rtas_os_term() routine into two: one part to capture
the kernel panic message, invoked during the panic notifier, and
another part that is invoked during machine_reset().
Prior to this patch, the os-term call was never being made,
because panic_timeout was always non-zero. Calling os-term
helps keep the hypervisor happy! We have to keep the hypervisor
happy to avoid service, dump and error reporting problems.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The current VDSO implementation is hardcoded to 128 byte cache blocks,
which are only used on IBM's 64-bit processors.
Convert it to get the cache block sizes out of vdso_data instead,
similar to how the ppc64 in-kernel cache flush does it.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
There are several issues with the rtas_ibm_suspend_me code, which
enables platform-assisted suspension of an LPAR as covered in PAPR
2.2.
1.) rtas_ibm_suspend_me uses on_each_cpu() to invoke
rtas_percpu_suspend_me on all cpus via IPI:
if (on_each_cpu(rtas_percpu_suspend_me, &data, 1, 0))
...
'data' is on the calling task's stack, but rtas_ibm_suspend_me takes
no measures to ensure that all instances of rtas_percpu_suspend_me are
finished accessing 'data' before returning. This can result in the
IPI'd cpus accessing random stack data and getting stuck in H_JOIN.
This is addressed by using an atomic count of workers and a completion
on the stack.
2.) rtas_percpu_suspend_me is needlessly calling H_JOIN in a loop.
The only event that can cause a cpu to return from H_JOIN is an H_PROD
from another cpu or a NMI/system reset. Each cpu need call H_JOIN
only once per suspend operation.
Remove the loop and the now unnecessary 'waiting' state variable.
3.) H_JOIN must be called with MSR[EE] off, but lazy interrupt
disabling may cause the caller of rtas_ibm_suspend_me to call H_JOIN
with it on; the local_irq_disable() in on_each_cpu() is not
sufficient.
Fix this by explicitly saving the MSR and clearing the EE bit before
calling H_JOIN.
4.) H_PROD is being called with the Linux logical cpu number as the
parameter, not the platform interrupt server value. (It's also being
called for all possible cpus, which is harmless, but unnecessary.)
This is fixed by calling H_PROD for each online cpu using
get_hard_smp_processor_id(cpu) for the argument.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The early btext debug wouldn't work on PowerMac when booted from BootX
due to the code looking for the wrong property name.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
These don't need to be seen by everyone on every boot.
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The context switch code in the kernel issues a dummy stwcx. to clear the
reservation, as recommended by the architecture. However, some processors
can have issues if this stwcx to address A occurs while the reservation
is already held to a different address B. To avoid this problem, the dummy
stwcx. needs to be paired with a dummy lwarx to the same address.
This adds the dummy lwarx, and creates a cpu feature bit to indicate
which cpus are affected. Tested on mpc8641_hpcn_defconfig in
arch/powerpc; build tested in arch/ppc.
Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Since powerpc started using CONFIG_GENERIC_CLOCKEVENTS, the
deterministic CPU accounting (CONFIG_VIRT_CPU_ACCOUNTING) has been
broken on powerpc, because we end up counting user time twice: once in
timer_interrupt() and once in update_process_times().
This fixes the problem by pulling the code in update_process_times
that updates utime and stime into a separate function called
account_process_tick. If CONFIG_VIRT_CPU_ACCOUNTING is not defined,
there is a version of account_process_tick in kernel/timer.c that
simply accounts a whole tick to either utime or stime as before. If
CONFIG_VIRT_CPU_ACCOUNTING is defined, then arch code gets to
implement account_process_tick.
This also lets us simplify the s390 code a bit; it means that the s390
timer interrupt can now call update_process_times even when
CONFIG_VIRT_CPU_ACCOUNTING is turned on, and can just implement a
suitable account_process_tick().
account_process_tick() now takes the task_struct * as an argument.
Tested both with and without CONFIG_VIRT_CPU_ACCOUNTING.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This makes the altivec code in swsusp_32.S depend on CONFIG_ALTIVEC to
avoid build failures for systems that don't have altivec. I'm not sure
whether the code will actually work for other systems, but it was merged
for just ppc32 rather than powermac a very long time ago.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
If the low level MMU hash table insertion returns an error (which
can happen in some rare circumstances when the hypervisor refuses
the insertion of a PTE, typically if you try to access junk via
/dev/mem), the generated signal had an incorrect si_addr value due
to a bug in the assembly, which was loading it as a 32 bits quantity
instead of a 64 bits quantity.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The size passing to memset is wrong.
Signed-off-by Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
An allyesconfig build creates a .text section that is so big that the
.text.init.refok and .fixup sections are too far away for the relocations
to be fixed up correctly. This patch fixes that by linking all the
relevent text sections for each file together.
Suggested by Paul Mackerras.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The decrementer in Book E and 4xx processors interrupts on the
transition from 1 to 0, rather than on the 0 to -1 transition as on
64-bit server and 32-bit "classic" (6xx/7xx/7xxx) processors. At the
moment we subtract 1 from the count of how many decrementer ticks are
required before the next interrupt before putting it into the
decrementer, which is correct for server/classic processors, but could
possibly cause the interrupt to happen too early on Book E and 4xx if
the timebase/decrementer frequency is low.
This fixes the problem by making set_dec subtract 1 from the count for
server and classic processors, instead of having the callers subtract
1. Since set_dec already had a bunch of ifdefs to handle different
processor types, there is no net increase in ugliness. :)
Note that calling set_dec(0) may not generate an interrupt on some
processors. To make sure that decrementer_set_next_event always calls
set_dec with an interval of at least 1 tick, we set min_delta_ns of
the decrementer_clockevent to correspond to 2 ticks (2 rather than 1
to compensate for truncations in the conversions between ticks and
ns).
This also removes a redundant call to set the decrementer to
0x7fffffff - it was already set to that earlier in timer_interrupt.
Signed-off-by: Paul Mackerras <paulus@samba.org>
We had an historical confusion in the kernel between cache line
and cache block size. The former is an implementation detail of
the L1 cache which can be useful for performance optimisations,
the later is the actual size on which the cache control
instructions operate, which can be different.
For some reason, we had a weird hack reading the right property
on powermac and the wrong one on any other 64 bits (32 bits is
unaffected as it only uses the cputable for cache block size
infos at this stage).
This fixes the booting-without-of.txt documentation to mention
the right properties, and fixes the 64 bits initialization code
to look for the block size first, with a fallback to the line
size if the property is missing.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The 44x family has an interesting "feature" which is a virtually
tagged instruction cache (yuck !). So far, we haven't dealt with
it properly, which means we've been mostly lucky or people didn't
report the problems, unless people have been running custom patches
in their distro...
This is an attempt at fixing it properly. I chose to do it by
setting a global flag whenever we change a PTE that was previously
marked executable, and flush the entire instruction cache upon
return to user space when that happens.
This is a bit heavy handed, but it's hard to do more fine grained
flushes as the icbi instruction, on those processor, for some very
strange reasons (since the cache is virtually mapped) still requires
a valid TLB entry for reading in the target address space, which
isn't something I want to deal with.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
On 4xx CPUs, the current implementation of flush_tlb_page() uses
a low level _tlbie() assembly function that only works for the
current PID. Thus, invalidations caused by, for example, a COW
fault triggered by get_user_pages() from a different context will
not work properly, causing among other things, gdb breakpoints
to fail.
This patch adds a "pid" argument to _tlbie() on 4xx processors,
and uses it to flush entries in the right context. FSL BookE
also gets the argument but it seems they don't need it (their
tlbivax form ignores the PID when invalidating according to the
document I have).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
PowerPC 440EP(x) 440GR(x) processors have the same PVR values, since
they have identical cores. However, FPU is not supported on GR(x) and
enabling APU instruction broadcast in the CCR0 register (to enable FPU)
may cause unpredictable results. There's no safe way to detect FPU
support at runtime. This patch provides a workarund for the issue.
We use a POWER6 "logical PVR approach". First, we identify all EP(x)
and GR(x) processors as GR(x) ones (which is safe). Then we check
the device tree cpu path. If we have a EP(x) processor entry,
we call identify_cpu again with PVR | 0x8. This bit is always 0
in the real PVR. This way we enable FPU only for 440EP(x).
Signed-off-by: Valentine Barshak <vbarshak@ru.mvista.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
* Convert files to UTF-8.
* Also correct some people's names
(one example is Eißfeldt, which was found in a source file.
Given that the author used an ß at all in a source file
indicates that the real name has in fact a 'ß' and not an 'ss',
which is commonly used as a substitute for 'ß' when limited to
7bit.)
* Correct town names (Goettingen -> Göttingen)
* Update Eberhard Mönkeberg's address (http://lkml.org/lkml/2007/1/8/313)
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
This patch adapts the ppc64 code to use the generic parse_crashkernel()
function introduced in the generic patch of that series.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
One of the easiest things to isolate is the pid printed in kernel log.
There was a patch, that made this for arch-independent code, this one makes
so for arch/xxx files.
It took some time to cross-compile it, but hopefully these are all the
printks in arch code.
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
is_init() is an ambiguous name for the pid==1 check. Split it into
is_global_init() and is_container_init().
A cgroup init has it's tsk->pid == 1.
A global init also has it's tsk->pid == 1 and it's active pid namespace
is the init_pid_ns. But rather than check the active pid namespace,
compare the task structure with 'init_pid_ns.child_reaper', which is
initialized during boot to the /sbin/init process and never changes.
Changelog:
2.6.22-rc4-mm2-pidns1:
- Use 'init_pid_ns.child_reaper' to determine if a given task is the
global init (/sbin/init) process. This would improve performance
and remove dependence on the task_pid().
2.6.21-mm2-pidns2:
- [Sukadev Bhattiprolu] Changed is_container_init() calls in {powerpc,
ppc,avr32}/traps.c for the _exception() call to is_global_init().
This way, we kill only the cgroup if the cgroup's init has a
bug rather than force a kernel panic.
[akpm@linux-foundation.org: fix comment]
[sukadev@us.ibm.com: Use is_global_init() in arch/m32r/mm/fault.c]
[bunk@stusta.de: kernel/pid.c: remove unused exports]
[sukadev@us.ibm.com: Fix capability.c to work with threaded init]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Acked-by: Pavel Emelianov <xemul@openvz.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Herbert Poetzel <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds POWERPC specific hooks for scaled time accounting.
POWER6 includes a SPURR register. The SPURR is based off the PURR register
but is scaled based on CPU frequency and issue rates. This gives a more
accurate account of the instructions used per task. The PURR and timebase
will be constant relative to the wall clock, irrespective of the CPU
frequency.
This implementation reads the SPURR register in account_system_vtime which
is only call called on context witch and hard and soft irq entry and exit.
The percentage of user and system time is then estimated using the ratio of
these accounted by the PURR. If the SPURR is not present, the PURR read.
An earlier implementation of this patch read the SPURR whenever the PURR
was read, which included the system call entry and exit path.
Unfortunately this showed a performance regression on lmbench runs, so was
re-implemented.
I've included the lmbench results here when run bare metal on POWER6. 1st
column is the unpatch results. 2nd column is the results using the below
patch and the 3rd is the % diff of these results from the base. 4th and
5th columns are the results and % differnce from the base using the older
patch (SPURR read in syscall entry/exit path).
Base Scaled-Acct SPURR-in-syscall
Result Result % diff Result % diff
Simple syscall: 0.3086 0.3086 0.0000 0.3452 11.8600
Simple read: 0.4591 0.4671 1.7425 0.5044 9.86713
Simple write: 0.4364 0.4366 0.0458 0.4731 8.40971
Simple stat: 2.0055 2.0295 1.1967 2.0669 3.06158
Simple fstat: 0.5962 0.5876 -1.442 0.6368 6.80979
Simple open/close: 3.1283 3.1009 -0.875 3.2088 2.57328
Select on 10 fd's: 0.8554 0.8457 -1.133 0.8667 1.32101
Select on 100 fd's: 3.5292 3.6329 2.9383 3.6664 3.88756
Select on 250 fd's: 7.9097 8.1881 3.5197 8.2242 3.97613
Select on 500 fd's: 15.2659 15.836 3.7357 15.873 3.97814
Select on 10 tcp fd's: 0.9576 0.9416 -1.670 0.9752 1.83792
Select on 100 tcp fd's: 7.248 7.2254 -0.311 7.2685 0.28283
Select on 250 tcp fd's: 17.7742 17.707 -0.375 17.749 -0.1406
Select on 500 tcp fd's: 35.4258 35.25 -0.496 35.286 -0.3929
Signal handler installation: 0.6131 0.6075 -0.913 0.647 5.52927
Signal handler overhead: 2.0919 2.1078 0.7600 2.1831 4.35967
Protection fault: 0.7345 0.7478 1.8107 0.8031 9.33968
Pipe latency: 33.006 16.398 -50.31 33.475 1.42368
AF_UNIX sock stream latency: 14.5093 30.910 113.03 30.715 111.692
Process fork+exit: 219.8 222.8 1.3648 229.37 4.35623
Process fork+execve: 876.14 873.28 -0.32 868.66 -0.8533
Process fork+/bin/sh -c: 2830 2876.5 1.6431 2958 4.52296
File /var/tmp/XXX write bw: 1193497 1195536 0.1708 118657 -0.5799
Pagefaults on /var/tmp/XXX: 3.1272 3.2117 2.7020 3.2521 3.99398
Also, kernel compile times show no difference with this patch applied.
[pbadari@us.ibm.com: Avoid unnecessary PURR reading]
Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (24 commits)
[POWERPC] Fix vmemmap warning in init_64.c
[POWERPC] Fix 64 bits vDSO DWARF info for CR register
[POWERPC] Add 1TB workaround for PA6T
[POWERPC] Enable NO_HZ and high res timers for pseries and ppc64 configs
[POWERPC] Quieten cache information at boot
[POWERPC] Quieten clockevent printk
[POWERPC] Enable SLUB in *_defconfig
[POWERPC] Fix 1TB segment detection
[POWERPC] Fix iSeries_hpte_insert prototype
[POWERPC] Fix copyright symbol
[POWERPC] ibmebus: Move to of_device and of_platform_driver, match eHCA and eHEA drivers
[POWERPC] ibmebus: Add device creation and bus probing based on of_device
[POWERPC] ibmebus: Remove bus match/probe/remove functions
[POWERPC] Move of_device allocation into of_device.[ch]
[POWERPC] mpc52xx: device tree changes for FEC and MDIO
[POWERPC] bestcomm: GenBD task support
[POWERPC] bestcomm: FEC task support
[POWERPC] bestcomm: ATA task support
[POWERPC] bestcomm: core bestcomm support for Freescale MPC5200
[POWERPC] mpc52xx: Update mpc52xx_psc structure with B revision changes
...
All asm/ipc.h files do only #include <asm-generic/ipc.h>.
This patch therefore removes all include/asm-*/ipc.h files and moves the
contents of include/asm-generic/ipc.h to include/linux/ipc.h.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This makes powerpc64's compat code use the new linux/elfcore-compat.h,
reducing some hand-copied duplication.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Slab constructors currently have a flags parameter that is never used. And
the order of the arguments is opposite to other slab functions. The object
pointer is placed before the kmem_cache pointer.
Convert
ctor(void *object, struct kmem_cache *s, unsigned long flags)
to
ctor(struct kmem_cache *s, void *object)
throughout the kernel
[akpm@linux-foundation.org: coupla fixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Update dump_task_altivec() (which has so far never been put to use) so that
it dumps the Altivec/VMX registers (VR[0] - VR[31], VSCR and VRSAVE) in the
same format as the ptrace get_vrregs(), and add the appropriate glue
typedef and #defines to make it work.
A new note type of NT_PPC_VMX was chosen to be 0x100 (arbitrarily) because
it allows the low range values to be used for more generic purposes and
0x100 seems an adequate starting point for PowerPC extensions.
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@suse.de>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current DWARF info for CR are incorrect, causing the gcc unwinder to
go to lunch if we take a segfault in the vdso. This fixes it.
Problem identified by Andrew Haley, and fix provided by Jakub Jelinek
(thanks !).
Unfortunately, a bug in gcc cause it to not quite work either, but that
is being fixed separately with something around the lines of:
linux-unwind.h:
fs->regs.reg[R_CR2].loc.offset = (long) ®s->ccr - new_cfa;
+ /* CR? regs are just 32-bit and PPC is big-endian. */
+ fs->regs.reg[R_CR2].loc.offset += sizeof (long) - 4;
(According to Jakub)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
PA6T has a bug where the slbie instruction does not honor the large
segment bit. As a result, we have to always use slbia when switching
context.
We don't have to worry about changing the slbie's during fault processing,
since they should never be replacing one VSID with another using the
same ESID. I.e. there's no risk for inserting duplicate entries due to a
failed slbie of the old entry. So as long as we clear it out on context
switch we should be fine.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
After 6 years the ppc64 kernel still thinks its important to tell me my
cache line size is 0x80 bytes. I think most people who care know that by
now. The rest probably cant even understand the hex output.
Since we might have misconfigured firmware or cpus that have a linesize
that isnt 128 bytes, I still print it out for those cases. If people
would prefer to remove it completely, lets do it.
Also for lpar remove the htab_address printout since its not used.
Anton
ppc64 boot log usability expert
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The clockevent bootup message only needs to be KERN_INFO.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Replace struct ibmebus_dev and struct ibmebus_driver with struct of_device
and struct of_platform_driver, respectively. Match the external ibmebus
interface and drivers using it.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The devtree root is now searched for devices matching a built-in whitelist
during boot, so these devices appear on the bus from the beginning. It is
still possible to manually add/remove devices to/from the bus by using the
probe/remove sysfs interface. Also, when a device driver registers itself,
the devtree is matched against its matchlist.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Remove old code that will be replaced by rewritten and shorter functions in
the next patch. Keep struct ibmebus_dev and struct ibmebus_driver for now,
but replace ibmebus_{,un}register_driver() by dummy functions. This way, the
kernel will still compile and run during the transition and git bisect will
be happy.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Extract generic of_device allocation code from of_platform_device_create()
and move it into of_device.[ch], called of_device_alloc(). Also, there's now
of_device_free() which puts the device node.
This way, bus drivers that build on of_platform (like ibmebus will) can
build upon this code instead of reinventing the wheel.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This cleans up the formatting in the vDSO linker script, mostly just the
use of whitespace. It's intended to approximate the kernel standard
conventions for indenting C, treating elements of the linker script about
like initialized variable definitions.
Signed-off-by: Roland McGrath <roland@redhat.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This cleans up the formatting in the vDSO linker script, mostly just the
use of whitespace. It's intended to approximate the kernel standard
conventions for indenting C, treating elements of the linker script about
like initialized variable definitions.
Signed-off-by: Roland McGrath <roland@redhat.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce architecture dependent kretprobe blacklists to prohibit users
from inserting return probes on the function in which kprobes can be
inserted but kretprobes can not.
This patch also removes "__kprobes" mark from "__switch_to" on x86_64 and
registers "__switch_to" to the blacklist on x86-64, because that mark is to
prohibit user from inserting only kretprobe.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Convert cpu_sibling_map from a static array sized by NR_CPUS to a per_cpu
variable. This saves sizeof(cpumask_t) * NR unused cpus. Access is mostly
from startup and CPU HOTPLUG functions.
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Identical handlers of PTRACE_DETACH go into ptrace_request().
Not touching compat code.
Not touching archs that don't call ptrace_request.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This updates the ppc iommu/pci dma mappers to sg chaining. Includes
further fixes from FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This changes the uevent buffer functions to use a struct instead of a
long list of parameters. It does no longer require the caller to do the
proper buffer termination and size accounting, which is currently wrong
in some places. It fixes a known bug where parts of the uevent
environment are overwritten because of wrong index calculations.
Many thanks to Mathieu Desnoyers for finding bugs and improving the
error handling.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (408 commits)
[POWERPC] Add memchr() to the bootwrapper
[POWERPC] Implement logging of unhandled signals
[POWERPC] Add legacy serial support for OPB with flattened device tree
[POWERPC] Use 1TB segments
[POWERPC] XilinxFB: Allow fixed framebuffer base address
[POWERPC] XilinxFB: Add support for custom screen resolution
[POWERPC] XilinxFB: Use pdata to pass around framebuffer parameters
[POWERPC] PCI: Add 64-bit physical address support to setup_indirect_pci
[POWERPC] 4xx: Kilauea defconfig file
[POWERPC] 4xx: Kilauea DTS
[POWERPC] 4xx: Add AMCC Kilauea eval board support to platforms/40x
[POWERPC] 4xx: Add AMCC 405EX support to cputable.c
[POWERPC] Adjust TASK_SIZE on ppc32 systems to 3GB that are capable
[POWERPC] Use PAGE_OFFSET to tell if an address is user/kernel in SW TLB handlers
[POWERPC] 85xx: Enable FP emulation in MPC8560 ADS defconfig
[POWERPC] 85xx: Killed <asm/mpc85xx.h>
[POWERPC] 85xx: Add cpm nodes for 8541/8555 CDS
[POWERPC] 85xx: Convert mpc8560ads to the new CPM binding.
[POWERPC] mpc8272ads: Remove muram from the CPM reg property.
[POWERPC] Make clockevents work on PPC601 processors
...
Fixed up conflict in Documentation/powerpc/booting-without-of.txt manually.
Implement show_unhandled_signals sysctl + support to print when a process
is killed due to unhandled signals just as i386 and x86_64 does.
Default to having it off, unlike x86 that defaults on.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently find_legacy_serial_ports() can find no serial ports on the
OPB with flattened device tree. Thus no legacy boot console can be
initialized. Just the early udbg console works, which is initialized
with udbg_init_44x_as1 on the UART's physical address specified in
kernel config. This happens because we look for ns16750 serial
devices only and expect opb node to have a device type property. This
patch makes it look for ns16550-compatible devices and use
of_device_is_compatible() for opb in case device type is not
specified.
Signed-off-by: Valentine Barshak <vbarshak@ru.mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This makes the kernel use 1TB segments for all kernel mappings and for
user addresses of 1TB and above, on machines which support them
(currently POWER5+, POWER6 and PA6T).
We detect that the machine supports 1TB segments by looking at the
ibm,processor-segment-sizes property in the device tree.
We don't currently use 1TB segments for user addresses < 1T, since
that would effectively prevent 32-bit processes from using huge pages
unless we also had a way to revert to using 256MB segments. That
would be possible but would involve extra complications (such as
keeping track of which segment size was used when HPTEs were inserted)
and is not addressed here.
Parts of this patch were originally written by Ben Herrenschmidt.
Signed-off-by: Paul Mackerras <paulus@samba.org>