print_dump_header() will be called at least once with a NULL pointer in
a normal boot sequence. If DEBUG is defined then we will dereference
the pointer and crash. Add a quick fix to exit early in the NULL pointer
case.
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Acked-by: Manish Ahuja <mahujam@gmail.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch fixes a long-standing performance bug in classic RCU that
results in massive internal-to-RCU lock contention on systems with
more than a few hundred CPUs. Although this patch creates a separate
flavor of RCU for ease of review and patch maintenance, it is intended
to replace classic RCU.
This patch still handles stress better than does mainline, so I am still
calling it ready for inclusion. This patch is against the -tip tree.
Nevertheless, experience on an actual 1000+ CPU machine would still be
most welcome.
Most of the changes noted below were found while creating an rcutiny
(which should permit ejecting the current rcuclassic) and while doing
detailed line-by-line documentation.
Updates from v9 (http://lkml.org/lkml/2008/12/2/334):
o Fixes from remainder of line-by-line code walkthrough,
including comment spelling, initialization, undesirable
narrowing due to type conversion, removing redundant memory
barriers, removing redundant local-variable initialization,
and removing redundant local variables.
I do not believe that any of these fixes address the CPU-hotplug
issues that Andi Kleen was seeing, but please do give it a whirl
in case the machine is smarter than I am.
A writeup from the walkthrough may be found at the following
URL, in case you are suffering from terminal insomnia or
masochism:
http://www.kernel.org/pub/linux/kernel/people/paulmck/tmp/rcutree-walkthrough.2008.12.16a.pdf
o Made rcutree tracing use seq_file, as suggested some time
ago by Lai Jiangshan.
o Added a .csv variant of the rcudata debugfs trace file, to allow
people having thousands of CPUs to drop the data into
a spreadsheet. Tested with oocalc and gnumeric. Updated
documentation to suit.
Updates from v8 (http://lkml.org/lkml/2008/11/15/139):
o Fix a theoretical race between grace-period initialization and
force_quiescent_state() that could occur if more than three
jiffies were required to carry out the grace-period
initialization. Which it might, if you had enough CPUs.
o Apply Ingo's printk-standardization patch.
o Substitute local variables for repeated accesses to global
variables.
o Fix comment misspellings and redundant (but harmless) increments
of ->n_rcu_pending (this latter after having explicitly added it).
o Apply checkpatch fixes.
Updates from v7 (http://lkml.org/lkml/2008/10/10/291):
o Fixed a number of problems noted by Gautham Shenoy, including
the cpu-stall-detection bug that he was having difficulty
convincing me was real. ;-)
o Changed cpu-stall detection to wait for ten seconds rather than
three in order to reduce false positive, as suggested by Ingo
Molnar.
o Produced a design document (http://lwn.net/Articles/305782/).
The act of writing this document uncovered a number of both
theoretical and "here and now" bugs as noted below.
o Fix dynticks_nesting accounting confusion, simplify WARN_ON()
condition, fix kerneldoc comments, and add memory barriers
in dynticks interface functions.
o Add more data to tracing.
o Remove unused "rcu_barrier" field from rcu_data structure.
o Count calls to rcu_pending() from scheduling-clock interrupt
to use as a surrogate timebase should jiffies stop counting.
o Fix a theoretical race between force_quiescent_state() and
grace-period initialization. Yes, initialization does have to
go on for some jiffies for this race to occur, but given enough
CPUs...
Updates from v6 (http://lkml.org/lkml/2008/9/23/448):
o Fix a number of checkpatch.pl complaints.
o Apply review comments from Ingo Molnar and Lai Jiangshan
on the stall-detection code.
o Fix several bugs in !CONFIG_SMP builds.
o Fix a misspelled config-parameter name so that RCU now announces
at boot time if stall detection is configured.
o Run tests on numerous combinations of configurations parameters,
which after the fixes above, now build and run correctly.
Updates from v5 (http://lkml.org/lkml/2008/9/15/92, bad subject line):
o Fix a compiler error in the !CONFIG_FANOUT_EXACT case (blew a
changeset some time ago, and finally got around to retesting
this option).
o Fix some tracing bugs in rcupreempt that caused incorrect
totals to be printed.
o I now test with a more brutal random-selection online/offline
script (attached). Probably more brutal than it needs to be
on the people reading it as well, but so it goes.
o A number of optimizations and usability improvements:
o Make rcu_pending() ignore the grace-period timeout when
there is no grace period in progress.
o Make force_quiescent_state() avoid going for a global
lock in the case where there is no grace period in
progress.
o Rearrange struct fields to improve struct layout.
o Make call_rcu() initiate a grace period if RCU was
idle, rather than waiting for the next scheduling
clock interrupt.
o Invoke rcu_irq_enter() and rcu_irq_exit() only when
idle, as suggested by Andi Kleen. I still don't
completely trust this change, and might back it out.
o Make CONFIG_RCU_TRACE be the single config variable
manipulated for all forms of RCU, instead of the prior
confusion.
o Document tracing files and formats for both rcupreempt
and rcutree.
Updates from v4 for those missing v5 given its bad subject line:
o Separated dynticks interface so that NMIs and irqs call separate
functions, greatly simplifying it. In particular, this code
no longer requires a proof of correctness. ;-)
o Separated dynticks state out into its own per-CPU structure,
avoiding the duplicated accounting.
o The case where a dynticks-idle CPU runs an irq handler that
invokes call_rcu() is now correctly handled, forcing that CPU
out of dynticks-idle mode.
o Review comments have been applied (thank you all!!!).
For but one example, fixed the dynticks-ordering issue that
Manfred pointed out, saving me much debugging. ;-)
o Adjusted rcuclassic and rcupreempt to handle dynticks changes.
Attached is an updated patch to Classic RCU that applies a hierarchy,
greatly reducing the contention on the top-level lock for large machines.
This passes 10-hour concurrent rcutorture and online-offline testing on
128-CPU ppc64 without dynticks enabled, and exposes some timekeeping
bugs in presence of dynticks (exciting working on a system where
"sleep 1" hangs until interrupted...), which were fixed in the
2.6.27 kernel. It is getting more reliable than mainline by some
measures, so the next version will be against -tip for inclusion.
See also Manfred Spraul's recent patches (or his earlier work from
2004 at http://marc.info/?l=linux-kernel&m=108546384711797&w=2).
We will converge onto a common patch in the fullness of time, but are
currently exploring different regions of the design space. That said,
I have already gratefully stolen quite a few of Manfred's ideas.
This patch provides CONFIG_RCU_FANOUT, which controls the bushiness
of the RCU hierarchy. Defaults to 32 on 32-bit machines and 64 on
64-bit machines. If CONFIG_NR_CPUS is less than CONFIG_RCU_FANOUT,
there is no hierarchy. By default, the RCU initialization code will
adjust CONFIG_RCU_FANOUT to balance the hierarchy, so strongly NUMA
architectures may choose to set CONFIG_RCU_FANOUT_EXACT to disable
this balancing, allowing the hierarchy to be exactly aligned to the
underlying hardware. Up to two levels of hierarchy are permitted
(in addition to the root node), allowing up to 16,384 CPUs on 32-bit
systems and up to 262,144 CPUs on 64-bit systems. I just know that I
am going to regret saying this, but this seems more than sufficient
for the foreseeable future. (Some architectures might wish to set
CONFIG_RCU_FANOUT=4, which would limit such architectures to 64 CPUs.
If this becomes a real problem, additional levels can be added, but I
doubt that it will make a significant difference on real hardware.)
In the common case, a given CPU will manipulate its private rcu_data
structure and the rcu_node structure that it shares with its immediate
neighbors. This can reduce both lock and memory contention by multiple
orders of magnitude, which should eliminate the need for the strange
manipulations that are reported to be required when running Linux on
very large systems.
Some shortcomings:
o More bugs will probably surface as a result of an ongoing
line-by-line code inspection.
Patches will be provided as required.
o There are probably hangs, rcutorture failures, &c. Seems
quite stable on a 128-CPU machine, but that is kind of small
compared to 4096 CPUs. However, seems to do better than
mainline.
Patches will be provided as required.
o The memory footprint of this version is several KB larger
than rcuclassic.
A separate UP-only rcutiny patch will be provided, which will
reduce the memory footprint significantly, even compared
to the old rcuclassic. One such patch passes light testing,
and has a memory footprint smaller even than rcuclassic.
Initial reaction from various embedded guys was "it is not
worth it", so am putting it aside.
Credits:
o Manfred Spraul for ideas, review comments, and bugs spotted,
as well as some good friendly competition. ;-)
o Josh Triplett, Ingo Molnar, Peter Zijlstra, Mathieu Desnoyers,
Lai Jiangshan, Andi Kleen, Andy Whitcroft, and Andrew Morton
for reviews and comments.
o Thomas Gleixner for much-needed help with some timer issues
(see patches below).
o Jon M. Tollefson, Tim Pepper, Andrew Theurer, Jose R. Santos,
Andy Whitcroft, Darrick Wong, Nishanth Aravamudan, Anton
Blanchard, Dave Kleikamp, and Nathan Lynch for keeping machines
alive despite my heavy abuse^Wtesting.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently there are a number of platforms that open code access to
the ppc_pci_flags global variable. However, that variable is not
present if CONFIG_PCI is not set, which can lead to a build break.
This introduces a number of accessor functions that are defined
to be empty in the case of CONFIG_PCI being disabled. The
various platform files in the kernel are updated to use these.
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Since "Factor out cpu joining/unjoining the GIQ"
(b4963255ad) the WARN_ON in
xics_set_cpu_giq() is being triggered during boot on JS20 because the
GIQ indicator is not available on that platform. While the warning is
harmless and the system runs normally, it's nicer to check for the
existence of the indicator before trying to manipulate it.
Implement rtas_indicator_present(), which searches the
/rtas/rtas-indicators property for the given indicator token, and use
this function in xics_set_cpu_giq().
Also use a WARN statement in xics_set_cpu_giq to get better
information on failure.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Acked-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The hard_smp_processor_id functions are the appropriate interfaces for
managing physical CPU ids.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
commit 059e4938f8 ("powerpc/ps3: Add a sub-match
id to ps3_system_bus") forgot to update the module alias support:
- Add the sub-match ids to the module aliases, so udev can distinguish
between different types of sub-devices.
- Rename PS3_MODULE_ALIAS_GRAPHICS to PS3_MODULE_ALIAS_GPU_FB, as ps3fb
binds to the "FB" sub-device.
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Change the debug message in dma_sb_region_create() from
pr_info() to DBG() to quiet the dmesg output.
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
of_node_put is needed before discarding a value received from
of_find_node_by_name, eg in error handling code or when the device
node is no longer used.
The semantic match that catches the bug is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@r exists@
local idexpression struct device_node *n;
position p1, p2;
statement S1,S2;
expression E,E1;
expression *ptr != NULL;
@@
(
if (!(n@p1 = of_find_node_by_name(...))) S1
|
n@p1 = of_find_node_by_name(...)
)
<... when != of_node_put(n)
when != if (...) { <+... of_node_put(n) ...+> }
when != true !n || ...
when != n = E
when != E = n
if (!n || ...) S2
...>
(
return \(0\|<+...n...+>\|ptr\);
|
return@p2 ...;
|
n = E1
|
E1 = n
)
@script:python@
p1 << r.p1;
p2 << r.p2;
@@
print "* file: %s of_find_node_by_name %s return %s" % (p1[0].file,p1[0].line,p2[0].line)
// </smpl>
Signed-off-by: Nicolas Palix <npalix@diku.dk>
Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Commit d015fe995 'powerpc/cell/axon-msi: Retry on missing interrupt'
has turned a rare failure to kexec on QS22 into a reproducible
error, which we have now analysed.
The problem is that after a kexec, the MSIC hardware still points
into the middle of the old ring buffer. We set up the ring buffer
during reboot, but not the offset into it. On older kernels, this
would cause a storm of thousands of spurious interrupts after a
kexec, which would most of the time get dropped silently.
With the new code, we time out on each interrupt, waiting for
it to become valid. If more interrupts come in that we time
out on, this goes on indefinitely, which eventually leads to
a hard crash.
The solution in this commit is to read the current offset from
the MSIC when reinitializing it. This now works correctly, as
expected.
Reported-by: Dirk Herrendoerfer <d.herrendoerfer@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Impact: change existing irq_chip API
Not much point with gentle transition here: the struct irq_chip's
setaffinity method signature needs to change.
Fortunately, not widely used code, but hits a few architectures.
Note: In irq_select_affinity() I save a temporary in by mangling
irq_desc[irq].affinity directly. Ingo, does this break anything?
(Folded in fix from KOSAKI Motohiro)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Grant Grundler <grundler@parisc-linux.org>
Acked-by: Ingo Molnar <mingo@redhat.com>
Cc: ralf@linux-mips.org
Cc: grundler@parisc-linux.org
Cc: jeremy@xensource.com
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Impact: change calling convention of existing cpumask APIs
Most cpumask functions started with cpus_: these have been replaced by
cpumask_ ones which take struct cpumask pointers as expected.
These four functions don't have good replacement names; fortunately
they're rarely used, so we just change them over.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: paulus@samba.org
Cc: mingo@redhat.com
Cc: tony.luck@intel.com
Cc: ralf@linux-mips.org
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: cl@linux-foundation.org
Cc: srostedt@redhat.com
Conflicts:
fs/nfsd/nfs4recover.c
Manually fixed above to use new creds API functions, e.g.
nfs4_save_creds().
Signed-off-by: James Morris <jmorris@namei.org>
The flag MPIC_WANTS_RESET shouldn't be set if we are doing cooperative
asymmetric MP. The second linux shouldn't reset the pic or the first
one gets very confused.
Signed-off-by: Haiying Wang <Haiying.Wang@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Basic support for the GPIO available on the SBC610 VPX Single Board Computer
from GE Fanuc (PowerPC MPC8641D).
This patch adds basic support for the GPIO in the devices I/O FPGA, the GPIO
functionality is exposed through the AFIX pins on the backplane, unless used
by an AFIX card.
This code currently does not support switching between totem-pole and
open-drain outputs (when used as outputs, GPIOs default to totem-pole).
The interrupt capabilites of the GPIO lines is also not currently supported.
Signed-off-by: Martyn Welch <martyn.welch@gefanuc.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
arch/powerpc/platforms/85xx/mpc85xx_mds.c: In function 'board_fixups':
arch/powerpc/platforms/85xx/mpc85xx_mds.c:244: warning: format '%x' expects type 'unsigned int', but argument 4 has type 'resource_size_t'
arch/powerpc/platforms/85xx/mpc85xx_mds.c:250: warning: format '%x' expects type 'unsigned int', but argument 4 has type 'resource_size_t'
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Added 85xx specifc smp_ops structure. We use ePAPR style boot release
and the MPIC for IPIs at this point.
Additionally added routines for secondary cpu entry and initializtion.
Signed-off-by: Andy Fleming <afleming@freescale.com>
Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
An earlier patch from Jens Osterkamp attempted to fix GDB
watchpoints by enabling the DABRX register at boot time.
Unfortunately, this did not work on SMP setups, where
secondary CPUs were still using the power-on DABRX value.
This introduces the same change for secondary CPUs on cell
as well.
Reported-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
Tested-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The MSI capture logic for the axon bridge can sometimes
lose interrupts in case of high DMA and interrupt load,
when it signals an MSI interrupt to the MPIC interrupt
controller while we are already handling another MSI.
Each MSI vector gets written into a FIFO buffer in main
memory using DMA, and that DMA access is normally flushed
by the actual interrupt packet on the IOIF. An MMIO
register in the MSIC holds the position of the last
entry in the FIFO buffer that was written. However,
reading that position does not flush the DMA, so that
we can observe stale data in the buffer.
In a stress test, we have observed the DMA to arrive
up to 14 microseconds after reading the register.
This patch works around this problem by retrying the
access to the FIFO buffer.
We can reliably detect the conditioning by writing
an invalid MSI vector into the FIFO buffer after
reading from it, assuming that all MSIs we get
are valid. After detecting an invalid MSI vector,
we udelay(1) in the interrupt cascade for up to
100 times before giving up.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently, we can end up in an infinite loop if we get a signal
while the kernel has faulted in spufs_ps_fault. Eg:
alarm(1);
write(fd, some_spu_psmap_register_address, 4);
- the write's copy_from_user will fault on the ps mapping, and
signal_pending will be non-zero. Because returning from the fault
handler will never clear TIF_SIGPENDING, so we'll just keep faulting,
resulting in an unkillable process using 100% of CPU.
This change returns VM_FAULT_SIGBUS if there's a fatal signal pending,
letting us escape the loop.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Introduce ps3_gpu_mutex to synchronizes GPU-related operations, like:
- invoking the L1GPU_CONTEXT_ATTRIBUTE_FB_BLIT command using the
lv1_gpu_context_attribute() hypervisor call,
- handling the PS3AV_CID_AVB_PARAM packet in the PS3 A/V Settings driver.
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Conflicts:
security/keys/internal.h
security/keys/process_keys.c
security/keys/request_key.c
Fixed conflicts above by using the non 'tsk' versions.
Signed-off-by: James Morris <jmorris@namei.org>
Pass credentials through dentry_open() so that the COW creds patch can have
SELinux's flush_unauthorized_files() pass the appropriate creds back to itself
when it opens its null chardev.
The security_dentry_open() call also now takes a creds pointer, as does the
dentry_open hook in struct security_operations.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: James Morris <jmorris@namei.org>
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.
Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().
Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@ozlabs.org
Signed-off-by: James Morris <jmorris@namei.org>
Free dynamically allocated device data structures when device registration
fails. This fixes memory leakage when the registration fails.
Signed-off-by: Masakazu Mokuno <mokuno@sm.sony.co.jp>
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The pseries PCI hotplug code has a number of issues, ranging from
incorrect resource setup to crashes, depending on what is added,
when, whether it contains a bridge, etc etc....
This fixes a whole bunch of these, while actually simplifying the code
a bit, using more generic code in the process and factoring out common
code between adding of a PHB, a slot or a device.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
To properly fix PCI hotplug, it's useful to be able to make the fixup
passes on all devices whether they were just hot plugged or already
there.
The EEH code however used to not be very friendly with calling
eeh_add_device_late() multiple time, and not very rebust in the way it
generally tests whether a device is in the expected state vs. the EEH
code.
This improves it, along with cleaning up a couple of debug printk's.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The 'ibm,interrupt-server#-size' properties are not in the cpu nodes,
which is where we currently look for them, but rather live under the
interrupt source controller nodes (which have "ibm,ppc-xics" in their
compatible property).
This moves the code that looks for the ibm,interrupt-server#-size
properties from xics_update_irq_servers() into xics_init_IRQ().
Also this adds a check for mismatched sizes across the interrupt
source controller nodes. Not sure this is necessary as in this case
the firmware might be seriously busted.
This property only appears on POWER6 boxes and is only used in the
set-indicator(gqirm) call, and apparently firmware currently ignores
the value we pass. Nevertheless we need to fix it in case future
firmware versions use it.
Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This gets rid of this build warning:
arch/powerpc/platforms/pseries/pci_dlpar.c: In function 'init_phb_dynamic':
arch/powerpc/platforms/pseries/pci_dlpar.c:192: warning: unused variable 'b'
This is one of the very few warnings left in a ppc64_defconfig build and
getting rid of it will make it easier to see future introduced ones (in
fact this was introduced very recently).
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This fixes this error on Cell when CONFIG_KEXEC = n:
arch/powerpc/platforms/cell/ras.c:299: error: implicit declaration of function 'crash_shutdown_register'
We have to include <asm/kexec.h> because it contains the dummy
definition of crash_shutdown_register that is used when
CONFIG_KEXEC=n, but <linux/kexec.h> doesn't include <asm/kexec.h> in
that case.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (23 commits)
Revert "powerpc: Sync RPA note in zImage with kernel's RPA note"
powerpc: Fix compile errors with CONFIG_BUG=n
powerpc: Fix format string warning in arch/powerpc/boot/main.c
powerpc: Fix bug in kernel copy of libfdt's fdt_subnode_offset_namelen()
powerpc: Remove duplicate DMA entry from mpc8313erdb device tree
powerpc/cell/OProfile: Fix on-stack array size in activate spu profiling function
powerpc/mpic: Fix regression caused by change of default IRQ affinity
powerpc: Update remaining dma_mapping_ops to use map/unmap_page
powerpc/pci: Fix unmapping of IO space on 64-bit
powerpc/pci: Properly allocate bus resources for hotplug PHBs
OF-device: Don't overwrite numa_node in device registration
powerpc: Fix swapcontext system for VSX + old ucontext size
powerpc: Fix compiler warning for the relocatable kernel
powerpc: Work around ld bug in older binutils
powerpc/ppc64/kdump: Better flag for running relocatable
powerpc: Use is_kdump_kernel()
powerpc: Kexec exit should not use magic numbers
powerpc/44x: Update 44x defconfigs
powerpc/40x: Update 40x defconfigs
powerpc: enable heap randomization for linkstations
...
The Freescale implementation of MPIC only allows a single CPU destination
for non-IPI interrupts. We add a flag to the mpic_init to distinquish
these variants of MPIC. We pull in the irq_choose_cpu from sparc64 to
select a single CPU as the destination of the interrupt.
This is to deal with the fact that the default smp affinity was
changed by commit 1840475676 ("genirq:
Expose default irq affinity mask (take 3)") to be all CPUs.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
After the merge of the 32 and 64bit DMA code, dma_direct_ops lost
their map/unmap_single() functions but gained map/unmap_page(). This
caused a problem for Cell because Cell's dma_iommu_fixed_ops called
the dma_direct_ops if the fixed linear mapping was to be used or the
iommu ops if the dynamic window was to be used. So in order to fix
this problem we need to update the 64bit DMA code to use
map/unmap_page.
First, we update the generic IOMMU code so that iommu_map_single()
becomes iommu_map_page() and iommu_unmap_single() becomes
iommu_unmap_page(). Then we propagate these changes up through all
the callers of these two functions and in the process update all the
dma_mapping_ops so that they have map/unmap_page rahter than
map/unmap_single. We can do this because on 64bit there is no HIGHMEM
memory so map/unmap_page ends up performing exactly the same function
as map/unmap_single, just taking different arguments.
This has no affect on drivers because the dma_map_single_attrs() just
ends up calling the map_page() function of the appropriate
dma_mapping_ops and similarly the dma_unmap_single_attrs() calls
unmap_page().
This fixes an oops on Cell blades, which oops on boot without this
because they call dma_direct_ops.map_single, which is NULL.
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Resources for PHB's that are dynamically added to a system are not
properly allocated in the resource tree.
Not having these resources allocated causes an oops when removing
the PHB when we try to release them.
The diff appears a bit messy, this is mainly due to moving everything
one tab to the left in the pcibios_allocate_bus_resources routine.
The functionality change in this routine is only that the
list_for_each_entry() loop is pulled out and moved to the necessary
calling routine.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
linux/crash_dump.h defines is_kdump_kernel() to be used by code that
needs to know if the previous kernel crashed instead of a (clean) boot
or reboot.
This updates the just added powerpc code to use it. This is needed
for the next commit, which will remove __kdump_flag.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The i2c bus defn is broken on linkstation / kurobox machines since at
least 2.6.27. Fix it. Also remove CONFIG_SERIAL_OF_PLATFORM, which, if
enabled, breaks the serial console after the
"console handover: boot [udbg0] -> real [ttyS1]" message.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Fix the HCU4 Kconfig option to 'default n'. We don't want the
board to always be enabled for other board defconfigs.
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
This adds relocatable kernel support for kdump. With this one can
use the same regular kernel to capture the kdump. A signature (0xfeed1234)
is passed in r6 from panic code to the next kernel through kexec_sequence
and purgatory code. The signature is used to differentiate between
kdump kernel and non-kdump kernels.
The purgatory code compares the signature and sets the __kdump_flag in
head_64.S. During the boot up, kernel code checks __kdump_flag and if it
is set, the kernel will behave as relocatable kdump kernel. This kernel
will boot at the address where it was loaded by kexec-tools ie. at the
address reserved through crashkernel boot parameter.
CONFIG_CRASH_DUMP depends on CONFIG_RELOCATABLE option to build kdump
kernel as relocatable. So the same kernel can be used as production and
kdump kernel.
This patch incorporates the changes suggested by Paul Mackerras to avoid
GOT use and to avoid two copies of the code.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Mohan Kumar M <mohan@in.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Most of the platforms were printing the size of the memory
in their show_cpuinfo implementations. This moves that to
the common show_cpuinfo, so that all 32-bit platforms will
now print the size of memory. I also update the code
to deal with the fact that total_memory is now a phys_addr_t.
Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We used to assume that even numbered threads were the primary
threads, ie those that would be listed and started as a cpu from
open firmware. Replace a left over is even (% 2) check with a check
for it being a primary thread and update the comments.
Tested with a debug print on pseries, identical code found for cell.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The pfn of the memory to be removed should be validated prior to
attempting to remove the memory. In cases where the probe of a
memory section fails during hotplug add, the pfn for the lmb may
not be valid.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch adds a comment to clarify why atomic_dec_if_positive is being used
to decrement gang's aff_sched_count on SPU context unbind.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
This patch improves redability of the code responsible for trying to find
a node with enough SPUs not committed to other affinity gangs.
An additional check is also added, to avoid taking into account gangs that
have no SPU affinity.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
With most file readers (eg cat, dd), reading a context's regs file will
result in two reads: the first to read the data, and the second to
return EOF. Because each read performs a spu_acquire_saved, we end up
descheduling and re-scheduling the context twice.
This change does a simple check to see if we'd return EOF before
calling spu_acquire_saved(), saving the extra schedule operation.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, read() on the sputrace log will block until the read buffer
is full. This makes it difficult to retrieve the end of the buffer, as
the user will need to read with the right-sized buffer.
In a similar method as 91553a1b5e0df006a3573a88d98ee7cd48a3818a, this
change makes the switch_log return if there has already been data
read.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, we use ctx->mapping_lock and ctx->switch_log->lock for the
context switch log. The mapping lock only prevents concurrent open()s,
so we require the switch_lock->lock for reads.
Since writes to the switch log buffer occur on context switches, we're
better off synchronising with the state_mutex, which is held during a
switch. Since we're serialised througout the buffer reads and writes,
we can use the state mutex to protect open and release too, and
can now kfree() the log buffer on release. This allows us to perform
the switch log notify without taking any extra locks.
Because the buffer is only present while the file is open, we can use
it to prevent multiple simultaneous openers.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, read() on the sputrace buffer will only return data when
the user buffer is exhausted. This may mean that we never see the
end of the event log, unless we read() with exactly the right-sized
buffer.
This change makes sputrace_read not block if we have data ready to
return.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, sputrace will start logging to the event buffer before the
log buffer has been open()ed. This results in a heap of "lost samples"
warnings if the sputrace file hasn't yet been opened.
Since the buffer is reset on open() anyway, there's no need to enable
logging when no-one has opened the log.
Because open clears the log, make it return EBUSY for mutliple open
calls.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Due to confusion between the ftrace infrastructure and the gcc profiling
tracer "ftrace", this patch renames the config options from FTRACE to
FUNCTION_TRACER. The other two names that are offspring from FTRACE
DYNAMIC_FTRACE and FTRACE_MCOUNT_RECORD will stay the same.
This patch was generated mostly by script, and partially by hand.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds support for the GPIO functions of PPC40x and PPC44x
SOCs.
Signed-off-by: Steve Falco <sfalco@harris.com>
Acked-by: Stefan Roese <sr@denx.de>
Acked-by: Sean MacLennan <smaclennan@pikatech.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Adds support for a HCU4 PPC405GPr based board from Netstal Maschinen AG.
Signed-off-by: Niklaus Giger <niklaus.giger@member.fsf.org>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
This adds a common board file for almost all of the "simple" PowerPC 40x
boards that exist today. This is intended to be a single place to add
support for boards that do not differ in platform support from most of the
evaluation boards that are used as reference platforms. Boards that have
specific requirements or custom hardware setup should still have their own
board.c file.
The first board ported to this is the AMCC PowerPC 405EZ Acadia board.
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
MPC5200 needs to have pipelining disabled for ATA to work. MPC5200B does not.
So, for the latter, don't touch the original setting from the bootloader.
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Recently, indirect_pci was changed to test if the bus number requested
is the one hanging straight off the PHB, then it substitutes the bus
number with another one contained in a new "self_busno" field of the
pci_controller structure.
However, this breaks CHRP which didn't initialize this new field, and
which relies on having the right bus number passed to the hardware.
This fixes it by initializing this variable properly for all CHRP bridges
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The detection of the IBM "Python" PCI host bridge on IBM CHRP
machines such as old RS6000 was broken when we changed
of_device_is_compatible() from strncasecmp to strcasecmp (dropped
the "n" variant) due to the way IBM encodes the chip version.
We fix that by instead doing a match on the model property like
we do for others bridges in that file. It should be good enough
for those machines. If yours is still broken, let me know.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We need a marker_synchronize_unregister() before the end of exit() to make sure
every probe callers have exited the non preemptible section and thus are not
executing the probe code anymore.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is a much better version of a previous patch to make the parser
tables constant. Rather than changing the typedef, we put the "const" in
all the various places where its required, allowing the __initconst
exception for nfsroot which was the cause of the previous trouble.
This was posted for review some time ago and I believe its been in -mm
since then.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Alexander Viro <aviro@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Support for the SBC610 VPX Single Board Computer from GE Fanuc (PowerPC
MPC8641D).
This patch adds support for the registers held in the devices main FPGA,
exposing extra information about the revision of the board through cpuinfo.
Signed-off-by: Martyn Welch <martyn.welch@gefanuc.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Early versions of the Freescale DIU framebuffer driver depended on a bootmem
allocation of memory for the video buffer. The need for this feature was
removed in commit 6b51d51a, so now we can remove the platform-specific code
that allocated that memory.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Uses mpc83xx_add_bridge in fsl_pci.c
Adds second register tuple to pci node register property
as done for 83xx device trees in a previous patch.
Signed-off-by: John Rigby <jrigby@freescale.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
In the standalone setup the board's CPLD disables the PCI internal
arbiter, thus any access to the PCI bus will hang the board.
The common way to disable particular devices in the device tree is to
put the "status" property with any value other than "ok" or "okay"
into the device node we want to disable.
So, when there is no PCI arbiter on the bus the u-boot adds status =
"broken (no arbiter)" property into the PCI controller's node, and so
marks the PCI controller as unavailable.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Specifying user-selectable option in the qe_lib/Kconfig was a bad idea
because the qe_lib/Kconfig is included into the top level Kconfig, and
thus the QE_GPIO option appears at the top level menu.
This patch effectively moves the QE_GPIO option under the platform menu
instead.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Modify the Kconfig so that Freescale QUICC Engine (QE) support is a selectable
option, thereby allowing users to compile kernels without any QE support.
The drawback is that QE support is now disabled by default on platforms that
have a QE, and so a defconfig is needed to enable QE and QE devices (like
UCC GETH). Fortunately, all the current relevant defconfigs do that already.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Support for the SBC610 VPX Single Board Computer from GE Fanuc (PowerPC MPC8641D).
A number of MPC8641D based route interrupts for on-board interrupts through
a FPGA based interrupt controller, which is chained with the
MPC8641D's mpic. This patch provides a basic driver to allow basic routing
of interrupts to the mpic.
Signed-off-by: Martyn Welch <martyn.welch@gefanuc.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
A single full sync (mb()) is requrired to order the mmio to the qirr reg
with the set or clear of the message word. However, test_and_clear_bit
has the effect of smp_mb() and we are not doing any other io from here,
so we don't need a mb per bit processed.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Several printks were broken at word boundaries for line length. Some
even referred to old function names. Using __func__ and changing the
text slightly for the format allows these printk formats to fit on one
line.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
It is physically per-cpu, and we want the irq layer to treat it that way.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
EOI normally has the side effect of returning the cpu to the base
priority to recieve the next interrupt. This is actually controlled
by the top byte of the xirr register. When we are exiting the
kernel in kexec we must eoi the ipi for the next kernel because we
never return from the handler, but we want to leave interrupt
delivery blocked until the next kernel takes action.
Since the hardware ipi vector is fixed, its easiest to just do the
eoi explicitly.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This factors out processors joining and unjoining the Global Interrupt
Queue into a separate function.
There is a bit of math to calculate the arguments to rtas to join
or leave the global interrupt queue, and a warning on failure
afterwards. Make a helper for the 3 callers.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We only need to check the ibm,interrupt-server#-size property once, not
once per global server and thread.
We can use !CONFIG_SMP cpu masks and hard_smp_processor_id() to avoid an ifdef.
Put the node when breaking out of the loop on lpar systems.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Trim unneeded includes from xics.c. We don't use signals or gfp
flags, we use only OF functions and don't need prom, and the 8259
is now handled by our caller.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The xirr is 32 bits in hardware, but the hypervisor requries the upper
bits of the register to be clear on the hcall. By changing the type
from signed to unsigned int we can drop masking it back to 32 bits.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Now that xics_update_irq_servers is called only from init and hotplug
code, it becomes possible to clean up the ordering of functions in the
file, grouping them but the interfaces they implement.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
xics supports only one ipi per cpu, and expects software to use some
queue to know why the interrupt was sent. In Linux, we use a an array
of bitmaps indexed by cpu to identify the message. Currently the bits
are set in smp.c and decoded in xics.c, with the data structure in a
header file. Consolidate the code in xics.c similar to mpic and other
interrupt controllers.
Also, while making the the array static, the message word doesn't need
to be volatile as set_bit and test_clear_bit take care of it for us, and
put it under ifdef smp.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently, every time we determine which irq server to use, we check if
default_server, which is the id of the bootcpu, is still online. But
default_server is a hardware cpu, not the logical cpu id needed to index
cpu_online_map.
Since the default server can only go offline during a cpu hotplug event,
explicitly check the default server and choose the new one when we move
irqs away from the cpu being offlined.
This has the added benefit of only needing the boot_cpuid to be updated
and not relying on the cpu being marked offline during migrate_irqs_away.
Also, since xics_update_irq_servers only reads device tree information, we
can call it before xics_init_host in xics_init_IRQ and then default_server
will always be valid when we can reach get_irq_server via the host ops.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When reciving an irq vector that does not have a linux mapping, the kernel
prints a message and calls RTAS to disable the irq source. Previously
the kernel did not EOI the interrupt, causing the source to think it is
still being processed by software. While this does add an additional
layer of protection against interrupt storms had RTAS failed to disable
the source, it also prevents the interrupt from working when a driver
later enables it. (We could alternatively send an EOI on startup, but
that strategy would likely fail on an emulated xics.)
All interrupts should be disabled when the kernel starts, but this can
be observed if a driver does not shutdown an interrupt in its reboot
hook before starting a new kernel with kexec.
Michael reports this can be reproduced trivially by banging the keyboard
while kexec'ing on a P5 LPAR: even though the hvc_console driver request's
the console irq later in boot, the console is non-functional because
we're receiving no console interrupts.
Reported-By: Michael Ellerman
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Testing hotplug memory remove has revealed that we can oops in
pseries_lmb_remove(). The incorrect shift causes a NULL pointer
dereference in the page_zone() inline routine.
I have only been able to reproduce the oops on kernels with large pages
enabled.
Tested on Power5 and Power6 with and without large pages enabled.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
A mutex_unlock(&gang->aff_mutex) in spufs_create_context() is missing
in case spufs_context_open() fails. As a result, spu_create syscall
and spu_get_idle() may block.
This patch adds the mutex_unlock.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Acked-by: Andre Detsch <adetsch@br.ibm.com>