Commit Graph

506 Commits

Author SHA1 Message Date
Kumar Gala
2eb4afb69f powerpc/pci: Move pseries code into pseries platform specific area
There doesn't appear to be any specific reason that we need to setup the
pseries specific notifier in generic arch pci code.  Move it into pseries
land.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-21 15:44:24 +10:00
Robert Jennings
14f966e794 powerpc/pseries: CMO unused page hinting
Adds support for the "unused" page hint which can be used in shared
memory partitions to flag pages not in use, which will then be stolen
before active pages by the hypervisor when memory needs to be moved to
LPARs in need of additional memory.  Failure to mark pages as 'unused'
makes the LPAR slower to give up unused memory to other partitions.

This adds the kernel parameter 'cmo_free_hint' to disable this
functionality.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-21 15:43:58 +10:00
Yinghai Lu
d5dedd4507 irq: change ->set_affinity() to return status
according to Ingo, change set_affinity() in irq_chip should return int,
because that way we can handle failure cases in a much cleaner way, in
the genirq layer.

v2: fix two typos

[ Impact: extend API ]

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: linux-arch@vger.kernel.org
LKML-Reference: <49F654E9.4070809@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-28 12:21:16 +02:00
Sachin Sant
b71a0c296c powerpc: pseries/dtl.c should include asm/firmware.h
A randconfig build on powerpc failed with:

dtl.c: In function 'dtl_init':
dtl.c:238: error: implicit declaration of function 'firmware_has_feature'
dtl.c:238: error: 'FW_FEATURE_SPLPAR' undeclared (first use in this function)

- We need firmware.h for these definitions.

Signed-off-by: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-04-15 15:23:55 +10:00
Mike Mason
c58dc575f3 powerpc/pseries: Set error_state to pci_channel_io_normal in eeh_report_reset()
While adding native EEH support to Emulex and Qlogic drivers, it was
discovered that dev->error_state was set to pci_io_channel_normal too
late in the recovery process. These drivers rely on error_state to
determine if they can access the device in their slot_reset callback,
thus error_state needs to be set to pci_io_channel_normal in
eeh_report_reset(). Below is a detailed explanation (courtesy of Richard
Lary) as to why this is necessary.

Background:
PCI MMIO or DMA accesses to a frozen slot generate additional EEH
errors. If the number of additional EEH errors exceeds EEH_MAX_FAILS the
adapter will be shutdown. To avoid triggering excessive EEH errors and
an undesirable adapter shutdown, some drivers use the
pci_channel_offline(dev) wrapper function to return a Boolean value
based on the value of pci_dev->error_state to determine if PCI MMIO or
DMA accesses are safe. If the wrapper returns TRUE, drivers must not
make PCI MMIO or DMA access to their hardware.

The pci_dev structure member error_state reflects one of three values,
1) pci_channel_io_normal, 2) pci_channel_io_frozen, 3)
pci_channel_io_perm_failure.  Function pci_channel_offline(dev) returns
TRUE if error_state is pci_channel_io_frozen or pci_channel_io_perm_failure.

The EEH driver sets pci_dev->error_state to pci_channel_io_frozen at the
point where the PCI slot is frozen. Currently, the EEH driver restores
dev->error_state to pci_channel_io_normal in eeh_report_resume() before
calling the driver's resume callback. However, when the EEH driver calls
the driver's slot_reset callback() from eeh_report_reset(), it
incorrectly indicates the error state is still pci_channel_io_frozen.

Waiting until eeh_report_resume() to restore dev->error_state to
pci_channel_io_normal is too late for Emulex and QLogic FC drivers and
any other drivers which are designed to use common code paths in these
two cases: i) those called after the driver's slot_reset callback() and
ii) those called after the PCI slot is frozen but before the driver's
slot_reset callback is called. Case i) all driver paths executed to
reinitialize the hardware after a reset and case ii) all code paths
executed by driver kernel threads that run asynchronous to the main
driver thread, such as interrupt handlers and worker threads to process
driver work queues.

Emulex and QLogic FC drivers are designed with common code paths which
require that pci_channel_offline(dev) reflect the true state of the
hardware. The state transitions that the hardware takes from Normal
Operations to Slot Frozen to Reset to Normal Operations are documented
in the Power Architecture™ Platform Requirements+ (PAPR+) in Table 75.
PE State Control.

PAPR defines the following 3 states:

0 -- Not reset, Not EEH stopped, MMIO load/store allowed, DMA allowed
     (Normal Operations)
1 -- Reset, Not EEH stopped, MMIO load/store disabled, DMA disabled
2 -- Not reset, EEH stopped, MMIO load/store disabled, DMA disabled
     (Slot Frozen)

An EEH error places the slot in state 2 (Frozen) and the adapter driver
is notified that an EEH error was detected. If the adapter driver
returns PCI_ERS_RESULT_NEED_RESET, the EEH driver calls
eeh_reset_device() to place the slot into state 1 (Reset) and
eeh_reset_device completes by placing the slot into State 0 (Normal
Operations). Upon return from eeh_reset_device(), the EEH driver calls
eeh_report_reset, which then calls the adapter's slot_reset callback. At
the time the adapter's slot_reset callback is called, the true state of
the hardware is Normal Operations and should be accurately reflected by
setting dev->error_state to pci_channel_io_normal.

The current implementation of EEH driver does not do so and requires
this change to correct this deficiency.

Signed-off-by: Mike Mason <mmlnx@us.ibm.com>
Acked-by: Linas Vepstas <linasvepstas@gmail.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-04-15 15:23:53 +10:00
Benjamin Herrenschmidt
9ff9a26b78 Merge commit 'origin/master' into next
Manual merge of:
	arch/powerpc/include/asm/elf.h
	drivers/i2c/busses/i2c-mpc.c
2009-03-30 14:04:53 +11:00
Jeremy Kerr
82631f5dd1 powerpc: Add write barrier before enabling DTL flags
Currently, we don't enforce any ordering for updates to the lppaca
when enabling dtl logging, so we may end up enabling logging before the
index fields have been established.

This change adds a smp_wmb() before doing the actual enable.

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-27 16:58:23 +11:00
Jeremy Kerr
fc59a3fc8e powerpc: Add virtual processor dispatch trace log
pseries SPLPAR machines are able to retrieve a log of dispatch and
preempt events from the hypervisor. With this information, we can
see when and why each dispatch & preempt is occuring.

This change adds a set of debugfs files allowing userspace to read this
dispatch log.

Based on initial patches from Nishanth Aravamudan <nacc@us.ibm.com>.

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-24 13:47:28 +11:00
Nathan Fontenot
c5785f9e1c powerpc/pseries: Failed reconfig notifier chain call cleanup
The return code from invoking the notifier chain when updating the
ibm,dynamic-memory property is not handled properly. In failure
cases (rc == NOTIFY_BAD) we should be restoring the original value
of the property.  In success (rc == NOTIFY_OK) we should be returning
zero from the calling routine.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-24 13:43:52 +11:00
Benjamin Herrenschmidt
28794d34ec powerpc/kconfig: Kill PPC_MULTIPLATFORM
CONFIG_PPC_MULTIPLATFORM is a remain of the pre-powerpc days and isn't
really meaningful anymore. It was basically equivalent to PPC64 || 6xx.

This removes it along with the following changes:

 - 32-bit platforms that relied on PPC32 && PPC_MULTIPLATFORM now rely
   on 6xx which is what they want anyway.

 - A new symbol, PPC_BOOK3S, is defined that represent compliance with
   the "Server" variant of the architecture. This is set when either 6xx
   or PPC64 is set and open the door for future BOOK3E 64-bit.

 - 64-bit platforms that relied on PPC64 && PPC_MULTIPLATFORM now use
   PPC64 && PPC_BOOK3S

 - A separate and selectable CONFIG_PPC_OF_BOOT_TRAMPOLINE option is now
   used to control the use of prom_init.c

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-11 17:11:35 +11:00
Michael Ellerman
1bac022155 powerpc/pseries: The pseries MSI code depends on EEH
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-11 17:11:33 +11:00
Michael Ellerman
94afa5a5f5 powerpc/pseries: Reject discontiguous/non-zero based MSI-X requests
There's no way for us to express to firmware that we want a
discontiguous, or non-zero based, range of MSI-X entries. So we
must reject such requests.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-11 17:11:33 +11:00
Michael Ellerman
448e2ca0e3 powerpc/pseries: Implement a quota system for MSIs
There are hardware limitations on the number of available MSIs,
which firmware expresses using a property named "ibm,pe-total-#msi".
This property tells us how many MSIs are available for devices below
the point in the PCI tree where we find the property.

For old firmwares which don't have the property, we assume there are
8 MSIs available per "partitionable endpoint" (PE). The PE can be
found using existing EEH code, which uses the methods described in
PAPR. For our purposes we want the parent of the node that's
identified using this method.

When a driver requests n MSIs for a device, we first establish where
the "ibm,pe-total-#msi" property above that device is, or we find the
PE if the property is not found. In both cases we call this node
the "pe_dn".

We then count all non-bridge devices below the pe_dn, to establish
how many devices in total may need MSIs. The quota is then simply the
total available divided by the number of devices, if the request is
less than or equal to the quota, the request is fine and we're done.

If the request is greater than the quota, we try to determine if there
are any "spare" MSIs which we can give to this device. Spare MSIs are
found by looking for other devices which can never use their full
quota, because their "req#msi(-x)" property is less than the quota.

If we find any spare, we divide the spares by the number of devices
that could request more than their quota. This ensures the spare
MSIs are spread evenly amongst all over-quota requestors.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:03 +11:00
Michael Ellerman
d523cc379d powerpc/pseries: Return req#msi(-x) if request is larger
If a driver asks for more MSIs than the devices "req#msi(-x)" property,
we currently return -ENOSPC. This doesn't give the driver any chance to
make a new request with a number that might work.

So if "req#msi(-x)" is less than the request, return its value. To be
100% safe, make sure we return an error if req_msi == 0.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:03 +11:00
Ingo Molnar
f8a6b2b9ce Merge branch 'linus' into x86/apic
Conflicts:
	arch/x86/kernel/acpi/boot.c
	arch/x86/mm/fault.c
2009-02-13 09:44:22 +01:00
Mike Mason
8535ef05a6 powerpc/eeh: Only disable/enable LSI interrupts in EEH
The EEH code disables and enables interrupts during the
device recovery process.  This is unnecessary for MSI
and MSI-X interrupts because they are effectively disabled
by the DMA Stopped state when an EEH error occurs.  The
current code is also incorrect for MSI-X interrupts.  It
doesn't take into account that MSI-X interrupts are tracked
in a different way than LSI/MSI interrupts.  This patch
ensures only LSI interrupts are disabled/enabled.

Signed-off-by: Mike Mason <mmlnx@us.ibm.com>
Acked-by: Linas Vepstas <linasvepstas@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-11 16:00:08 +11:00
Michael Ellerman
6071ed0487 powerpc/pseries: Return the number of MSIs we could allocate
If we can't allocate the requested number of MSIs, we can still tell the
generic code how many we were able to allocate. That can then be passed
onto the driver, allowing it to request that many in future, and
probably succeeed.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-11 13:38:02 +11:00
Michael Ellerman
649781f827 powerpc/pseries: Check for MSI-X also in rtas_msi_pci_irq_fixup()
We also need to check that the device isn't using MSI-X in the irq fixup
routine, otherwise we might leave MSI-Xs configured at boot.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-11 13:38:01 +11:00
Michael Ellerman
3a51c0cbea powerpc/pseries: Add support for ibm,req#msi-x
Firmware encodes the number of MSI-X requested by a device in a

different property than for MSI. Pull the property name out as a
parameter and share the logic for both cases.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-11 13:38:01 +11:00
Michael Ellerman
e27ed698b8 powerpc/pseries: Fix MSI-X interrupt querying
We need to increment i in the loop that queries what interrupts firmware
gave us, otherwise we'll incorrectly use the first value over and over.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-11 13:38:01 +11:00
Milton Miller
7ce14a315d powerpc/pseries: Remove write only variable in PCI DLPAR
Since we never hotplug add an isa bus, we never need to set primary.
Delete this write-only variable.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-11 13:37:59 +11:00
Michael Neuling
0b2f82872f powerpc: Add missing sparsemem.h include
arch/powerpc/platforms/pseries/hotplug-memory.c uses
remove_section_mapping() but doesn't include sparsemem.h which defines
it.  This can cause compilation fails for some configs.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-10 14:39:09 +11:00
Ingo Molnar
6a385db5ce Merge branch 'core/percpu' into x86/core
Conflicts:
	kernel/irq/handle.c
2009-01-28 23:12:55 +01:00
Stephen Rothwell
802bdea875 powerpc: Printing fix for l64 to ll64 conversion: phyp_dump.c
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-28 17:15:51 +11:00
Ingo Molnar
198030782c Merge branch 'x86/mm' into core/percpu
Conflicts:
	arch/x86/mm/fault.c
2009-01-21 10:39:51 +01:00
Ingo Molnar
fe333321e2 powerpc: Change u64/s64 to a long long integer type
Convert arch/powerpc/ over to long long based u64:

 -#ifdef __powerpc64__
 -# include <asm-generic/int-l64.h>
 -#else
 -# include <asm-generic/int-ll64.h>
 -#endif
 +#include <asm-generic/int-ll64.h>

This will avoid reoccuring spurious warnings in core kernel code that
comes when people test on their own hardware. (i.e. x86 in ~98% of the
cases) This is what x86 uses and it generally helps keep 64-bit code
32-bit clean too.

[Adjusted to not impact user mode (from paulus) - sfr]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-13 14:47:59 +11:00
Mike Travis
e65e49d0f3 irq: update all arches for new irq_desc
Impact: cleanup, update to new cpumask API

Irq_desc.affinity and irq_desc.pending_mask are now cpumask_var_t's
so access to them should be using the new cpumask API.

Signed-off-by: Mike Travis <travis@sgi.com>
2009-01-12 15:27:13 -08:00
Linus Torvalds
b840d79631 Merge branch 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (66 commits)
  x86: export vector_used_by_percpu_irq
  x86: use logical apicid in x2apic_cluster's x2apic_cpu_mask_to_apicid_and()
  sched: nominate preferred wakeup cpu, fix
  x86: fix lguest used_vectors breakage, -v2
  x86: fix warning in arch/x86/kernel/io_apic.c
  sched: fix warning in kernel/sched.c
  sched: move test_sd_parent() to an SMP section of sched.h
  sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0
  sched: activate active load balancing in new idle cpus
  sched: bias task wakeups to preferred semi-idle packages
  sched: nominate preferred wakeup cpu
  sched: favour lower logical cpu number for sched_mc balance
  sched: framework for sched_mc/smt_power_savings=N
  sched: convert BALANCE_FOR_xx_POWER to inline functions
  x86: use possible_cpus=NUM to extend the possible cpus allowed
  x86: fix cpu_mask_to_apicid_and to include cpu_online_mask
  x86: update io_apic.c to the new cpumask code
  x86: Introduce topology_core_cpumask()/topology_thread_cpumask()
  x86: xen: use smp_call_function_many()
  x86: use work_on_cpu in x86/kernel/cpu/mcheck/mce_amd_64.c
  ...

Fixed up trivial conflict in kernel/time/tick-sched.c manually
2009-01-02 11:44:09 -08:00
Linus Torvalds
5f34fe1cfc Merge branch 'core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (63 commits)
  stacktrace: provide save_stack_trace_tsk() weak alias
  rcu: provide RCU options on non-preempt architectures too
  printk: fix discarding message when recursion_bug
  futex: clean up futex_(un)lock_pi fault handling
  "Tree RCU": scalable classic RCU implementation
  futex: rename field in futex_q to clarify single waiter semantics
  x86/swiotlb: add default swiotlb_arch_range_needs_mapping
  x86/swiotlb: add default phys<->bus conversion
  x86: unify pci iommu setup and allow swiotlb to compile for 32 bit
  x86: add swiotlb allocation functions
  swiotlb: consolidate swiotlb info message printing
  swiotlb: support bouncing of HighMem pages
  swiotlb: factor out copy to/from device
  swiotlb: add arch hook to force mapping
  swiotlb: allow architectures to override phys<->bus<->phys conversions
  swiotlb: add comment where we handle the overflow of a dma mask on 32 bit
  rcu: fix rcutorture behavior during reboot
  resources: skip sanity check of busy resources
  swiotlb: move some definitions to header
  swiotlb: allow architectures to override swiotlb pool allocation
  ...

Fix up trivial conflicts in
  arch/x86/kernel/Makefile
  arch/x86/mm/init_32.c
  include/linux/hardirq.h
as per Ingo's suggestions.
2008-12-30 16:10:19 -08:00
Sebastien Dugue
b906cfa397 powerpc/pseries: Fix cpu hotplug
Currently, pseries_cpu_die() calls msleep() while polling RTAS for
the status of the dying cpu.

However, if the cpu that is going down also happens to be the one
doing the tick then we're hosed as the tick_do_timer_cpu 'baton' is
only passed later on in tick_shutdown() when _cpu_down() does the
CPU_DEAD notification.  Therefore jiffies won't be updated anymore.

This replaces that msleep() with a cpu_relax() to make sure we're not
going to schedule at that point.

With this patch my test box survives a 100k iterations hotplug stress
test on _all_ cpus, whereas without it, it quickly dies after ~50
iterations.

Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-23 15:13:27 +11:00
Brian King
fecba96268 powerpc: Add reboot notifier to Collaborative Memory Manager
When running Active Memory Sharing, pages can get marked as
"loaned" with the hypervisor by the CMM driver. This state gets
cleared by the system firmware when rebooting the partition.
When using kexec to boot a new kernel, this state never gets
cleared and the hypervisor and CMM driver can get out of sync
with respect to the number of pages currently marked "loaned".
Fix this by adding a reboot notifier to the CMM driver to deflate
the balloon and mark all pages as active.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:15 +11:00
Brian King
2218108e18 powerpc: Disable Collaborative Memory Manager for kdump
When running Active Memory Sharing, the Collaborative Memory Manager
(CMM) may mark some pages as "loaned" with the hypervisor.
Periodically, the CMM will query the hypervisor for a loan request,
which is a single signed value.  When kexec'ing into a kdump kernel,
the CMM driver in the kdump kernel is not aware of the pages the
previous kernel had marked as "loaned", so the hypervisor and the CMM
driver are out of sync.  This results in the CMM driver getting a
negative loan request, which can then get treated as a large unsigned
value and can cause kdump to hang due to the CMM driver inflating too
large.  Since there really is no clean way for the CMM driver in the
kdump kernel to clean this up, simply disable CMM in the kdump kernel.
This fixes hangs we were seeing doing kdump with AMS.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:15 +11:00
Tony Breeds
532774ec7f powerpc: Pass a valid token to rtas_call() in phyp-dump code
ibm_configure_kernel_dump is passed as the token to rtas_call() is
never initialised.  This sets it to something sane.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Acked-by: Nathan Lynch <ntl@pobox.com>
Acked-by: Manish Ahuja <mahujam@gmail.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:15 +11:00
Tony Breeds
7a2eab0d4e powerpc: Protect against NULL pointer deref in phyp-dump code
print_dump_header() will be called at least once with a NULL pointer in
a normal boot sequence.  If DEBUG is defined then we will dereference
the pointer and crash.  Add a quick fix to exit early in the NULL pointer
case.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Acked-by: Manish Ahuja <mahujam@gmail.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:14 +11:00
Paul E. McKenney
64db4cfff9 "Tree RCU": scalable classic RCU implementation
This patch fixes a long-standing performance bug in classic RCU that
results in massive internal-to-RCU lock contention on systems with
more than a few hundred CPUs.  Although this patch creates a separate
flavor of RCU for ease of review and patch maintenance, it is intended
to replace classic RCU.

This patch still handles stress better than does mainline, so I am still
calling it ready for inclusion.  This patch is against the -tip tree.
Nevertheless, experience on an actual 1000+ CPU machine would still be
most welcome.

Most of the changes noted below were found while creating an rcutiny
(which should permit ejecting the current rcuclassic) and while doing
detailed line-by-line documentation.

Updates from v9 (http://lkml.org/lkml/2008/12/2/334):

o	Fixes from remainder of line-by-line code walkthrough,
	including comment spelling, initialization, undesirable
	narrowing due to type conversion, removing redundant memory
	barriers, removing redundant local-variable initialization,
	and removing redundant local variables.

	I do not believe that any of these fixes address the CPU-hotplug
	issues that Andi Kleen was seeing, but please do give it a whirl
	in case the machine is smarter than I am.

	A writeup from the walkthrough may be found at the following
	URL, in case you are suffering from terminal insomnia or
	masochism:

	http://www.kernel.org/pub/linux/kernel/people/paulmck/tmp/rcutree-walkthrough.2008.12.16a.pdf

o	Made rcutree tracing use seq_file, as suggested some time
	ago by Lai Jiangshan.

o	Added a .csv variant of the rcudata debugfs trace file, to allow
	people having thousands of CPUs to drop the data into
	a spreadsheet.	Tested with oocalc and gnumeric.  Updated
	documentation to suit.

Updates from v8 (http://lkml.org/lkml/2008/11/15/139):

o	Fix a theoretical race between grace-period initialization and
	force_quiescent_state() that could occur if more than three
	jiffies were required to carry out the grace-period
	initialization.  Which it might, if you had enough CPUs.

o	Apply Ingo's printk-standardization patch.

o	Substitute local variables for repeated accesses to global
	variables.

o	Fix comment misspellings and redundant (but harmless) increments
	of ->n_rcu_pending (this latter after having explicitly added it).

o	Apply checkpatch fixes.

Updates from v7 (http://lkml.org/lkml/2008/10/10/291):

o	Fixed a number of problems noted by Gautham Shenoy, including
	the cpu-stall-detection bug that he was having difficulty
	convincing me was real.  ;-)

o	Changed cpu-stall detection to wait for ten seconds rather than
	three in order to reduce false positive, as suggested by Ingo
	Molnar.

o	Produced a design document (http://lwn.net/Articles/305782/).
	The act of writing this document uncovered a number of both
	theoretical and "here and now" bugs as noted below.

o	Fix dynticks_nesting accounting confusion, simplify WARN_ON()
	condition, fix kerneldoc comments, and add memory barriers
	in dynticks interface functions.

o	Add more data to tracing.

o	Remove unused "rcu_barrier" field from rcu_data structure.

o	Count calls to rcu_pending() from scheduling-clock interrupt
	to use as a surrogate timebase should jiffies stop counting.

o	Fix a theoretical race between force_quiescent_state() and
	grace-period initialization.  Yes, initialization does have to
	go on for some jiffies for this race to occur, but given enough
	CPUs...

Updates from v6 (http://lkml.org/lkml/2008/9/23/448):

o	Fix a number of checkpatch.pl complaints.

o	Apply review comments from Ingo Molnar and Lai Jiangshan
	on the stall-detection code.

o	Fix several bugs in !CONFIG_SMP builds.

o	Fix a misspelled config-parameter name so that RCU now announces
	at boot time if stall detection is configured.

o	Run tests on numerous combinations of configurations parameters,
	which after the fixes above, now build and run correctly.

Updates from v5 (http://lkml.org/lkml/2008/9/15/92, bad subject line):

o	Fix a compiler error in the !CONFIG_FANOUT_EXACT case (blew a
	changeset some time ago, and finally got around to retesting
	this option).

o	Fix some tracing bugs in rcupreempt that caused incorrect
	totals to be printed.

o	I now test with a more brutal random-selection online/offline
	script (attached).  Probably more brutal than it needs to be
	on the people reading it as well, but so it goes.

o	A number of optimizations and usability improvements:

	o	Make rcu_pending() ignore the grace-period timeout when
		there is no grace period in progress.

	o	Make force_quiescent_state() avoid going for a global
		lock in the case where there is no grace period in
		progress.

	o	Rearrange struct fields to improve struct layout.

	o	Make call_rcu() initiate a grace period if RCU was
		idle, rather than waiting for the next scheduling
		clock interrupt.

	o	Invoke rcu_irq_enter() and rcu_irq_exit() only when
		idle, as suggested by Andi Kleen.  I still don't
		completely trust this change, and might back it out.

	o	Make CONFIG_RCU_TRACE be the single config variable
		manipulated for all forms of RCU, instead of the prior
		confusion.

	o	Document tracing files and formats for both rcupreempt
		and rcutree.

Updates from v4 for those missing v5 given its bad subject line:

o	Separated dynticks interface so that NMIs and irqs call separate
	functions, greatly simplifying it.  In particular, this code
	no longer requires a proof of correctness.  ;-)

o	Separated dynticks state out into its own per-CPU structure,
	avoiding the duplicated accounting.

o	The case where a dynticks-idle CPU runs an irq handler that
	invokes call_rcu() is now correctly handled, forcing that CPU
	out of dynticks-idle mode.

o	Review comments have been applied (thank you all!!!).
	For but one example, fixed the dynticks-ordering issue that
	Manfred pointed out, saving me much debugging.  ;-)

o	Adjusted rcuclassic and rcupreempt to handle dynticks changes.

Attached is an updated patch to Classic RCU that applies a hierarchy,
greatly reducing the contention on the top-level lock for large machines.
This passes 10-hour concurrent rcutorture and online-offline testing on
128-CPU ppc64 without dynticks enabled, and exposes some timekeeping
bugs in presence of dynticks (exciting working on a system where
"sleep 1" hangs until interrupted...), which were fixed in the
2.6.27 kernel.  It is getting more reliable than mainline by some
measures, so the next version will be against -tip for inclusion.
See also Manfred Spraul's recent patches (or his earlier work from
2004 at http://marc.info/?l=linux-kernel&m=108546384711797&w=2).
We will converge onto a common patch in the fullness of time, but are
currently exploring different regions of the design space.  That said,
I have already gratefully stolen quite a few of Manfred's ideas.

This patch provides CONFIG_RCU_FANOUT, which controls the bushiness
of the RCU hierarchy.  Defaults to 32 on 32-bit machines and 64 on
64-bit machines.  If CONFIG_NR_CPUS is less than CONFIG_RCU_FANOUT,
there is no hierarchy.  By default, the RCU initialization code will
adjust CONFIG_RCU_FANOUT to balance the hierarchy, so strongly NUMA
architectures may choose to set CONFIG_RCU_FANOUT_EXACT to disable
this balancing, allowing the hierarchy to be exactly aligned to the
underlying hardware.  Up to two levels of hierarchy are permitted
(in addition to the root node), allowing up to 16,384 CPUs on 32-bit
systems and up to 262,144 CPUs on 64-bit systems.  I just know that I
am going to regret saying this, but this seems more than sufficient
for the foreseeable future.  (Some architectures might wish to set
CONFIG_RCU_FANOUT=4, which would limit such architectures to 64 CPUs.
If this becomes a real problem, additional levels can be added, but I
doubt that it will make a significant difference on real hardware.)

In the common case, a given CPU will manipulate its private rcu_data
structure and the rcu_node structure that it shares with its immediate
neighbors.  This can reduce both lock and memory contention by multiple
orders of magnitude, which should eliminate the need for the strange
manipulations that are reported to be required when running Linux on
very large systems.

Some shortcomings:

o	More bugs will probably surface as a result of an ongoing
	line-by-line code inspection.

	Patches will be provided as required.

o	There are probably hangs, rcutorture failures, &c.  Seems
	quite stable on a 128-CPU machine, but that is kind of small
	compared to 4096 CPUs.  However, seems to do better than
	mainline.

	Patches will be provided as required.

o	The memory footprint of this version is several KB larger
	than rcuclassic.

	A separate UP-only rcutiny patch will be provided, which will
	reduce the memory footprint significantly, even compared
	to the old rcuclassic.  One such patch passes light testing,
	and has a memory footprint smaller even than rcuclassic.
	Initial reaction from various embedded guys was "it is not
	worth it", so am putting it aside.

Credits:

o	Manfred Spraul for ideas, review comments, and bugs spotted,
	as well as some good friendly competition.  ;-)

o	Josh Triplett, Ingo Molnar, Peter Zijlstra, Mathieu Desnoyers,
	Lai Jiangshan, Andi Kleen, Andy Whitcroft, and Andrew Morton
	for reviews and comments.

o	Thomas Gleixner for much-needed help with some timer issues
	(see patches below).

o	Jon M. Tollefson, Tim Pepper, Andrew Theurer, Jose R. Santos,
	Andy Whitcroft, Darrick Wong, Nishanth Aravamudan, Anton
	Blanchard, Dave Kleikamp, and Nathan Lynch for keeping machines
	alive despite my heavy abuse^Wtesting.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-18 21:56:04 +01:00
Nathan Lynch
edc72ac4a0 powerpc/pseries: Check for GIQ indicator before calling set-indicator
Since "Factor out cpu joining/unjoining the GIQ"
(b4963255ad) the WARN_ON in
xics_set_cpu_giq() is being triggered during boot on JS20 because the
GIQ indicator is not available on that platform.  While the warning is
harmless and the system runs normally, it's nicer to check for the
existence of the indicator before trying to manipulate it.

Implement rtas_indicator_present(), which searches the
/rtas/rtas-indicators property for the given indicator token, and use
this function in xics_set_cpu_giq().

Also use a WARN statement in xics_set_cpu_giq to get better
information on failure.

Signed-off-by: Nathan Lynch <ntl@pobox.com>
Acked-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-16 15:53:13 +11:00
Rusty Russell
0de26520c7 cpumask: make irq_set_affinity() take a const struct cpumask
Impact: change existing irq_chip API

Not much point with gentle transition here: the struct irq_chip's
setaffinity method signature needs to change.

Fortunately, not widely used code, but hits a few architectures.

Note: In irq_select_affinity() I save a temporary in by mangling
irq_desc[irq].affinity directly.  Ingo, does this break anything?

(Folded in fix from KOSAKI Motohiro)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Grant Grundler <grundler@parisc-linux.org>
Acked-by: Ingo Molnar <mingo@redhat.com>
Cc: ralf@linux-mips.org
Cc: grundler@parisc-linux.org
Cc: jeremy@xensource.com
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
2008-12-13 21:20:26 +10:30
Rusty Russell
29c0177e6a cpumask: change cpumask_scnprintf, cpumask_parse_user, cpulist_parse, and cpulist_scnprintf to take pointers.
Impact: change calling convention of existing cpumask APIs

Most cpumask functions started with cpus_: these have been replaced by
cpumask_ ones which take struct cpumask pointers as expected.

These four functions don't have good replacement names; fortunately
they're rarely used, so we just change them over.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: paulus@samba.org
Cc: mingo@redhat.com
Cc: tony.luck@intel.com
Cc: ralf@linux-mips.org
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: cl@linux-foundation.org
Cc: srostedt@redhat.com
2008-12-13 21:20:25 +10:30
Benjamin Herrenschmidt
fd6852c8fa powerpc/pci: Fix various pseries PCI hotplug issues
The pseries PCI hotplug code has a number of issues, ranging from
incorrect resource setup to crashes, depending on what is added,
when, whether it contains a bridge, etc etc....

This fixes a whole bunch of these, while actually simplifying the code
a bit, using more generic code in the process and factoring out common
code between adding of a PHB, a slot or a device.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-11-06 09:31:52 +11:00
Benjamin Herrenschmidt
57b066ff4e powerpc/eeh: Make EEH device add/remove more robust
To properly fix PCI hotplug, it's useful to be able to make the fixup
passes on all devices whether they were just hot plugged or already
there.

The EEH code however used to not be very friendly with calling
eeh_add_device_late() multiple time, and not very rebust in the way it
generally tests whether a device is in the expected state vs. the EEH
code.

This improves it, along with cleaning up a couple of debug printk's.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-11-06 09:25:15 +11:00
Sebastien Dugue
1ef8014deb powerpc/pseries: Fix getting the server number size
The 'ibm,interrupt-server#-size' properties are not in the cpu nodes,
which is where we currently look for them, but rather live under the
interrupt source controller nodes (which have "ibm,ppc-xics" in their
compatible property).

This moves the code that looks for the ibm,interrupt-server#-size
properties from xics_update_irq_servers() into xics_init_IRQ().

Also this adds a check for mismatched sizes across the interrupt
source controller nodes.  Not sure this is necessary as in this case
the firmware might be seriously busted.

This property only appears on POWER6 boxes and is only used in the
set-indicator(gqirm) call, and apparently firmware currently ignores
the value we pass.  Nevertheless we need to fix it in case future
firmware versions use it.

Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-11-05 22:08:28 +11:00
Stephen Rothwell
454666eb78 powerpc: Fix "unused variable" warning in pci_dlpar.c
This gets rid of this build warning:

arch/powerpc/platforms/pseries/pci_dlpar.c: In function 'init_phb_dynamic':
arch/powerpc/platforms/pseries/pci_dlpar.c:192: warning: unused variable 'b'

This is one of the very few warnings left in a ppc64_defconfig build and
getting rid of it will make it easier to see future introduced ones (in
fact this was introduced very recently).

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-11-05 19:59:08 +11:00
Nathan Fontenot
e90a131846 powerpc/pci: Properly allocate bus resources for hotplug PHBs
Resources for PHB's that are dynamically added to a system are not
properly allocated in the resource tree.

Not having these resources allocated causes an oops when removing
the PHB when we try to release them.

The diff appears a bit messy, this is mainly due to moving everything
one tab to the left in the pcibios_allocate_bus_resources routine.
The functionality change in this routine is only that the
list_for_each_entry() loop is pulled out and moved to the necessary
calling routine.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-10-31 16:12:03 +11:00
Milton Miller
62a8bd6c92 powerpc: Use is_kdump_kernel()
linux/crash_dump.h defines is_kdump_kernel() to be used by code that
needs to know if the previous kernel crashed instead of a (clean) boot
or reboot.

This updates the just added powerpc code to use it.  This is needed
for the next commit, which will remove __kdump_flag.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-10-31 16:11:47 +11:00
Mohan Kumar M
54622f10a6 powerpc: Support for relocatable kdump kernel
This adds relocatable kernel support for kdump. With this one can
use the same regular kernel to capture the kdump. A signature (0xfeed1234)
is passed in r6 from panic code to the next kernel through kexec_sequence
and purgatory code. The signature is used to differentiate between
kdump kernel and non-kdump kernels.

The purgatory code compares the signature and sets the __kdump_flag in
head_64.S.  During the boot up, kernel code checks __kdump_flag and if it
is set, the kernel will behave as relocatable kdump kernel. This kernel
will boot at the address where it was loaded by kexec-tools ie. at the
address reserved through crashkernel boot parameter.

CONFIG_CRASH_DUMP depends on CONFIG_RELOCATABLE option to build kdump
kernel as relocatable. So the same kernel can be used as production and
kdump kernel.

This patch incorporates the changes suggested by Paul Mackerras to avoid
GOT use and to avoid two copies of the code.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Mohan Kumar M <mohan@in.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-10-22 15:01:22 +11:00
Milton Miller
6a75a6b8e8 powerpc: Use cpu_thread_in_core in smp_init for of_spin_map
We used to assume that even numbered threads were the primary
threads, ie those that would be listed and started as a cpu from
open firmware.  Replace a left over is even (% 2) check with a check
for it being a primary thread and update the comments.

Tested with a debug print on pseries, identical code found for cell.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-10-21 15:19:12 +11:00
Nathan Fontenot
04badfd293 powerpc/pseries: Validate PFN in pseries_remove_lmb()
The pfn of the memory to be removed should be validated prior to
attempting to remove the memory.  In cases where the probe of a
memory section fails during hotplug add, the pfn for the lmb may
not be valid.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-10-21 15:17:47 +11:00
Benjamin Herrenschmidt
6dc6472581 Merge commit 'origin'
Manual fixup of conflicts on:

	arch/powerpc/include/asm/dcr-regs.h
	drivers/net/ibm_newemac/core.h
2008-10-15 11:31:54 +11:00
Milton Miller
199f45c45e powerpc/xics: Reduce and comment xics IPI use of memory barriers
A single full sync (mb()) is requrired to order the mmio to the qirr reg
with the set or clear of the message word.  However, test_and_clear_bit
has the effect of smp_mb() and we are not doing any other io from here,
so we don't need a mb per bit processed.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-10-13 16:24:19 +11:00
Milton Miller
2172fe8704 powerpc/xics: Make printk format strings fit on one line
Several printks were broken at word boundaries for line length.   Some
even referred to old function names.   Using __func__ and changing the
text slightly for the format allows these printk formats to fit on one
line.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-10-13 16:24:19 +11:00
Milton Miller
d879f3849c powerpc/xics: Mark xics IPI interrupt as per-cpu
It is physically per-cpu, and we want the irq layer to treat it that way.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-10-13 16:24:19 +11:00
Milton Miller
1a57c926b6 powerpc/xics: EOI xics ipi by hand in kexec
EOI normally has the side effect of returning the cpu to the base
priority to recieve the next interrupt.  This is actually controlled
by the top byte of the xirr register.   When we are exiting the
kernel in kexec we must eoi the ipi for the next kernel because we
never return from the handler, but we want to leave interrupt
delivery blocked until the next kernel takes action.

Since the hardware ipi vector is fixed, its easiest to just do the
eoi explicitly.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-10-13 16:24:18 +11:00
Milton Miller
b4963255ad powerpc/xics: Factor out cpu joining/unjoining the GIQ
This factors out processors joining and unjoining the Global Interrupt
Queue into a separate function.

There is a bit of math to calculate the arguments to rtas to join
or leave the global interrupt queue, and a warning on failure
afterwards.  Make a helper for the 3 callers.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-10-13 16:24:18 +11:00
Milton Miller
a244a957ab powerpc/xics: Initialization code cleanups
We only need to check the ibm,interrupt-server#-size property once, not
once per global server and thread.

We can use !CONFIG_SMP cpu masks and hard_smp_processor_id() to avoid an ifdef.
Put the node when breaking out of the loop on lpar systems.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-10-13 16:24:18 +11:00
Milton Miller
188bdddd24 powerpc/xics: Trim #include list
Trim unneeded includes from xics.c.  We don't use signals or gfp
flags, we use only OF functions and don't need prom, and the 8259
is now handled by our caller.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-10-13 16:24:17 +11:00
Milton Miller
9dc2d44113 powerpc/xics: Change *_xirr_info_set() prototype to avoid casts
The xirr is 32 bits in hardware, but the hypervisor requries the upper
bits of the register to be clear on the hcall.  By changing the type
from signed to unsigned int we can drop masking it back to 32 bits.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-10-13 16:24:17 +11:00
Milton Miller
0641cc91b0 powerpc/xics: Rearrange file to group code by function
Now that xics_update_irq_servers is called only from init and hotplug
code, it becomes possible to clean up the ordering of functions in the
file, grouping them but the interfaces they implement.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-10-13 16:24:17 +11:00
Milton Miller
d13f7208b2 powerpc/xics: Consolidate ipi message encode and decode
xics supports only one ipi per cpu, and expects software to use some
queue to know why the interrupt was sent.  In Linux, we use a an array
of bitmaps indexed by cpu to identify the message.  Currently the bits
are set in smp.c and decoded in xics.c, with the data structure in a
header file.   Consolidate the code in xics.c similar to mpic and other
interrupt controllers.

Also, while making the the array static, the message word doesn't need
to be volatile as set_bit and test_clear_bit take care of it for us, and
put it under ifdef smp.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-10-13 16:24:16 +11:00
Milton Miller
302905a347 powerpc/xics: Update default_server during migrate_irqs_away
Currently, every time we determine which irq server to use, we check if
default_server, which is the id of the bootcpu, is still online.  But
default_server is a hardware cpu, not the logical cpu id needed to index
cpu_online_map.

Since the default server can only go offline during a cpu hotplug event,
explicitly check the default server and choose the new one when we move
irqs away from the cpu being offlined.

This has the added benefit of only needing the boot_cpuid to be updated
and not relying on the cpu being marked offline during migrate_irqs_away.

Also, since xics_update_irq_servers only reads device tree information, we
can call it before xics_init_host in xics_init_IRQ and then default_server
will always be valid when we can reach get_irq_server via the host ops.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-10-13 10:55:47 +11:00
Milton Miller
8767e9badc powerpc/xics: EOI unmapped irqs after disabling them
When reciving an irq vector that does not have a linux mapping, the kernel
prints a message and calls RTAS to disable the irq source.   Previously
the kernel did not EOI the interrupt, causing the source to think it is
still being processed by software.  While this does add an additional
layer of protection against interrupt storms had RTAS failed to disable
the source, it also prevents the interrupt from working when a driver
later enables it.  (We could alternatively send an EOI on startup, but
that strategy would likely fail on an emulated xics.)

All interrupts should be disabled when the kernel starts, but this can
be observed if a driver does not shutdown an interrupt in its reboot
hook before starting a new kernel with kexec.

Michael reports this can be reproduced trivially by banging the keyboard
while kexec'ing on a P5 LPAR: even though the hvc_console driver request's
the console irq later in boot, the console is non-functional because
we're receiving no console interrupts.

Reported-By: Michael Ellerman
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-10-13 10:55:47 +11:00
Nathan Fontenot
9fd3f88cb6 powerpc: Oops in pseries_lmb_remove()
Testing hotplug memory remove has revealed that we can oops in
pseries_lmb_remove().  The incorrect shift causes a NULL pointer
dereference in the page_zone() inline routine.

I have only been able to reproduce the oops on kernels with large pages
enabled.

Tested on Power5 and Power6 with and without large pages enabled.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-10-10 15:55:17 +11:00
Vitaly Mayatskikh
76c31f239e powerpc: Honor O_NONBLOCK flag when reading RTAS log
rtas_log_read() doesn't check file flags for O_NONBLOCK and blocks
non-blocking readers of /proc/ppc64/rtas/error_log when there is
no data available. This fixes it.

Also rtas_log_read() returns now with ENODATA to prevent suspending of
process in wait_event_interruptible() when logging facility was
switched off and log is already empty.

Signed-off-by: Vitaly Mayatskikh <v.mayatskih@gmail.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-10-07 14:26:21 +11:00
Sebastien Dugue
967e012ef3 powerpc: Separate the irq radix tree insertion and lookup
irq_radix_revmap() currently serves 2 purposes, irq mapping lookup
and insertion which happen in interrupt and process context respectively.

Separate the function into its 2 components, one for lookup only and one
for insertion only.

Fix the only user of the revmap tree (XICS) to use the new functions.

Also, move the insertion into the radix tree of those irqs that were
requested before it was initialized at said tree initialization.

Mutual exclusion between the tree initialization and readers/writers is
handled via a state variable (revmap_trees_allocated) set to 1 when the tree
has been initialized and set to 2 after the already requested irqs have been
inserted in the tree by the init path. This state is checked before any reader
or writer access just like we used to check for tree.gfp_mask != 0 before.

Finally, now that we're not any longer inserting nodes into the radix-tree
in interrupt context, turn the GFP_ATOMIC allocations into GFP_KERNEL ones.

Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-09-15 11:08:44 -07:00
Nathan Fontenot
525c411d40 powerpc: Check rc of notifier chain for memory remove
The return code from invocation of the notifier for
pSeries_reconfig_chain during update of the device tree is not
checked.  This causes writes to /proc/ppc64/ofdt to update memory
properties (i.e. ibm,dyamic-reconfiguration-memory) to always
return success, instead of the result of the notifier chain.

This happens specifically when we remove/add memory from the
device tree on machines using memory specified in the
ibm,dynamic-reconfiguration-memory property of the device tree.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-09-15 11:07:52 -07:00
Paul Mackerras
7e392f8c29 Merge branch 'linux-2.6' 2008-09-10 11:36:13 +10:00
Adrian Bunk
9d5a9e7465 Remove asm/a.out.h files for all architectures without a.out support.
This patch also includes the required removal of (unused) inclusion of
<asm/a.out.h> <linux/a.out.h>'s in the arch/ code for these
architectures.

[dwmw2: updated for 2.6.27-rc]
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2008-09-06 19:30:24 +01:00
Andrew Morton
d617a40227 powerpc: Export CMO_PageSize
This fixes an error building powerpc allmodconfig:

ERROR: "CMO_PageSize" [arch/powerpc/platforms/pseries/cmm.ko] undefined!

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-08-26 10:24:47 +10:00
Tony Breeds
dcfcfe7567 powerpc: Guard print_device_node_tree() with #if 0
Currently print_device_node_tree() isn't called but it can be useful for
debugging.  Leave the function there but hide it behind '#if 0' to save
it being rewritten.  If you want to call it you're already editing this
file anyway. ;P

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-08-20 16:34:57 +10:00
Harvey Harrison
5df72bf3f7 powerpc: Replace __FUNCTION__ with __func__
__FUNCTION__ is gcc-specific, use __func__

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-08-20 16:34:57 +10:00
Brian King
370e4587d0 powerpc: Fix CMM page loaning on 64k page kernel with 4k hardware pages
If the firmware page size used for collaborative memory overcommit
is 4k, but the kernel is using 64k pages, the page loaning is currently
broken as it only marks the first 4k page of each 64k page as loaned.
This fixes this to iterate through each 4k page and mark them all as
loaned/active.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-08-18 14:22:35 +10:00
Robert Jennings
81f14997e8 powerpc: Make CMO paging space pool ID and page size available
During platform setup, save off the primary/secondary paging space
pool IDs and the page size.  Added accessors in hvcall.h for these
variables.  This is needed for a subsequent fix.

Submitted-by: Robert Jennings <rcj@linux.vnet.ibm.com>

Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-08-18 14:22:34 +10:00
Stephen Rothwell
3cee67f779 powerpc/pseries: Fix CMO sysdev attribute API change fallout
Noticed due to these wanings:

arch/powerpc/platforms/pseries/cmm.c:298: warning: initialization from incompatible pointer type
arch/powerpc/platforms/pseries/cmm.c:299: warning: initialization from incompatible pointer type
arch/powerpc/platforms/pseries/cmm.c:320: warning: initialization from incompatible pointer type
arch/powerpc/platforms/pseries/cmm.c:320: warning: initialization from incompatible pointer type

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-07-28 16:30:50 +10:00
Robert Jennings
6490c4903d powerpc/pseries: iommu enablement for CMO
To support Cooperative Memory Overcommitment (CMO), we need to check
for failure from some of the tce hcalls.

These changes for the pseries platform affect the powerpc architecture;
patches for the other affected platforms are included in this patch.

pSeries platform IOMMU code changes:
 * platform TCE functions must handle H_NOT_ENOUGH_RESOURCES errors and
   return an error.

Architecture IOMMU code changes:
 * Calls to ppc_md.tce_build need to check return values and return
   DMA_MAPPING_ERROR for transient errors.

Architecture changes:
 * struct machdep_calls for tce_build*_pSeriesLP functions need to change
   to indicate failure.
 * all other platforms will need updates to iommu functions to match the new
   calling semantics; they will return 0 on success.  The other platforms
   default configs have been built, but no further testing was performed.

Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-07-25 15:44:43 +10:00
Brian King
84af458bb2 powerpc/pseries: Add collaborative memory manager
Adds a collaborative memory manager, which acts as a simple balloon driver
for System p machines that support cooperative memory overcommitment
(CMO).

Adds a platform configuration option for CMO called PPC_SMLPAR.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-07-25 15:44:42 +10:00
Brian King
86630a3232 powerpc/pseries: Utilities to set firmware page state
Newer versions of firmware support page states, which are used by the
collaborative memory manager (future patch) to "loan" pages to the
hypervisor for use by other partitions.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-07-25 15:44:42 +10:00
Robert Jennings
e46de429cb powerpc/pseries: Enable CMO feature during platform setup
For Cooperative Memory Overcommitment (CMO), set the FW_FEATURE_CMO
flag in powerpc_firmware_features from the rtas ibm,get-system-parameters
table prior to calling iommu_init_early_pSeries.

With this, any CMO specific functionality can be controlled by checking:
 firmware_has_feature(FW_FEATURE_CMO)

Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-07-25 15:44:42 +10:00
Mike Mason
f36c5227cd powerpc/eeh: Don't panic when EEH_MAX_FAILS is exceeded
This patch changes the EEH_MAX_FAILS action from panic to printing an
error message.  Panicking under under this condition is too harsh.
Although performance will be affected and the device may not recover,
the system is still running, which at the very least should allow for a
more graceful shutdown. The patch also removes the msleep() within a
spinlock, which can lead to a deadlock and is not recommended.

Signed-off-by: Mike Mason <mmlnx@us.ibm.com>
Acked-by: Linas Vepstas <linasvepstas@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-07-22 10:39:37 +10:00
Mark Nelson
4f3dd8a062 powerpc/dma: Use the struct dma_attrs in iommu code
Update iommu_alloc() to take the struct dma_attrs and pass them on to
tce_build(). This change propagates down to the tce_build functions of
all the platforms.

Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-07-22 10:39:32 +10:00
John Rigby
b500563b22 powerpc: pci config cleanup
Choosing PCI or not at config time is allowed on some
platforms via an if expression in arch/powerpc/Kconfig.
To add a new platform with PCI support selectable at
config time, you must change the if expression.  This
patch makes this easier by changing:
    bool "PCI support" if <long expression>
to
    bool "PCI support" if PPC_PCI_CHOICE
and adding select PPC_PCI_CHOICE to all the config nodes that
were previously in the PCI if expression.

Platforms with unconditional PCI support continue to
just select PCI in their config nodes.

Signed-off-by: John Rigby <jrigby@freescale.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2008-07-16 17:57:34 -05:00
Benjamin Herrenschmidt
84c3d4aaec Merge commit 'origin/master'
Manual merge of:

	arch/powerpc/Kconfig
	arch/powerpc/kernel/stacktrace.c
	arch/powerpc/mm/slice.c
	arch/ppc/kernel/smp.c
2008-07-16 11:07:59 +10:00
Dave Kleikamp
443dcac4d8 powerpc: Remove unnecessary condition when sanity-checking WIMG bits
It is okay for both _PAGE_GUARDED and _PAGE_COHERENT (G and M) to be set
in the same pte.  In fact, even if that were not the case, there doesn't
seem to be any place where G is set without also setting I (_PAGE_NO_CACHE),
so the test for I is sufficient as a condition to clear _PAGE_COHERENT
when filling the hash table.

Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-07-15 12:24:59 +10:00
Mike Mason
cde274c0c7 powerpc/eeh: PERR/SERR bit settings during EEH device recovery
The following patch restores the PERR and SERR bits in the PCI
command register during an EEH device recovery. We have found
at least one case (an Agilent test card) where the PERR/SERR
bits are set to 1 by firmware at boot time, but are not restored
to 1 during EEH recovery.  The patch fixes the Agilent card
problem.  It has been tested on several other EEH-enabled cards
with no regressions.

Signed-off-by: Mike Mason <mmlnx@us.ibm.com>
Acked-by: Linas Vepstas <linasvepstas@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-07-09 16:30:48 +10:00
Dave Kleikamp
801eb73f45 powerpc/mm: Don't clear _PAGE_COHERENT when _PAGE_SAO is set
The current low level hash code on LPAR configurations clears
_PAGE_COHERENT (M) when either _PAGE_GUARDED (G) or _PAGE_NO_CACHE (I)
is set. This conflicts with _PAGE_SAO which has M, I and W bits sets at
once (normally invalid combo) to indicate the new SAO attribute.

This changes the code to allow that case.

Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-07-09 16:30:46 +10:00
Nathan Fontenot
3c3f67eafa powerpc/pseries: Update the device tree correctly for drconf memory add/remove
This updates the device tree manipulation routines so that memory
add/remove of lmbs represented under the
ibm,dynamic-reconfiguration-memory node of the device tree invokes the
hotplug notifier chain.

This change is needed because of the change in the way memory is
represented under the ibm,dynamic-reconfiguration-memory node.  All lmbs
are described in the ibm,dynamic-memory property instead of having a
separate node for each lmb as in previous device tree layouts.  This
requires the update_node() routine to check for updates to the
ibm,dynamic-memory property and invoke the hotplug notifier chain.

This also updates the pseries hotplug notifier to be able to gather information
for lmbs represented under the ibm,dynamic-reconfiguration-memory node and
have the lmbs added/removed.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-07-03 16:58:16 +10:00
Nathan Fontenot
92ecd1790b powerpc/pseries: Use base address to derive starting page frame number
Use the base address of the lmb to derive the starting page frame number
instead of trying to extract it from the drc index of the lmb.  The drc
index should not be used for this as it will, and did, break.

Until this point, systems that have had memory represented in the device
tree with a node for each lmb the drc index would (luckily) closely
track the base address of the lmb.  For example a lmb with a drc index
of 8000000a would have a base address of a0000000.  This correlation
allowed the current code to derive the starting page frame number from
the drc inddex

Device tree layouts where lmbs are represented under the
ibm,dynamic-reconfiguration-memory node in the ibm,dynamic-memory
property do not have this correlation between the drc index and base
address of the lmb.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-07-03 16:58:15 +10:00
Nathan Fontenot
4b6e805e4a powerpc/pseries: Allow phandle to be specified in formats other than decimal
Allow the phandle passed to the /proc/ppc64/ofdt file to be specified
in formats other than decimal.  This allows us to easily specify phandle
values in hex that would otherwise appear as negative integers.

This is an issue on systems where the value of
/proc/device-tree/ibm,dynamic-reconfiguration-memory.ibm,phandle is
fffffff9.  Having to pass this to the ofdt file as a string results in
a large negative number, and simple_strtoul() does not handle negative
numbers.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-07-03 16:58:14 +10:00
Arnd Bergmann
cf2076012f powerpc/pseries: Call pseries_kexec_setup only on pseries
The pseries_kexec_setup function overwrites some ppc_md
pointers, so make sure it only gets called when running on
the right architecture.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-06-30 22:30:57 +10:00
Paul Mackerras
e9a4b6a3f6 Merge branch 'linux-2.6' 2008-06-30 10:16:50 +10:00
Jens Axboe
b7d7a2404f powerpc: convert to generic helpers for IPI function calls
This converts ppc to use the new helpers for smp_call_function() and
friends, and adds support for smp_call_function_single().

ppc loses the timeout functionality of smp_call_function_mask() with
this change, as the generic code does not provide that.

Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-06-26 11:22:13 +02:00
Andrew Morton
8e01520c06 [POWERPC] Fix warning in pseries/eeh_driver.c
Fix this:

/usr/src/devel/arch/powerpc/platforms/pseries/eeh_driver.c: In function 'print_device_node_tree':
/usr/src/devel/arch/powerpc/platforms/pseries/eeh_driver.c:55: warning: ISO C90 forbids mixed declarations and code

also make that function look like it's part of Linux.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-06-16 15:00:44 +10:00
Julia Lawall
bad5232ba2 [POWERPC] Add missing of_node_put in pseries/nvram.c
of_node_put is needed before discarding a value received from
of_find_node_by_type, eg in error handling code.

The semantic patch that makes the change is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@@
struct device_node *n;
struct device_node *n1;
struct device_node *n2;
statement S;
identifier f1,f2;
expression E1,E2;
constant C;
@@

n = of_find_node_by_type(...)
...
if (!n) S
... when != of_node_put(n)
    when != n1 = f1(n,...)
    when != E1 = n
    when any
    when strict
(
+ of_node_put(n);
  return -C;
|
  of_node_put(n);
|
  n2 = f2(n,...)
|
  E2 = n
|
  return ...;
)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-06-16 15:00:32 +10:00
Michael Ellerman
541b2755c2 [POWERPC] Fix sparse warnings in arch/powerpc/platforms/pseries
Don't return void in pseries/iommu.c
Make mce_data_buf static in pseries/ras.c
Make things static in pseries/rtasd.c
Make things static in pseries/setup.c
vtermno may as well be static in platforms/pseries/lpar.c

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-05-14 22:32:02 +10:00
Michael Ellerman
3ff1999b2c [POWERPC] pseries/firmware.c should include pseries/pseries.h
The declaration for fw_feature_init() is in pseries.h and
implemented in firmware.c, so the latter should include the
former.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-05-14 22:32:00 +10:00
Denis V. Lunev
9185ef6787 [POWERPC] Assign PDE->data before gluing PDE into /proc tree
Simply replace proc_create and further data assigned with proc_create_data.
No need to check for data!=NULL after that.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: Alexey Dobriyan <adobriyan@openvz.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-05-05 16:47:14 +10:00
Linus Torvalds
867a89e0b7 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
  [RAPIDIO] Change RapidIO doorbell source and target ID field to 16-bit
  [RAPIDIO] Add RapidIO connection info print out and re-training for broken connections
  [RAPIDIO] Add serial RapidIO controller support, which includes MPC8548, MPC8641
  [RAPIDIO] Add RapidIO node probing into MPC86xx_HPCN board id table
  [RAPIDIO] Add RapidIO node into MPC8641HPCN dts file
  [RAPIDIO] Auto-probe the RapidIO system size
  [RAPIDIO] Add OF-tree support to RapidIO controller driver
  [RAPIDIO] Add RapidIO multi mport support
  [RAPIDIO] Move include/asm-ppc/rio.h to asm-powerpc
  [RAPIDIO] Add RapidIO option to kernel configuration
  [RAPIDIO] Change RIO function mpc85xx_ to fsl_
  [POWERPC] Provide walk_memory_resource() for powerpc
  [POWERPC] Update lmb data structures for hotplug memory add/remove
  [POWERPC] Hotplug memory remove notifications for powerpc
  [POWERPC] windfarm: Add PowerMac 12,1 support
  [POWERPC] Fix building of pmac32 when CONFIG_NVRAM=m
  [POWERPC] Add IRQSTACKS support on ppc32
  [POWERPC] Use __always_inline for xchg* and cmpxchg*
  [POWERPC] Add fast little-endian switch system call
2008-04-29 08:19:14 -07:00
Denis V. Lunev
667471386d powerpc: use non-racy method for proc entries creation
Use proc_create()/proc_create_data() to make sure that ->proc_fops and ->data
be setup before gluing PDE to main tree.

Add correct ->owner to proc_fops to fix reading/module unloading race.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:22 -07:00
Badari Pulavarty
98d5c21c81 [POWERPC] Update lmb data structures for hotplug memory add/remove
The powerpc kernel maintains information about logical memory blocks
in the lmb.memory structure, which is initialized and updated at boot
time, but not when memory is added or removed while the kernel is
running.

This adds a hotplug memory notifier which updates lmb.memory when
memory is added or removed.  This information is useful for eHEA
driver to find out the memory layout and holes.

NOTE: No special locking is needed for lmb_add() and lmb_remove().
Calls to these are serialized by caller. (pSeries_reconfig_chain).

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-29 15:57:53 +10:00
Badari Pulavarty
57b539269e [POWERPC] Hotplug memory remove notifications for powerpc
Hotplug memory remove notifier for 64-bit powerpc.  This gets invoked
by writing to /proc/ppc64/ofdt the string "remove_node " followed by
the firmware device tree pathname of the node that needs to be removed.

In response, this adjusts the sections and removes sysfs entries by
calling __remove_pages().  Then it calls arch-specific code to get rid
of the hardware MMU mappings for the section of memory.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Reviewed-by: Michael Ellerman <michael@ellerman.id.au>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-29 15:57:52 +10:00
Michael Ellerman
36f8a2c4c6 [POWERPC] Add CONFIG_PPC_PSERIES_DEBUG to enable debugging for platforms/pseries
Add a DEBUG config setting which turns on all (most) of the debugging
under platforms/pseries.

To have this take effect we need to remove all the #undef DEBUG's, in
various files. We leave the #undef DEBUG in platforms/pseries/lpar.c,
as this enables debugging printks from the low-level hash table routines,
and tends to make your system unusable. If you want those enabled you
still have to turn them on by hand.

Also some of the RAS code has a DEBUG block which causes a functional
change, so I've keyed this off a different (non-existant) debug #define.

This is only enabled if you have PPC_EARLY_DEBUG enabled also.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-24 21:08:12 +10:00
Michael Ellerman
f7ebf352b2 [POWERPC] Convert from DBG() to pr_debug() in platforms/pseries/
In pseries/lpar.c, fix some printf specifier mismatches, and add
a newline to one printk.

In pseries/rtasd.c add "rtasd" to some messages to make it clear
where they're coming from.

In pseries/scanlog.c remove the hand-rolled runtime debugging support
in there. This file has been largely unchanged for eons, if we need to
debug it in future we can recompile.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-24 21:08:12 +10:00