linux

Author	SHA1	Message	Date
Dave Kleikamp	c48d0dbaac	powerpc/476: define specific cpu table entry DD2 core The DD2 core still has some unstability. Define CPU_FTR_476_DD2 to enable workarounds in later patches. This is based on an earlier, unreleased patch for DD1 by Ben Herrenschmidt. Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>	2011-02-02 06:58:53 -05:00
Anton Blanchard	64ff312876	powerpc: Add support for popcnt instructions POWER5 added popcntb, and POWER7 added popcntw and popcntd. As a first step this patch does all the work out of line, but it would be nice to implement them as inlines with an out of line fallback. The performance issue with hweight was noticed when disabling SMT on a large (192 thread) POWER7 box. The patch improves that testcase by about 8%. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-11-29 15:48:17 +11:00
Anton Blanchard	f89451fbd2	powerpc: Feature nop out reservation clear when stcx checks address The POWER architecture does not require stcx to check that it is operating on the same address as the larx. This means it is possible for an an exception handler to execute a larx, get a reservation, decide not to do the stcx and then return back with an active reservation. If the interrupted code was in the middle of a larx/stcx sequence the stcx could incorrectly succeed. All recent POWER CPUs check the address before letting the stcx succeed so we can create a CPU feature and nop it out. As Ben suggested, we can only do this in our syscall path because there is a remote possibility some kernel code gets interrupted by an exception that ends up operating on the same cacheline. Thanks to Paul Mackerras and Derek Williams for the idea. To test this I used a very simple null syscall (actually getppid) testcase at http://ozlabs.org/~anton/junkcode/null_syscall.c I tested against 2.6.35-git10 with the following changes against the pseries_defconfig: CONFIG_VIRT_CPU_ACCOUNTING=n CONFIG_AUDIT=n CONFIG_PPC_4K_PAGES=n CONFIG_PPC_64K_PAGES=y CONFIG_FORCE_MAX_ZONEORDER=9 CONFIG_PPC_SUBPAGE_PROT=n CONFIG_FUNCTION_TRACER=n CONFIG_FUNCTION_GRAPH_TRACER=n CONFIG_IRQSOFF_TRACER=n CONFIG_STACK_TRACER=n to remove the overhead of virtual CPU accounting, syscall auditing and the ftrace mcount tracers. 64kB pages were enabled to minimise TLB misses. POWER6: +8.2% POWER7: +7.0% Another suggestion was to use a larx to something in the L1 instead of a stcx. This was almost as fast as removing the larx on POWER6, but only 3.5% faster on POWER7. We can use this to speed up the reservation clear in our exception exit code. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-09-02 14:07:30 +10:00
Linus Torvalds	c4efd6b569	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (27 commits) sched: Use correct macro to display sched_child_runs_first in /proc/sched_debug sched: No need for bootmem special cases sched: Revert nohz_ratelimit() for now sched: Reduce update_group_power() calls sched: Update rq->clock for nohz balanced cpus sched: Fix spelling of sibling sched, cpuset: Drop __cpuexit from cpu hotplug callbacks sched: Fix the racy usage of thread_group_cputimer() in fastpath_timer_check() sched: run_posix_cpu_timers: Don't check ->exit_state, use lock_task_sighand() sched: thread_group_cputime: Simplify, document the "alive" check sched: Remove the obsolete exit_state/signal hacks sched: task_tick_rt: Remove the obsolete ->signal != NULL check sched: __sched_setscheduler: Read the RLIMIT_RTPRIO value lockless sched: Fix comments to make them DocBook happy sched: Fix fix_small_capacity powerpc: Exclude arch_sd_sibiling_asym_packing() on UP powerpc: Enable asymmetric SMT scheduling on POWER7 sched: Add asymmetric group packing option for sibling domain sched: Fix capacity calculations for SMT4 sched: Change nohz idle load balancing logic to push model ...	2010-08-06 09:39:22 -07:00
K.Prasad	5aae8a5370	powerpc, hw_breakpoints: Implement hw_breakpoints for 64-bit server processors Implement perf-events based hw-breakpoint interfaces for PowerPC 64-bit server (Book III S) processors. This allows access to a given location to be used as an event that can be counted or profiled by the perf_events subsystem. This is done using the DABR (data breakpoint register), which can also be used for process debugging via ptrace. When perf_event hw_breakpoint support is configured in, the perf_event subsystem manages the DABR and arbitrates access to it, and ptrace then creates a perf_event when it is requested to set a data breakpoint. [Adopted suggestions from Paul Mackerras <paulus@samba.org> to - emulate_step() all system-wide breakpoints and single-step only the per-task breakpoints - perform arch-specific cleanup before unregistration through arch_unregister_hw_breakpoint() ] Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2010-06-22 19:40:50 +10:00
Michael Neuling	76cbd8a8f8	powerpc: Enable asymmetric SMT scheduling on POWER7 The POWER7 core has dynamic SMT mode switching which is controlled by the hypervisor. There are 3 SMT modes: SMT1 uses thread 0 SMT2 uses threads 0 & 1 SMT4 uses threads 0, 1, 2 & 3 When in any particular SMT mode, all threads have the same performance as each other (ie. at any moment in time, all threads perform the same). The SMT mode switching works such that when linux has threads 2 & 3 idle and 0 & 1 active, it will cede (H_CEDE hypercall) threads 2 and 3 in the idle loop and the hypervisor will automatically switch to SMT2 for that core (independent of other cores). The opposite is not true, so if threads 0 & 1 are idle and 2 & 3 are active, we will stay in SMT4 mode. Similarly if thread 0 is active and threads 1, 2 & 3 are idle, we'll go into SMT1 mode. If we can get the core into a lower SMT mode (SMT1 is best), the threads will perform better (since they share less core resources). Hence when we have idle threads, we want them to be the higher ones. This adds a feature bit for asymmetric packing to powerpc and then enables it on POWER7. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: linuxppc-dev@ozlabs.org LKML-Reference: <20100608045702.31FB5CC8C7@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-06-09 11:13:14 +02:00
Scott Wood	fe04b11215	powerpc/e500mc: Implement machine check handler. Most of the MSCR bit assigments are different in e500mc versus e500, and they are now write-one-to-clear. Some e500mc machine check conditions are made recoverable (as long as they aren't stuck on), most notably L1 instruction cache parity errors. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2010-05-21 07:41:52 -05:00
Dave Kleikamp	fc5e709731	powerpc/476: add machine check handler for 47x core The 47x core's MCSR varies from 44x, so it needs it's own machine check handler. Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>	2010-05-05 09:27:22 -04:00
Dave Kleikamp	e7f75ad01d	powerpc/47x: Base ppc476 support This patch adds the base support for the 476 processor. The code was primarily written by Ben Herrenschmidt and Torez Smith, but I've been maintaining it for a while. The goal is to have a single binary that will run on 44x and 47x, but we still have some details to work out. The biggest is that the L1 cache line size differs on the two platforms, but it's currently a compile-time option. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Torez Smith <lnxtorez@linux.vnet.ibm.com> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>	2010-05-05 09:11:10 -04:00
Anton Blanchard	5a0e9b5718	powerpc: Use lwsync for acquire barrier if CPU supports it Nick Piggin discovered that lwsync barriers around locks were faster than isync on 970. That was a long time ago and I completely dropped the ball in testing his patches across other ppc64 processors. Turns out the idea helps on other chips. Using a microbenchmark that uses a lot of threads to contend on a global pthread mutex (and therefore a global futex), POWER6 improves 8% and POWER7 improves 2%. I checked POWER5 and while I couldn't measure an improvement, there was no regression. This patch uses the lwsync patching code to replace the isyncs with lwsyncs on CPUs that support the instruction. We were marking POWER3 and RS64 as lwsync capable but in reality they treat it as a full sync (ie slow). Remove the CPU_FTR_LWSYNC bit from these CPUs so they continue to use the faster isync method. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-02-17 14:03:16 +11:00
Benjamin Herrenschmidt	9e41d9597e	Merge commit 'origin/master' into next	2009-03-24 13:38:30 +11:00
Piotr Ziecik	c9310920e6	powerpc/5200: Enable CPU_FTR_NEED_COHERENT for MPC52xx BestComm, a DMA engine in MPC52xx SoC, requires snooping when CPU caches are enabled to work properly. Adding CPU_FTR_NEED_COHERENT fixes NFS problems on MPC52xx machines introduced by 'powerpc/mm: Fix handling of _PAGE_COHERENT in BAT setup code' (sha1: `4c456a67f5`). Signed-off-by: Piotr Ziecik <kosmo@semihalf.com> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>	2009-03-17 09:17:50 -06:00
Kumar Gala	620165f971	powerpc: Add support for using doorbells for SMP IPI The e500mc supports the new msgsnd/doorbell mechanisms that were added in the Power ISA 2.05 architecture. We use the normal level doorbell for doing SMP IPIs at this point. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-23 15:53:03 +11:00
Benjamin Herrenschmidt	7c03d653cd	powerpc/mm: Introduce MMU features We're soon running out of CPU features and I need to add some new ones for various MMU related bits, so this patch separates the MMU features from the CPU features. I moved over the 32-bit MMU related ones, added base features for MMU type families, but didn't move over any 64-bit only feature yet. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt	6d2170be45	powerpc/4xx: Extended DCR support v2 This adds supports to the "extended" DCR addressing via the indirect mfdcrx/mtdcrx instructions supported by some 4xx cores (440H6 and later). I enabled the feature for now only on AMCC 460 chips. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-21 14:21:15 +11:00
Benjamin Herrenschmidt	8309ce7280	powerpc: Fix bogus cache flushing on all 40x and BookE processors v2 We were missing the CPU_FTR_NOEXECUTE bit in our cputable for all these processors. The result is that update_mmu_cache() would flush the cache for all pages mapped to userspace which is totally unnecessary on those processors since we already handle flushing on execute in the page fault path. This should provide a nice speed up ;-) Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2008-12-15 14:29:37 -06:00
Mark Nelson	4ec577a289	powerpc: Add new CPU feature: CPU_FTR_UNALIGNED_LD_STD Add a new CPU feature bit, CPU_FTR_UNALIGNED_LD_STD, to be added to the 64bit powerpc chips that can do unaligned load double and store double without any performance hit. This is added to Power6 and Cell and will be used in the next commit to disable the code that gets the destination address aligned on those CPUs where doing that doesn't improve performance. Signed-off-by: Mark Nelson <markn@au1.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-11-05 22:08:28 +11:00
Mark Nelson	2a9294369b	powerpc: Add new CPU feature: CPU_FTR_CP_USE_DCBTZ Add a new CPU feature bit, CPU_FTR_CP_USE_DCBTZ, to be added to the 64bit powerpc chips that benefit from having dcbt and dcbz instructions used in their memory copy routines. This will be used in a subsequent patch that updates copy_4K_page(). The new bit is added to Cell, PPC970 and Power4 because they show better performance with the new copy_4K_page() when dcbt and dcbz instructions are used. Signed-off-by: Mark Nelson <markn@au1.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-09-15 11:07:38 -07:00
Benjamin Herrenschmidt	b950bdd0fc	powerpc: Expose PMCs & cache topology in sysfs on 32-bit The file arch/powerpc/kernel/sysfs.c is currently only compiled for 64-bit kernels. It contain code to register CPU sysdevs in sysfs and add various properties such as cache topology and raw access by root to performance monitor counters (PMCs). A lot of that can be re-used as is on 32-bits. This makes the file be built for both, with appropriate ifdef'ing for the few bits that are really 64-bit specific, and adds some support for the raw PMCs for 75x and 74xx processors. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-08-20 16:34:58 +10:00
Stephen Rothwell	b8b572e101	powerpc: Move include files to arch/powerpc/include/asm from include/asm-powerpc. This is the result of a mkdir arch/powerpc/include/asm git mv include/asm-powerpc/* arch/powerpc/include/asm Followed by a few documentation/comment fixups and a couple of places where <asm-powepc/...> was being used explicitly. Of the latter only one was outside the arch code and it is a driver only built for powerpc. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-08-04 12:02:00 +10:00

20 Commits