linux

Author	SHA1	Message	Date
Linus Torvalds	5f3d2f2e1a	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc updates from Benjamin Herrenschmidt: "Some highlights in addition to the usual batch of fixes: - 64TB address space support for 64-bit processes by Aneesh Kumar - Gavin Shan did a major cleanup & re-organization of our EEH support code (IBM fancy PCI error handling & recovery infrastructure) which paves the way for supporting different platform backends, along with some rework of the PCIe code for the PowerNV platform in order to remove home made resource allocations and instead use the generic code (which is possible after some small improvements to it done by Gavin). - Uprobes support by Ananth N Mavinakayanahalli - A pile of embedded updates from Freescale folks, including new SoC and board supports, more KVM stuff including preparing for 64-bit BookE KVM support, ePAPR 1.1 updates, etc..." Fixup trivial conflicts in drivers/scsi/ipr.c * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (146 commits) powerpc/iommu: Fix multiple issues with IOMMU pools code powerpc: Fix VMX fix for memcpy case driver/mtd:IFC NAND:Initialise internal SRAM before any write powerpc/fsl-pci: use 'Header Type' to identify PCIE mode powerpc/eeh: Don't release eeh_mutex in eeh_phb_pe_get powerpc: Remove tlb batching hack for nighthawk powerpc: Set paca->data_offset = 0 for boot cpu powerpc/perf: Sample only if SIAR-Valid bit is set in P7+ powerpc/fsl-pci: fix warning when CONFIG_SWIOTLB is disabled powerpc/mpc85xx: Update interrupt handling for IFC controller powerpc/85xx: Enable USB support in p1023rds_defconfig powerpc/smp: Do not disable IPI interrupts during suspend powerpc/eeh: Fix crash on converting OF node to edev powerpc/eeh: Lock module while handling EEH event powerpc/kprobe: Don't emulate store when kprobe stwu r1 powerpc/kprobe: Complete kprobe and migrate exception frame powerpc/kprobe: Introduce a new thread flag powerpc: Remove unused __get_user64() and __put_user64() powerpc/eeh: Global mutex to protect PE tree powerpc/eeh: Remove EEH PE for normal PCI hotplug ...	2012-10-06 03:16:12 +09:00
Linus Torvalds	0b981cb94b	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: "Continued quest to clean up and enhance the cputime code by Frederic Weisbecker, in preparation for future tickless kernel features. Other than that, smallish changes." Fix up trivial conflicts due to additions next to each other in arch/{x86/}Kconfig * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) cputime: Make finegrained irqtime accounting generally available cputime: Gather time/stats accounting config options into a single menu ia64: Reuse system and user vtime accounting functions on task switch ia64: Consolidate user vtime accounting vtime: Consolidate system/idle context detection cputime: Use a proper subsystem naming for vtime related APIs sched: cpu_power: enable ARCH_POWER sched/nohz: Clean up select_nohz_load_balancer() sched: Fix load avg vs. cpu-hotplug sched: Remove __ARCH_WANT_INTERRUPTS_ON_CTXSW sched: Fix nohz_idle_balance() sched: Remove useless code in yield_to() sched: Add time unit suffix to sched sysctl knobs sched/debug: Limit sd->*_idx range on sysctl sched: Remove AFFINE_WAKEUPS feature flag s390: Remove leftover account_tick_vtime() header cputime: Consolidate vtime handling on context switch sched: Move cputime code to its own file cputime: Generalize CONFIG_VIRT_CPU_ACCOUNTING tile: Remove SD_PREFER_LOCAL leftover ...	2012-10-01 10:43:39 -07:00
Michael Neuling	4474ef055c	powerpc: Rework set_dabr so it can take a DABRX value as well Rework set_dabr to take a DABRX value as well. Both the pseries and PS3 hypervisors do some checks on the DABRX values that are passed in the hcall. This patch stops bogus values from being passed to hypervisor. Also, in the case where we are clearing the breakpoint, where DABR and DABRX are zero, we modify the DABRX value to make it valid so that the hcall won't fail. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-09-10 09:59:10 +10:00
Benjamin Herrenschmidt	fff34b3412	Merge branch 'merge' into next Brings in various bug fixes from 3.6-rcX	2012-09-07 09:48:59 +10:00
Anton Blanchard	1021cb268b	powerpc: Fix DSCR inheritance in copy_thread() If the default DSCR is non zero we set thread.dscr_inherit in copy_thread() meaning the new thread and all its children will ignore future updates to the default DSCR. This is not intended and is a change in behaviour that a number of our users have hit. We just need to inherit thread.dscr and thread.dscr_inherit from the parent which ends up being much simpler. This was found with the following test case: http://ozlabs.org/~anton/junkcode/dscr_default_test.c Signed-off-by: Anton Blanchard <anton@samba.org> Cc: <stable@kernel.org> # 3.0+ Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-09-05 16:05:21 +10:00
Ananth N Mavinakayanahalli	41ab5266c3	powerpc: Add trap_nr to thread_struct Add thread_struct.trap_nr and use it to store the last exception the thread experienced. In this patch, we populate the field at various places where we force_sig_info() to the process. This is also used in uprobes to determine if the probed instruction caused an exception. Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-09-05 15:19:36 +10:00
Frederic Weisbecker	baa36046d0	cputime: Consolidate vtime handling on context switch The archs that implement virtual cputime accounting all flush the cputime of a task when it gets descheduled and sometimes set up some ground initialization for the next task to account its cputime. These archs all put their own hooks in their context switch callbacks and handle the off-case themselves. Consolidate this by creating a new account_switch_vtime() callback called in generic code right after a context switch and that these archs must implement to flush the prev task cputime and initialize the next task cputime related state. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>	2012-08-20 13:05:28 +02:00
Linus Torvalds	ec0d7f18ab	Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull fpu state cleanups from Ingo Molnar: "This tree streamlines further aspects of FPU handling by eliminating the prepare_to_copy() complication and moving that logic to arch_dup_task_struct(). It also fixes the FPU dumps in threaded core dumps, removes and old (and now invalid) assumption plus micro-optimizes the exit path by avoiding an FPU save for dead tasks." Fixed up trivial add-add conflict in arch/sh/kernel/process.c that came in because we now do the FPU handling in arch_dup_task_struct() rather than the legacy (and now gone) prepare_to_copy(). * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, fpu: drop the fpu state during thread exit x86, xsave: remove thread_has_fpu() bug check in __sanitize_i387_state() coredump: ensure the fpu state is flushed for proper multi-threaded core dump fork: move the real prepare_to_copy() users to arch_dup_task_struct()	2012-05-23 10:59:07 -07:00
Linus Torvalds	6f73b3629f	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc updates from Benjamin Herrenschmidt: "Here are the powerpc goodies for 3.5. Main highlights are: - Support for the NX crypto engine in Power7+ - A bunch of Anton goodness, including some micro optimization of our syscall entry on Power7 - I converted a pile of our thermal control drivers to the new i2c APIs (essentially turning the old therm_pm72 into a proper set of windfarm drivers). That's one more step toward removing the deprecated i2c APIs, there's still a few drivers to fix, but we are getting close - kexec/kdump support for 47x embedded cores The big missing thing here is no updates from Freescale. Not sure what's up here, but with Kumar not working for them anymore things are a bit in a state of flux in that area." * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (71 commits) powerpc: Fix irq distribution Revert "powerpc/hw-breakpoint: Use generic hw-breakpoint interfaces for new PPC ptrace flags" powerpc: Fixing a cputhread code documentation powerpc/crypto: Enable the PFO-based encryption device powerpc/crypto: Build files for the nx device driver powerpc/crypto: debugfs routines and docs for the nx device driver powerpc/crypto: SHA512 hash routines for nx encryption powerpc/crypto: SHA256 hash routines for nx encryption powerpc/crypto: AES-XCBC mode routines for nx encryption powerpc/crypto: AES-GCM mode routines for nx encryption powerpc/crypto: AES-ECB mode routines for nx encryption powerpc/crypto: AES-CTR mode routines for nx encryption powerpc/crypto: AES-CCM mode routines for nx encryption powerpc/crypto: AES-CBC mode routines for nx encryption powerpc/crypto: nx driver code supporting nx encryption powerpc/pseries: Enable the PFO-based RNG accelerator powerpc/pseries/hwrng: PFO-based hwrng driver powerpc/pseries: Add PFO support to the VIO bus powerpc/pseries: Add pseries update notifier for OFDT prop changes powerpc/pseries: Add new hvcall constants to support PFO ...	2012-05-23 09:02:42 -07:00
Suresh Siddha	55ccf3fe3f	fork: move the real prepare_to_copy() users to arch_dup_task_struct() Historical prepare_to_copy() is mostly a no-op, duplicated for majority of the architectures and the rest following the x86 model of flushing the extended register state like fpu there. Remove it and use the arch_dup_task_struct() instead. Suggested-by: Oleg Nesterov <oleg@redhat.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1336692811-30576-1-git-send-email-suresh.b.siddha@intel.com Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Howells <dhowells@redhat.com> Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Chris Zankel <chris@zankel.net> Cc: Richard Henderson <rth@twiddle.net> Cc: Russell King <linux@arm.linux.org.uk> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Mark Salter <msalter@redhat.com> Cc: Aurelien Jacquiot <a-jacquiot@ti.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Jonas Bonn <jonas@southpole.se> Cc: James E.J. Bottomley <jejb@parisc-linux.org> Cc: Helge Deller <deller@gmx.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Chen Liqin <liqin.chen@sunplusct.com> Cc: Lennox Wu <lennox.wu@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-05-16 15:16:26 -07:00
Thomas Gleixner	96c9511797	powerpc: Use common threadinfo allocator The core now has a threadinfo allocator which uses a kmemcache when THREAD_SIZE < PAGE_SIZE. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Link: http://lkml.kernel.org/r/20120505150142.059161130@linutronix.de	2012-05-08 14:08:45 +02:00
Anton Blanchard	35000870fc	powerpc: Optimise enable_kernel_altivec Add two optimisations to enable_kernel_altivec: - enable_kernel_altivec has already determined if we need to save the previous task's state but we call giveup_altivec in both cases, requiring an extra branch in giveup_altivec. Create giveup_altivec_notask which only turns on the VMX bit in the MSR. - We write the VMX MSR bit each time we call enable_kernel_altivec even it was already set. Check the bit and branch out if we have already set it. The classic case for this is vectored IO where we have to copy multiple buffers to or from userspace. The following testcase was used to confirm this patch improves performance: http://ozlabs.org/~anton/junkcode/copy_to_user.c Since the current breakpoint for using VMX in copy_tofrom_user is 4096 bytes, I'm using buffers of 4096 + 1 cacheline (4224) bytes. A benchmark of 16 entry readvs (-s 16): time copy_to_user -l 4224 -s 16 -i 1000000 completes 5.2% faster on a POWER7 PS700. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-04-30 15:37:17 +10:00
Benjamin Herrenschmidt	fae2e0fb24	powerpc: Fix typo in runlatch code Commit `fe1952fc0a` "powerpc: Rework runlatch code" has a nasty typo where it uses "TLF_RUNLATCH" instead of "_TLF_RUNLATCH" (bit number instead of bit mask), causing some flags to be potentially lost such as _TLF_RESTORE_SIGMASK (Brown paper bag for me ! We should be able to make that break at compile time with a bit of magic, any volunteer ?) Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-04-11 10:42:15 +10:00
David Howells	ae3a197e3d	Disintegrate asm/system.h for PowerPC Disintegrate asm/system.h for PowerPC. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> cc: linuxppc-dev@lists.ozlabs.org	2012-03-28 18:30:02 +01:00
Benjamin Herrenschmidt	7230c56441	powerpc: Rework lazy-interrupt handling The current implementation of lazy interrupts handling has some issues that this tries to address. We don't do the various workarounds we need to do when re-enabling interrupts in some cases such as when returning from an interrupt and thus we may still lose or get delayed decrementer or doorbell interrupts. The current scheme also makes it much harder to handle the external "edge" interrupts provided by some BookE processors when using the EPR facility (External Proxy) and the Freescale Hypervisor. Additionally, we tend to keep interrupts hard disabled in a number of cases, such as decrementer interrupts, external interrupts, or when a masked decrementer interrupt is pending. This is sub-optimal. This is an attempt at fixing it all in one go by reworking the way we do the lazy interrupt disabling from the ground up. The base idea is to replace the "hard_enabled" field with a "irq_happened" field in which we store a bit mask of what interrupt occurred while soft-disabled. When re-enabling, either via arch_local_irq_restore() or when returning from an interrupt, we can now decide what to do by testing bits in that field. We then implement replaying of the missed interrupts either by re-using the existing exception frame (in exception exit case) or via the creation of a new one from an assembly trampoline (in the arch_local_irq_enable case). This removes the need to play with the decrementer to try to create fake interrupts, among others. In addition, this adds a few refinements: - We no longer hard disable decrementer interrupts that occur while soft-disabled. We now simply bump the decrementer back to max (on BookS) or leave it stopped (on BookE) and continue with hard interrupts enabled, which means that we'll potentially get better sample quality from performance monitor interrupts. - Timer, decrementer and doorbell interrupts now hard-enable shortly after removing the source of the interrupt, which means they no longer run entirely hard disabled. Again, this will improve perf sample quality. - On Book3E 64-bit, we now make the performance monitor interrupt act as an NMI like Book3S (the necessary C code for that to work appear to already be present in the FSL perf code, notably calling nmi_enter instead of irq_enter). (This also fixes a bug where BookE perfmon interrupts could clobber r14 ... oops) - We could make "masked" decrementer interrupts act as NMIs when doing timer-based perf sampling to improve the sample quality. Signed-off-by-yet: Benjamin Herrenschmidt <benh@kernel.crashing.org> --- v2: - Add hard-enable to decrementer, timer and doorbells - Fix CR clobber in masked irq handling on BookE - Make embedded perf interrupt act as an NMI - Add a PACA_HAPPENED_EE_EDGE for use by FSL if they want to retrigger an interrupt without preventing hard-enable v3: - Fix or vs. ori bug on Book3E - Fix enabling of interrupts for some exceptions on Book3E v4: - Fix resend of doorbells on return from interrupt on Book3E v5: - Rebased on top of my latest series, which involves some significant rework of some aspects of the patch. v6: - 32-bit compile fix - more compile fixes with various .config combos - factor out the asm code to soft-disable interrupts - remove the C wrapper around preempt_schedule_irq v7: - Fix a bug with hard irq state tracking on native power7	2012-03-09 13:25:06 +11:00
Benjamin Herrenschmidt	fe1952fc0a	powerpc: Rework runlatch code This moves the inlines into system.h and changes the runlatch code to use the thread local flags (non-atomic) rather than the TIF flags (atomic) to keep track of the latch state. The code to turn it back on in an asynchronous interrupt is now simplified and partially inlined. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-03-09 10:55:02 +11:00
Ira Snyder	40c8cefaaf	powerpc: Fix kernel log of oops/panic instruction dump A kernel oops/panic prints an instruction dump showing several instructions before and after the instruction which caused the oops/panic. The code intended that the faulting instruction be enclosed in angle brackets, however a bug caused the faulting instruction to be interpreted by printk() as the message log level. To fix this, the KERN_CONT log level is added before the actual text of the printed message. === Before the patch === [ 1081.587266] Instruction dump: [ 1081.590236] 7c000110 7c0000f8 5400077c 552907f6 7d290378 992b0003 4e800020 38000001 [ 1081.598034] 3d20c03a 9009a114 7c0004ac 39200000 [ 1081.602500] 4e800020 3803ffd0 2b800009 <4>[ 1081.587266] Instruction dump: <4>[ 1081.590236] 7c000110 7c0000f8 5400077c 552907f6 7d290378 992b0003 4e800020 38000001 <4>[ 1081.598034] 3d20c03a 9009a114 7c0004ac 39200000 <98090000>[ 1081.602500] 4e800020 3803ffd0 2b800009 === After the patch === [ 51.385216] Instruction dump: [ 51.388186] 7c000110 7c0000f8 5400077c 552907f6 7d290378 992b0003 4e800020 38000001 [ 51.395986] 3d20c03a 9009a114 7c0004ac 39200000 <98090000> 4e800020 3803ffd0 2b800009 <4>[ 51.385216] Instruction dump: <4>[ 51.388186] 7c000110 7c0000f8 5400077c 552907f6 7d290378 992b0003 4e800020 38000001 <4>[ 51.395986] 3d20c03a 9009a114 7c0004ac 39200000 <98090000> 4e800020 3803ffd0 2b800009 Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-02-16 16:11:23 +11:00
Anton Blanchard	3bfd0c9c8f	powerpc: Decode correct MSR bits in oops output On a 64bit book3s machine I have an oops from a system reset that claims the book3e CE bit was set: MSR: 8000000000021032 <ME,CE,IR,DR> CR: 24004082 XER: 00000010 On a book3s machine system reset sets IBM bit 46 and 47 depending on the power saving mode. Separate the definitions by type and for completeness add the rest of the bits in. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-11-28 11:42:09 +11:00
Kumar Gala	187b9f2aa7	powerpc/book3e-64: Fix debug support for userspace With the introduction of CONFIG_PPC_ADV_DEBUG_REGS user space debug is broken on Book-E 64-bit parts that support delayed debug events. When switch_booke_debug_regs() sets DBCR0 we'll start getting debug events as MSR_DE is also set and we aren't able to handle debug events from kernel space. We can remove the hack that always enables MSR_DE and loads up DBCR0 and just utilize switch_booke_debug_regs() to get user space debug working again. We still need to handle critical/debug exception stacks & proper save/restore of state for those exception levles to support debug events from kernel space like we have on 32-bit. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-11-17 16:26:07 +11:00
Kumar Gala	ba28c9aae2	powerpc: Revert show_regs() define for readability We had an existing ifdef for 4xx & BOOKE processors that got changed to CONFIG_PPC_ADV_DEBUG_REGS. The define has nothing to do with CONFIG_PPC_ADV_DEBUG_REGS. The define really should be: #if defined(CONFIG_4xx) \|\| defined(CONFIG_BOOKE) and not #ifdef CONFIG_PPC_ADV_DEBUG_REGS Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-11-17 16:26:07 +11:00
Paul Gortmaker	4b16f8e2d6	powerpc: various straight conversions from module.h --> export.h All these files were including module.h just for the basic EXPORT_SYMBOL infrastructure. We can shift them off to the export.h header which is a way smaller footprint and thus realize some compile time gains. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-10-31 19:30:44 -04:00
Linus Torvalds	184475029a	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (99 commits) drivers/virt: add missing linux/interrupt.h to fsl_hypervisor.c powerpc/85xx: fix mpic configuration in CAMP mode powerpc: Copy back TIF flags on return from softirq stack powerpc/64: Make server perfmon only built on ppc64 server devices powerpc/pseries: Fix hvc_vio.c build due to recent changes powerpc: Exporting boot_cpuid_phys powerpc: Add CFAR to oops output hvc_console: Add kdb support powerpc/pseries: Fix hvterm_raw_get_chars to accept < 16 chars, fixing xmon powerpc/irq: Quieten irq mapping printks powerpc: Enable lockup and hung task detectors in pseries and ppc64 defeconfigs powerpc: Add mpt2sas driver to pseries and ppc64 defconfig powerpc: Disable IRQs off tracer in ppc64 defconfig powerpc: Sync pseries and ppc64 defconfigs powerpc/pseries/hvconsole: Fix dropped console output hvc_console: Improve tty/console put_chars handling powerpc/kdump: Fix timeout in crash_kexec_wait_realmode powerpc/mm: Fix output of total_ram. powerpc/cpufreq: Add cpufreq driver for Momentum Maple boards powerpc: Correct annotations of pmu registration functions ... Fix up trivial Kconfig/Makefile conflicts in arch/powerpc, drivers, and drivers/cpufreq	2011-07-25 22:59:39 -07:00
Michael Neuling	5115a026ce	powerpc: Add CFAR to oops output Now we have the CFAR saved add it to the oops output. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-07-19 15:13:33 +10:00
Mathias Krause	0e0ebdb9c2	powerpc: Remove redundant set_fs(USER_DS) The address limit is already set in flush_old_exec() so this set_fs(USER_DS) is redundant. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-07-19 15:12:40 +10:00
Paul Mackerras	de56a948b9	KVM: PPC: Add support for Book3S processors in hypervisor mode This adds support for KVM running on 64-bit Book 3S processors, specifically POWER7, in hypervisor mode. Using hypervisor mode means that the guest can use the processor's supervisor mode. That means that the guest can execute privileged instructions and access privileged registers itself without trapping to the host. This gives excellent performance, but does mean that KVM cannot emulate a processor architecture other than the one that the hardware implements. This code assumes that the guest is running paravirtualized using the PAPR (Power Architecture Platform Requirements) interface, which is the interface that IBM's PowerVM hypervisor uses. That means that existing Linux distributions that run on IBM pSeries machines will also run under KVM without modification. In order to communicate the PAPR hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code to include/linux/kvm.h. Currently the choice between book3s_hv support and book3s_pr support (i.e. the existing code, which runs the guest in user mode) has to be made at kernel configuration time, so a given kernel binary can only do one or the other. This new book3s_hv code doesn't support MMIO emulation at present. Since we are running paravirtualized guests, this isn't a serious restriction. With the guest running in supervisor mode, most exceptions go straight to the guest. We will never get data or instruction storage or segment interrupts, alignment interrupts, decrementer interrupts, program interrupts, single-step interrupts, etc., coming to the hypervisor from the guest. Therefore this introduces a new KVMTEST_NONHV macro for the exception entry path so that we don't have to do the KVM test on entry to those exception handlers. We do however get hypervisor decrementer, hypervisor data storage, hypervisor instruction storage, and hypervisor emulation assist interrupts, so we have to handle those. In hypervisor mode, real-mode accesses can access all of RAM, not just a limited amount. Therefore we put all the guest state in the vcpu.arch and use the shadow_vcpu in the PACA only for temporary scratch space. We allocate the vcpu with kzalloc rather than vzalloc, and we don't use anything in the kvmppc_vcpu_book3s struct, so we don't allocate it. We don't have a shared page with the guest, but we still need a kvm_vcpu_arch_shared struct to store the values of various registers, so we include one in the vcpu_arch struct. The POWER7 processor has a restriction that all threads in a core have to be in the same partition. MMU-on kernel code counts as a partition (partition 0), so we have to do a partition switch on every entry to and exit from the guest. At present we require the host and guest to run in single-thread mode because of this hardware restriction. This code allocates a hashed page table for the guest and initializes it with HPTEs for the guest's Virtual Real Memory Area (VRMA). We require that the guest memory is allocated using 16MB huge pages, in order to simplify the low-level memory management. This also means that we can get away without tracking paging activity in the host for now, since huge pages can't be paged or swapped. This also adds a few new exports needed by the book3s_hv code. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:54 +03:00
yu liu	685659ee70	powerpc/e500: Save SPEFCSR in flush_spe_to_thread() giveup_spe() saves the SPE state which is protected by MSR[SPE]. However, modifying SPEFSCR does not trap when MSR[SPE]=0. And since SPEFSCR is already saved/restored in _switch(), not all the callers want to save SPEFSCR again. Thus, saving SPEFSCR should not belong to giveup_spe(). This patch moves SPEFSCR saving to flush_spe_to_thread(), and cleans up the caller that needs to save SPEFSCR accordingly. Signed-off-by: Liu Yu <yu.liu@freescale.com> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:30 +03:00
Peter Zijlstra	d6bf29b44d	powerpc: mmu_gather rework Fix up powerpc to the new mmu_gather stuff. PPC has an extra batching queue to RCU free the actual pagetable allocations, use the ARCH extentions for that for now. For the ppc64_tlb_batch, which tracks the vaddrs to unhash from the hardware hash-table, keep using per-cpu arrays but flush on context switch and use a TLF bit to track the lazy_mmu state. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Miller <davem@davemloft.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Tony Luck <tony.luck@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-25 08:39:13 -07:00
Matt Evans	44ae3ab335	powerpc: Free up some CPU feature bits by moving out MMU-related features Some of the 64bit PPC CPU features are MMU-related, so this patch moves them to MMU_FTR_ bits. All cpu_has_feature()-style tests are moved to mmu_has_feature(), and seven feature bits are freed as a result. Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-04-27 14:18:52 +10:00
Alexey Kardashevskiy	efcac6589a	powerpc: Per process DSCR + some fixes (try#4) The DSCR (aka Data Stream Control Register) is supported on some server PowerPC chips and allow some control over the prefetch of data streams. This patch allows the value to be specified per thread by emulating the corresponding mfspr and mtspr instructions. Children of such threads inherit the value. Other threads use a default value that can be specified in sysfs - /sys/devices/system/cpu/dscr_default. If a thread starts with non default value in the sysfs entry, all children threads inherit this non default value even if the sysfs value is changed later. Signed-off-by: Alexey Kardashevskiy <aik@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-04-27 14:18:19 +10:00
Eric Dumazet	b6a84016bd	mm: NUMA aware alloc_thread_info_node() Add a node parameter to alloc_thread_info(), and change its name to alloc_thread_info_node() This change is needed to allow NUMA aware kthread_create_on_cpu() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Reviewed-by: Andi Kleen <ak@linux.intel.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Tejun Heo <tj@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: David Howells <dhowells@redhat.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-22 17:44:01 -07:00
K.Prasad	e0780b720f	powerpc: Fix call to flush_ptrace_hw_breakpoint() Fix the error in spelling the config option for hw-breakpoints and fix the build issue that follows. Signed-off by: K.Prasad <prasad@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-03-02 14:56:49 +11:00
Anton Blanchard	7071854bb2	powerpc: Print 32 bits of DSISR in show_regs We were printing 64 bits of DSISR in show_regs even though it is 32 bit. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-01-21 14:08:37 +11:00
Paul Mackerras	cf9efce0ce	powerpc: Account time using timebase rather than PURR Currently, when CONFIG_VIRT_CPU_ACCOUNTING is enabled, we use the PURR register for measuring the user and system time used by processes, as well as other related times such as hardirq and softirq times. This turns out to be quite confusing for users because it means that a program will often be measured as taking less time when run on a multi-threaded processor (SMT2 or SMT4 mode) than it does when run on a single-threaded processor (ST mode), even though the program takes longer to finish. The discrepancy is accounted for as stolen time, which is also confusing, particularly when there are no other partitions running. This changes the accounting to use the timebase instead, meaning that the reported user and system times are the actual number of real-time seconds that the program was executing on the processor thread, regardless of which SMT mode the processor is in. Thus a program will generally show greater user and system times when run on a multi-threaded processor than on a single-threaded processor. On pSeries systems on POWER5 or later processors, we measure the stolen time (time when this partition wasn't running) using the hypervisor dispatch trace log. We check for new entries in the log on every entry from user mode and on every transition from kernel process context to soft or hard IRQ context (i.e. when account_system_vtime() gets called). So that we can correctly distinguish time stolen from user time and time stolen from system time, without having to check the log on every exit to user mode, we store separate timestamps for exit to user mode and entry from user mode. On systems that have a SPURR (POWER6 and POWER7), we read the SPURR in account_system_vtime() (as before), and then apportion the SPURR ticks since the last time we read it between scaled user time and scaled system time according to the relative proportions of user time and system time over the same interval. This avoids having to read the SPURR on every kernel entry and exit. On systems that have PURR but not SPURR (i.e., POWER5), we do the same using the PURR rather than the SPURR. This disables the DTL user interface in /sys/debug/kernel/powerpc/dtl for now since it conflicts with the use of the dispatch trace log by the time accounting code. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-09-02 14:07:31 +10:00
Michael Neuling	e1f0ece113	powerpc: Move arch_sd_sibling_asym_packing() to smp.c Simple cleanup by moving arch_sd_sibling_asym_packing from process.c to smp.c to save an #ifdef CONFIG_SMP No functionality change. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-09-02 14:07:31 +10:00
Anton Blanchard	4138d65333	powerpc: Inline ppc64_runlatch_off I'm sick of seeing ppc64_runlatch_off in our profiles, so inline it into the callers. To avoid a mess of circular includes I didn't add it as an inline function. Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Olof Johansson <olof@lixom.net> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-08-24 15:26:30 +10:00
Denis Kirjanov	9904b00593	powerpc: Use is_32bit_task() helper to test 32 bit binary Use is_32bit_task() helper to test 32 bit binary. Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-08-24 15:26:27 +10:00
David Howells	d7627467b7	Make do_execve() take a const filename pointer Make do_execve() take a const filename pointer so that kernel_execve() compiles correctly on ARM: arch/arm/kernel/sys_arm.c:88: warning: passing argument 1 of 'do_execve' discards qualifiers from pointer target type This also requires the argv and envp arguments to be consted twice, once for the pointer array and once for the strings the array points to. This is because do_execve() passes a pointer to the filename (now const) to copy_strings_kernel(). A simpler alternative would be to cast the filename pointer in do_execve() when it's passed to copy_strings_kernel(). do_execve() may not change any of the strings it is passed as part of the argv or envp lists as they are some of them in .rodata, so marking these strings as const should be fine. Further kernel_execve() and sys_execve() need to be changed to match. This has been test built on x86_64, frv, arm and mips. Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-17 18:07:43 -07:00
David Howells	c788732523	Mark arguments to certain syscalls as being const Mark arguments to certain system calls as being const where they should be but aren't. The list includes: () The filename arguments of various stat syscalls, execve(), various utimes syscalls and some mount syscalls. () The filename arguments of some syscall helpers relating to the above. (*) The buffer argument of various write syscalls. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-13 16:53:13 -07:00
Linus Torvalds	c4efd6b569	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (27 commits) sched: Use correct macro to display sched_child_runs_first in /proc/sched_debug sched: No need for bootmem special cases sched: Revert nohz_ratelimit() for now sched: Reduce update_group_power() calls sched: Update rq->clock for nohz balanced cpus sched: Fix spelling of sibling sched, cpuset: Drop __cpuexit from cpu hotplug callbacks sched: Fix the racy usage of thread_group_cputimer() in fastpath_timer_check() sched: run_posix_cpu_timers: Don't check ->exit_state, use lock_task_sighand() sched: thread_group_cputime: Simplify, document the "alive" check sched: Remove the obsolete exit_state/signal hacks sched: task_tick_rt: Remove the obsolete ->signal != NULL check sched: __sched_setscheduler: Read the RLIMIT_RTPRIO value lockless sched: Fix comments to make them DocBook happy sched: Fix fix_small_capacity powerpc: Exclude arch_sd_sibiling_asym_packing() on UP powerpc: Enable asymmetric SMT scheduling on POWER7 sched: Add asymmetric group packing option for sibling domain sched: Fix capacity calculations for SMT4 sched: Change nohz idle load balancing logic to push model ...	2010-08-06 09:39:22 -07:00
Ingo Molnar	dca45ad8af	Merge branch 'linus' into sched/core Merge reason: Move from the -rc3 to the almost-rc6 base. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-07-21 21:45:08 +02:00
Benjamin Herrenschmidt	a2e198116f	powerpc/book3e: Hack to get gdb moving along on Book3E 64-bit Our handling of debug interrupts on Book3E 64-bit is not quite the way it should be just yet. This is a workaround to let gdb work at least for now. We ensure that when context switching, we set the appropriate DBCR0 value for the new task. We also make sure that we turn off MSR[DE] within the kernel, and set it as part of the bits that get set when going back to userspace. In the long run, we will probably set the userspace DBCR0 on the exception exit code path and ensure we have some proper kernel value to set on the way into the kernel, a bit like ppc32 does, but that will take more work. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-07-09 15:24:47 +10:00
Benjamin Herrenschmidt	5f07aa7524	Merge commit 'paulus-perf/master' into next	2010-07-09 11:25:48 +10:00
Michael Neuling	2ec57d448b	sched: Fix spelling of sibling No logic changes, only spelling. Signed-off-by: Michael Neuling <mikey@neuling.org> Cc: linuxppc-dev@ozlabs.org Cc: David Howells <dhowells@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <15249.1277776921@neuling.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-06-29 10:44:29 +02:00
K.Prasad	5aae8a5370	powerpc, hw_breakpoints: Implement hw_breakpoints for 64-bit server processors Implement perf-events based hw-breakpoint interfaces for PowerPC 64-bit server (Book III S) processors. This allows access to a given location to be used as an event that can be counted or profiled by the perf_events subsystem. This is done using the DABR (data breakpoint register), which can also be used for process debugging via ptrace. When perf_event hw_breakpoint support is configured in, the perf_event subsystem manages the DABR and arbitrates access to it, and ptrace then creates a perf_event when it is requested to set a data breakpoint. [Adopted suggestions from Paul Mackerras <paulus@samba.org> to - emulate_step() all system-wide breakpoints and single-step only the per-task breakpoints - perform arch-specific cleanup before unregistration through arch_unregister_hw_breakpoint() ] Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2010-06-22 19:40:50 +10:00
Christoph Hellwig	f1ba9a5b2a	powerpc: Unconditionally enabled irq stacks Irq stacks provide an essential protection from stack overflows through external interrupts, at the cost of two additionals stacks per CPU. Enable them unconditionally to simplify the kernel build and prevent people from accidentally disabling them. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-06-15 15:02:37 +10:00
Peter Zijlstra	89275d59b5	powerpc: Exclude arch_sd_sibiling_asym_packing() on UP Only SMP systems care about load-balance features, plus this saves some .text space on UP and also fixes the build. Reported-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Michael Neuling <mikey@neuling.org> LKML-Reference: <tip-76cbd8a8f8b0dddbff89a6708bd5bd13c0d21a00@git.kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-06-09 16:31:39 +02:00
Michael Neuling	76cbd8a8f8	powerpc: Enable asymmetric SMT scheduling on POWER7 The POWER7 core has dynamic SMT mode switching which is controlled by the hypervisor. There are 3 SMT modes: SMT1 uses thread 0 SMT2 uses threads 0 & 1 SMT4 uses threads 0, 1, 2 & 3 When in any particular SMT mode, all threads have the same performance as each other (ie. at any moment in time, all threads perform the same). The SMT mode switching works such that when linux has threads 2 & 3 idle and 0 & 1 active, it will cede (H_CEDE hypercall) threads 2 and 3 in the idle loop and the hypervisor will automatically switch to SMT2 for that core (independent of other cores). The opposite is not true, so if threads 0 & 1 are idle and 2 & 3 are active, we will stay in SMT4 mode. Similarly if thread 0 is active and threads 1, 2 & 3 are idle, we'll go into SMT1 mode. If we can get the core into a lower SMT mode (SMT1 is best), the threads will perform better (since they share less core resources). Hence when we have idle threads, we want them to be the higher ones. This adds a feature bit for asymmetric packing to powerpc and then enables it on POWER7. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: linuxppc-dev@ozlabs.org LKML-Reference: <20100608045702.31FB5CC8C7@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-06-09 11:13:14 +02:00
Dave Kleikamp	221c185d4e	powerpc/476: Add isync after loading mmu and debug spr's 476 requires an isync after loading MMU and debug related SPR's. Some of these are in performance-critical paths and may need to be optimized, but initially, we're playing it safe. Signed-off-by: Torez Smith <lnxtorez@linux.vnet.ibm.com> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>	2010-05-05 10:39:13 -04:00
Dave Kleikamp	3bffb6529c	powerpc/booke: Add support for advanced debug registers powerpc/booke: Add support for advanced debug registers From: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Based on patches originally written by Torez Smith. This patch defines context switch and trap related functionality for BookE specific Debug Registers. It adds support to ptrace() for setting and getting BookE related Debug Registers Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: Torez Smith <lnxtorez@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Gibson <dwg@au1.ibm.com> Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Sergio Durigan Junior <sergiodj@br.ibm.com> Cc: Thiago Jung Bauermann <bauerman@br.ibm.com> Cc: linuxppc-dev list <Linuxppc-dev@ozlabs.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-02-17 14:03:17 +11:00
Dave Kleikamp	172ae2e7f8	powerpc/booke: Introduce new CONFIG options for advanced debug registers powerpc/booke: Introduce new CONFIG options for advanced debug registers From: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Introduce new config options to simplify the ifdefs pertaining to the advanced debug registers for booke and 40x processors: CONFIG_PPC_ADV_DEBUG_REGS - boolean: true for dac-based processors CONFIG_PPC_ADV_DEBUG_IACS - number of IAC registers CONFIG_PPC_ADV_DEBUG_DACS - number of DAC registers CONFIG_PPC_ADV_DEBUG_DVCS - number of DVC registers CONFIG_PPC_ADV_DEBUG_DAC_RANGE - DAC ranges supported Beginning conservatively, since I only have the facilities to test 440 hardware. I believe all 40x and booke platforms support at least 2 IAC and 2 DAC registers. For 440, 4 IAC and 2 DVC registers are enabled, as well as the DAC ranges. Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Acked-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-02-17 14:03:16 +11:00

1 2 3

136 Commits