linux

Author	SHA1	Message	Date
Mohammed Gamal	b13354f8f0	KVM: x86 emulator: emulate nop and xchg reg, acc (opcodes 0x90 - 0x97) Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:42:33 +03:00
Avi Kivity	f76c710d75	KVM: Use printk_rlimit() instead of reporting emulation failures just once Emulation failure reports are useful, so allow more than one per the lifetime of the module. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:42:32 +03:00
Glauber Costa	25be46080f	KVM: Do not calculate linear rip in emulation failure report If we're not gonna do anything (case in which failure is already reported), we do not need to even bother with calculating the linear rip. Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:42:32 +03:00
Marcelo Tosatti	622395a9e6	KVM: only abort guest entry if timer count goes from 0->1 Only abort guest entry if the timer count went from 0->1, since for 1->2 or larger the bit will either be set already or a timer irq will have been injected. Using atomic_inc_and_test() for it also introduces an SMP barrier to the LAPIC version (thought it was unecessary because of timer migration, but guest can be scheduled to a different pCPU between exit and kvm_vcpu_block(), so there is the possibility for a race). Noticed by Avi. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:42:32 +03:00
Laurent Vivier	542472b53e	KVM: Add coalesced MMIO support (x86 part) This patch enables coalesced MMIO for x86 architecture. It defines KVM_MMIO_PAGE_OFFSET and KVM_CAP_COALESCED_MMIO. It enables the compilation of coalesced_mmio.c. Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:42:31 +03:00
Laurent Vivier	92760499d0	KVM: kvm_io_device: extend in_range() to manage len and write attribute Modify member in_range() of structure kvm_io_device to pass length and the type of the I/O (write or read). This modification allows to use kvm_io_device with coalesced MMIO. Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:42:30 +03:00
Avi Kivity	131d82791b	KVM: MMU: Avoid page prefetch on SVM SVM cannot benefit from page prefetching since guest page fault bypass cannot by made to work there. Avoid accessing the guest page table in this case. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:42:30 +03:00
Avi Kivity	d761a501cf	KVM: MMU: Move nonpaging_prefetch_page() In preparation for next patch. No code change. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:42:30 +03:00
Avi Kivity	91ed7a0e15	KVM: x86 emulator: implement 'push imm' (opcode 0x68) Encountered in FC6 boot sequence, now that we don't force ss.rpl = 0 during the protected mode transition. Not really necessary, but nice to have. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:42:29 +03:00
Avi Kivity	19e43636b5	KVM: x86 emulator: simplify push imm8 emulation Instead of fetching the data explicitly, use SrcImmByte. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:42:29 +03:00
Avi Kivity	eab9f71feb	KVM: MMU: Optimize prefetch_page() Instead of reading each pte individually, read 256 bytes worth of ptes and batch process them. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:42:28 +03:00
Guillaume Thouvenin	38d5bc6d50	KVM: x86 emulator: Add support for mov r, sreg (0x8c) instruction Add support for mov r, sreg (0x8c) instruction Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net> Signed-off-by: Laurent Vivier <laurent.vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:42:28 +03:00
Guillaume Thouvenin	4257198ae2	KVM: x86 emulator: Add support for mov seg, r (0x8e) instruction Add support for mov r, sreg (0x8c) instruction. [avi: drop the sreg decoding table in favor of 1:1 encoding] Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net> Signed-off-by: Laurent Vivier <laurent.vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:42:28 +03:00
Guillaume Thouvenin	615ac12561	KVM: x86 emulator: adds support to mov r,imm (opcode 0xb8) instruction Add support to mov r, imm (0xb8) instruction. Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net> Signed-off-by: Laurent Vivier <laurent.vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:42:27 +03:00
Guillaume Thouvenin	954cd36f76	KVM: x86 emulator: add support for jmp far 0xea Add support for jmp far (opcode 0xea) instruction. Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net> Signed-off-by: Laurent Vivier <laurent.vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:42:27 +03:00
Guillaume Thouvenin	89c696383d	KVM: x86 emulator: Update c->dst.bytes in decode instruction Update c->dst.bytes in decode instruction instead of instruction itself. It's needed because if c->dst.bytes is equal to 0, the instruction is not emulated. Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net> Signed-off-by: Laurent Vivier <laurent.vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:42:27 +03:00
Guillaume Thouvenin	3e6e0aab1b	KVM: Prefixes segment functions that will be exported with "kvm_" Prefixes functions that will be exported with kvm_. We also prefixed set_segment() even if it still static to be coherent. signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net> Signed-off-by: Laurent Vivier <laurent.vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:42:27 +03:00
Avi Kivity	9ba075a664	KVM: MTRR support Add emulation for the memory type range registers, needed by VMware esx 3.5, and by pci device assignment. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:42:26 +03:00
Sheng Yang	f08864b42a	KVM: VMX: Enable NMI with in-kernel irqchip Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:42:26 +03:00
Sheng Yang	3419ffc8e4	KVM: IOAPIC/LAPIC: Enable NMI support [avi: fix ia64 build breakage] Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:42:25 +03:00
Avi Kivity	50d40d7fb9	KVM: Remove unnecessary ->decache_regs() call Since we aren't modifying any register, there's no need to decache the register state. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:42:25 +03:00
Avi Kivity	7cc8883074	KVM: Remove decache_vcpus_on_cpu() and related callbacks Obsoleted by the vmx-specific per-cpu list. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:42:25 +03:00
Avi Kivity	543e424366	KVM: VMX: Add list of potentially locally cached vcpus VMX hardware can cache the contents of a vcpu's vmcs. This cache needs to be flushed when migrating a vcpu to another cpu, or (which is the case that interests us here) when disabling hardware virtualization on a cpu. The current implementation of decaching iterates over the list of all vcpus, picks the ones that are potentially cached on the cpu that is being offlined, and flushes the cache. The problem is that it uses mutex_trylock() to gain exclusive access to the vcpu, which fires off a (benign) warning about using the mutex in an interrupt context. To avoid this, and to make things generally nicer, add a new per-cpu list of potentially cached vcus. This makes the decaching code much simpler. The list is vmx-specific since other hardware doesn't have this issue. [andrea: fix crash on suspend/resume] Signed-off-by: Andrea Arcangeli <andrea@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:42:24 +03:00
Avi Kivity	4ecac3fd6d	KVM: Handle virtualization instruction #UD faults during reboot KVM turns off hardware virtualization extensions during reboot, in order to disassociate the memory used by the virtualization extensions from the processor, and in order to have the system in a consistent state. Unfortunately virtual machines may still be running while this goes on, and once virtualization extensions are turned off, any virtulization instruction will #UD on execution. Fix by adding an exception handler to virtualization instructions; if we get an exception during reboot, we simply spin waiting for the reset to complete. If it's a true exception, BUG() so we can have our stack trace. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:41:43 +03:00
Avi Kivity	1b7fcd3263	KVM: MMU: Fix false flooding when a pte points to page table The KVM MMU tries to detect when a speculative pte update is not actually used by demand fault, by checking the accessed bit of the shadow pte. If the shadow pte has not been accessed, we deem that page table flooded and remove the shadow page table, allowing further pte updates to proceed without emulation. However, if the pte itself points at a page table and only used for write operations, the accessed bit will never be set since all access will happen through the emulator. This is exactly what happens with kscand on old (2.4.x) HIGHMEM kernels. The kernel points a kmap_atomic() pte at a page table, and then proceeds with read-modify-write operations to look at the dirty and accessed bits. We get a false flood trigger on the kmap ptes, which results in the mmu spending all its time setting up and tearing down shadows. Fix by setting the shadow accessed bit on emulated accesses. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:40:50 +03:00
Avi Kivity	7682f2d0dd	KVM: VMX: Trivial vmcs_write64() code simplification Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:40:50 +03:00
Chris Lalancette	14ae51b6c0	KVM: SVM: Fake MSR_K7 performance counters Attached is a patch that fixes a guest crash when booting older Linux kernels. The problem stems from the fact that we are currently emulating MSR_K7_EVNTSEL[0-3], but not emulating MSR_K7_PERFCTR[0-3]. Because of this, setup_k7_watchdog() in the Linux kernel receives a GPF when it attempts to write into MSR_K7_PERFCTR, which causes an OOPs. The patch fixes it by just "fake" emulating the appropriate MSRs, throwing away the data in the process. This causes the NMI watchdog to not actually work, but it's not such a big deal in a virtualized environment. When we get a write to one of these counters, we printk_ratelimit() a warning. I decided to print it out for all writes, even if the data is 0; it doesn't seem to make sense to me to special case when data == 0. Tested by myself on a RHEL-4 guest, and Joerg Roedel on a Windows XP 64-bit guest. Signed-off-by: Chris Lalancette <clalance@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:40:49 +03:00
Aurelien Jarno	f697554515	KVM: PIT: support mode 3 The in-kernel PIT emulation ignores pending timers if operating under mode 3, which for example Hurd uses. This mode should output a square wave, high for (N+1)/2 counts and low for (N-1)/2 counts. As we only care about the resulting interrupts, the period is N, and mode 3 is the same as mode 2 with regard to interrupts. Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:40:49 +03:00
Joerg Roedel	d2ebb4103f	KVM: SVM: add tracing support for TDP page faults To distinguish between real page faults and nested page faults they should be traced as different events. This is implemented by this patch. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:40:48 +03:00
Joerg Roedel	af9ca2d703	KVM: SVM: add missing kvmtrace markers This patch adds the missing kvmtrace markers to the svm module of kvm. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:40:48 +03:00
Joerg Roedel	54e445ca84	KVM: add missing kvmtrace bits This patch adds some kvmtrace bits to the generic x86 code where it is instrumented from SVM. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:40:48 +03:00
Joerg Roedel	a069805579	KVM: SVM: implement dedicated INTR exit handler With an exit handler for INTR intercepts its possible to account them using kvmtrace. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:40:47 +03:00
Joerg Roedel	c47f098d69	KVM: SVM: implement dedicated NMI exit handler With an exit handler for NMI intercepts its possible to account them using kvmtrace. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:40:47 +03:00
Joerg Roedel	c7bf23babc	KVM: VMX: move APIC_ACCESS trace entry to generic code This patch moves the trace entry for APIC accesses from the VMX code to the generic lapic code. This way APIC accesses from SVM will also be traced. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:40:47 +03:00
Harvey Harrison	8b2cf73cc1	KVM: add statics were possible, function definition in lapic.h Noticed by sparse: arch/x86/kvm/vmx.c:1583:6: warning: symbol 'vmx_disable_intercept_for_msr' was not declared. Should it be static? arch/x86/kvm/x86.c:3406:5: warning: symbol 'kvm_task_switch_16' was not declared. Should it be static? arch/x86/kvm/x86.c:3429:5: warning: symbol 'kvm_task_switch_32' was not declared. Should it be static? arch/x86/kvm/mmu.c:1968:6: warning: symbol 'kvm_mmu_remove_one_alloc_mmu_page' was not declared. Should it be static? arch/x86/kvm/mmu.c:2014:6: warning: symbol 'mmu_destroy_caches' was not declared. Should it be static? arch/x86/kvm/lapic.c:862:5: warning: symbol 'kvm_lapic_get_base' was not declared. Should it be static? arch/x86/kvm/i8254.c:94:5: warning: symbol 'pit_get_gate' was not declared. Should it be static? arch/x86/kvm/i8254.c:196:5: warning: symbol '__pit_timer_fn' was not declared. Should it be static? arch/x86/kvm/i8254.c:561:6: warning: symbol '__inject_pit_timer_intr' was not declared. Should it be static? Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:40:46 +03:00
Peter Zijlstra	31656519e1	sched, x86: clean up hrtick implementation random uvesafb failures were reported against Gentoo: http://bugs.gentoo.org/show_bug.cgi?id=222799 and Mihai Moldovan bisected it back to: > `8f4d37ec07` is first bad commit > commit `8f4d37ec07` > Author: Peter Zijlstra <a.p.zijlstra@chello.nl> > Date: Fri Jan 25 21:08:29 2008 +0100 > > sched: high-res preemption tick Linus suspected it to be hrtick + vm86 interaction and observed: > Btw, Peter, Ingo: I think that commit is doing bad things. They aren't > _incorrect_ per se, but they are definitely bad. > > Why? > > Using random _TIF_WORK_MASK flags is really impolite for doing > "scheduling" work. There's a reason that arch/x86/kernel/entry_32.S > special-cases the _TIF_NEED_RESCHED flag: we don't want to exit out of > vm86 mode unnecessarily. > > See the "work_notifysig_v86" label, and how it does that > "save_v86_state()" thing etc etc. Right, I never liked having to fiddle with those TIF flags. Initially I needed it because the hrtimer base lock could not nest in the rq lock. That however is fixed these days. Currently the only reason left to fiddle with the TIF flags is remote wakeups. We cannot program a remote cpu's hrtimer. I've been thinking about using the new and improved IPI function call stuff to implement hrtimer_start_on(). However that does require that smp_call_function_single(.wait=0) works from interrupt context - /me looks at the latest series from Jens - Yes that does seem to be supported, good. Here's a stab at cleaning this stuff up ... Mihai reported test success as well. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Mihai Moldovan <ionic@ionic.de> Cc: Michal Januszewski <spock@gentoo.org> Cc: Antonino Daplas <adaplas@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-20 10:37:28 +02:00
Mike Travis	c4762aba0b	NR_CPUS: Replace NR_CPUS in speedstep-centrino.c Some cleanups in speedstep-centrino.c for NR_CPUS=4096. * Use new CPUMASK_PTR (instead of old CPUMASK_VAR). * Replace arrays sized by NR_CPUS with percpu variables. * Cleanup some formatting problems (>80 chars per line) and other checkpatch complaints. Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-20 10:21:12 +02:00
Mike Travis	1bd9d6b64e	NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genapic_flat_64.c * nr_cpu_ids should be used to determine if a percpu area is available for a given cpu. Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-20 10:21:10 +02:00
Mike Travis	247bc6ca0f	NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genx2apic_uv_x.c * Replace NR_CPUS loop with for_each_possible_cpu(). * nr_cpu_ids should be used to determine if a percpu area is available for a given cpu. Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-20 10:21:09 +02:00
Mike Travis	f2ad47ffeb	NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/proc.c * Use nr_cpu_ids instead of NR_CPUS to limit traversal of cpu online map. Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-20 10:21:09 +02:00
Mike Travis	6bca67f951	NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/mcheck/mce_64.c * nr_cpu_ids should be used to allocate arrays based on the number of cpu's present. Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-20 10:21:08 +02:00
Simon Arlott	e3a61b0a8c	x86: add unknown_nmi_panic kernel parameter It's not possible to enable the unknown_nmi_panic sysctl option until init is run. It's useful to be able to panic the kernel during boot too, this adds a parameter to enable this option. Signed-off-by: Simon Arlott <simon@fire.lp0.eu> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-20 10:10:31 +02:00
Yinghai Lu	63b5d7af25	x86: add ->pre_time_init to x86_quirks so NUMAQ can use that to call numaq_pre_time_init() This allows us to remove a NUMAQ special from arch/x86/kernel/setup.c. (and paves the way to remove the NUMAQ subarch) Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-20 09:25:52 +02:00
Yinghai Lu	64898a8bad	x86: extend and use x86_quirks to clean up NUMAQ code add these new x86_quirks methods: int mpc_record; int (mpc_apic_id)(struct mpc_config_processor m); void (mpc_oem_bus_info)(struct mpc_config_bus m, char name); void (mpc_oem_pci_bus)(struct mpc_config_bus m); void (smp_read_mpc_oem)(struct mp_config_oemtable oemtable, unsigned short oemsize); ... and move NUMAQ related mps table handling to numaq_32.c. also move the call to smp_read_mpc_oem() to smp_read_mpc() directly. Should not change functionality, albeit it would be nice to get it tested on real NUMAQ as well ... Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-20 09:25:52 +02:00
Yinghai Lu	3c9cb6de1e	x86: introduce x86_quirks introduce x86_quirks array of boot-time quirk methods. No change in functionality intended. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-20 09:18:17 +02:00
Yinghai Lu	5f1f2b3d9d	x86: improve debug printout: add target bootmem range in early_res_to_bootmem() Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-20 09:11:07 +02:00
Ingo Molnar	d092633bff	Subject: devmem, x86: fix rename of CONFIG_NONPROMISC_DEVMEM From: Arjan van de Ven <arjan@infradead.org> Date: Sat, 19 Jul 2008 15:47:17 -0700 CONFIG_NONPROMISC_DEVMEM was a rather confusing name - but renaming it to CONFIG_PROMISC_DEVMEM causes problems on architectures that do not support this feature; this patch renames it to CONFIG_STRICT_DEVMEM, so that architectures can opt-in into it. ( the polarity of the option is still the same as it was originally; it needs to be for now to not break architectures that don't have the infastructure yet to support this feature) Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: "V.Radhakrishnan" <rk@atr-labs.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> ---	2008-07-20 08:35:55 +02:00
Yinghai Lu	e5849e71ad	x86: remove arch_get_ram_range no user now Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-07-18 17:43:40 -07:00
venkatesh.pallipadi@intel.com	fec0962e0b	x86: Add a debugfs interface to dump PAT memtype Add a debugfs interface to list out all the PAT memtype reservations. Appears at debugfs x86/pat_memtype_list and output format is type @ <start addr>-<end addr> We do not hold the lock while printing the entire list. So, the list may not be a consistent copy in case where regions are getting added or deleted at the same time. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-07-18 17:22:05 -07:00
venkatesh.pallipadi@intel.com	ae79cdaacb	x86: Add a arch directory for x86 under debugfs Add a directory for x86 arch under debugfs. Can be used to accumulate all x86 specific debugfs files. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-07-18 17:22:04 -07:00
Jan Beulich	2ddf9b7b3e	i386/xen: add proper unwind annotations to xen_sysenter_target Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-07-18 16:05:55 -07:00
Jan Beulich	08ad8afaa0	x86: reduce force_mwait visibility It's not used anywhere outside its single referencing file. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-07-18 15:55:09 -07:00
Jan Beulich	08e1a13e7d	x86: reduce forbid_dac's visibility It's not used anywhere outside its declaring file. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-07-18 14:39:37 -07:00
Bernhard Walle	91467bdf6e	x86: move dma32_reserve_bootmem() after reserve_crashkernel() On a x86-64 machine (nothing special I could encounter) I had the problem that crashkernel reservation with the usual "64M@16M" failed. While debugging that, I encountered that dma32_reserve_bootmem() reserves a memory region which is in that area. Because dma32_reserve_bootmem() does not rely on a specific offset but crashkernel does, it makes sense to move the dma32_reserve_bootmem() reservation down a bit. I tested that patch and it works without problems. I don't see any negative effects of that move, but maybe I oversaw something ... While we strictly don't need that patch in 2.6.27 because we have the automatic, dynamic offset detection, it makes sense to also include it here because: - it's easier to get it in -stable then, - many people are still used to the 'crashkernel=...@16M' syntax, - not everybody may be using a reloatable kernel. Signed-off-by: Bernhard Walle <bwalle@suse.de> Cc: kexec@lists.infradead.org Cc: vgoyal@redhat.com Cc: akpm@linux-foundation.org Cc: Bernhard Walle <bwalle@suse.de> Cc: yhlu.kernel@gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 23:35:46 +02:00
Jan Beulich	369c99205f	x86: fix two modpost warnings Even though it's only the difference of the two __initdata symbols that's being calculated, modpost still doesn't like this. So rather calculate the size once in an __init function and store it for later use. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-07-18 14:34:08 -07:00
Jan Beulich	f2ba93929f	x86: check function status in EDD boot code Without checking the return value of get_edd_info() and adding the entry only in the success case, 6 devices show up under /sys/firmware/edd/, no matter how many devices are actually present. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-07-18 14:33:17 -07:00
Hiroshi Shimamoto	812b121d55	x86_64: ia32_signal.c: remove signal number conversion This was old code that was needed for iBCS and x86-64 never supported that. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 22:08:20 +02:00
Mike Travis	eb53fac5ca	cpumask: Use optimized CPUMASK_ALLOC macros in the centrino_target * Use the CPUMASK_ALLOC macros in the centrino_target() function. Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 22:03:00 +02:00
Mike Travis	c42f4f4c6d	cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/ldt.c * Optimize various places where a pointer to the cpumask_of_cpu value will result in reducing stack pressure. Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 22:02:58 +02:00
Mike Travis	cb6d2be60d	cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/io_apic_64.c * Optimize various places where a pointer to the cpumask_of_cpu value will result in reducing stack pressure. Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 22:02:57 +02:00
Mike Travis	65c0118453	cpumask: Replace cpumask_of_cpu with cpumask_of_cpu_ptr * This patch replaces the dangerous lvalue version of cpumask_of_cpu with new cpumask_of_cpu_ptr macros. These are patterned after the node_to_cpumask_ptr macros. In general terms, if there is a cpumask_of_cpu_map[] then a pointer to the cpumask_of_cpu_map[cpu] entry is used. The cpumask_of_cpu_map is provided when there is a large NR_CPUS count, reducing greatly the amount of code generated and stack space used for cpumask_of_cpu(). The pointer to the cpumask_t value is needed for calling set_cpus_allowed_ptr() to reduce the amount of stack space needed to pass the cpumask_t value. If there isn't a cpumask_of_cpu_map[], then a temporary variable is declared and filled in with value from cpumask_of_cpu(cpu) as well as a pointer variable pointing to this temporary variable. Afterwards, the pointer is used to reference the cpumask value. The compiler will optimize out the extra dereference through the pointer as well as the stack space used for the pointer, resulting in identical code. A good example of the orthogonal usages is in net/sunrpc/svc.c: case SVC_POOL_PERCPU: { unsigned int cpu = m->pool_to[pidx]; cpumask_of_cpu_ptr(cpumask, cpu); oldmask = current->cpus_allowed; set_cpus_allowed_ptr(current, cpumask); return 1; } case SVC_POOL_PERNODE: { unsigned int node = m->pool_to[pidx]; node_to_cpumask_ptr(nodecpumask, node); oldmask = current->cpus_allowed; set_cpus_allowed_ptr(current, nodecpumask); return 1; } Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 22:02:57 +02:00
Ingo Molnar	bb2c018b09	Merge branch 'linus' into cpus4096 Conflicts: drivers/acpi/processor_throttling.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 22:00:54 +02:00
Ingo Molnar	f6dc8ccaab	Merge branch 'linus' into core/generic-dma-coherent Conflicts: kernel/Makefile Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 21:13:20 +02:00
Ingo Molnar	9b610fda0d	Merge branch 'linus' into timers/nohz	2008-07-18 19:53:16 +02:00
Alexander Beregalov	fa10c51a04	arch/x86/kernel/cpu/common_64.c: remove double inclusions x86: remove double inclusions in arch/x86/kernel/cpu/common_64.c Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com> Cc: yhlu.kernel@gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 19:25:28 +02:00
Hiroshi Shimamoto	1181f8b5f0	x86_32: remove redundant KERN_INFO This printk has a KERN_ facility level in the format string. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 19:01:04 +02:00
Jaswinder Singh	6ac8d51f01	x86: introducing asm-x86/traps.h Declaring x86 traps under one hood. Declaring x86 do_traps before defining them. Signed-off-by: Jaswinder Singh <jaswinder@infradead.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 18:51:57 +02:00
Joerg Roedel	5ff4789d04	AMD IOMMU: set iommu for device from ACPI code too The device<->iommu relationship has to be set from the information in the ACPI table too. This patch adds this logic to the driver. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: iommu@lists.linux-foundation.org Cc: bhavna.sarathy@amd.com Cc: robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 18:43:32 +02:00
Ingo Molnar	f1b0c8d3d3	Merge branch 'linus' into x86/amd-iommu	2008-07-18 18:43:08 +02:00
Thomas Petazzoni	9781f39fd2	x86: consolidate the definition of the force_mwait variable The force_mwait variable iss defined either in arch/x86/kernel/cpu/amd.c or in arch/x86/kernel/setup_64.c, but it is only initialized and used in arch/x86/kernel/process.c. This patch moves the declaration to arch/x86/kernel/process.c. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: michael@free-electrons.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 18:39:19 +02:00
Alexander Beregalov	4712965422	x86 setup.c: cleanup includes x86: remove double includes in setup.c Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com> Cc: yhlu.kernel@gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 18:21:17 +02:00
Thomas Gleixner	b8f8c3cf0a	nohz: prevent tick stop outside of the idle loop Jack Ren and Eric Miao tracked down the following long standing problem in the NOHZ code: scheduler switch to idle task enable interrupts Window starts here ----> interrupt happens (does not set NEED_RESCHED) irq_exit() stops the tick ----> interrupt happens (does set NEED_RESCHED) return from schedule() cpu_idle(): preempt_disable(); Window ends here The interrupts can happen at any point inside the race window. The first interrupt stops the tick, the second one causes the scheduler to rerun and switch away from idle again and we end up with the tick disabled. The fact that it needs two interrupts where the first one does not set NEED_RESCHED and the second one does made the bug obscure and extremly hard to reproduce and analyse. Kudos to Jack and Eric. Solution: Limit the NOHZ functionality to the idle loop to make sure that we can not run into such a situation ever again. cpu_idle() { preempt_disable(); while(1) { tick_nohz_stop_sched_tick(1); <- tell NOHZ code that we are in the idle loop while (!need_resched()) halt(); tick_nohz_restart_sched_tick(); <- disables NOHZ mode preempt_enable_no_resched(); schedule(); preempt_disable(); } } In hindsight we should have done this forever, but ... /me grabs a large brown paperbag. Debugged-by: Jack Ren <jack.ren@marvell.com>, Debugged-by: eric miao <eric.y.miao@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-07-18 18:10:28 +02:00
Akinobu Mita	8b2b9c1af0	x86, intel_cacheinfo: fix use-after-free cache_kobject This avoids calling kobject_uevent() with cache_kobject that has already been deallocated in an error path. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 17:49:33 +02:00
Ingo Molnar	3f9b5cc018	x86: re-enable OPTIMIZE_INLINING re-enable OPTIMIZE_INLINING more widely. Jeff Dike fixed the remaining outstanding issue in this commit: \| commit `4f81c5350b` \| Author: Jeff Dike <jdike@addtoit.com> \| Date: Mon Jul 7 13:36:56 2008 -0400 \| \| [UML] fix gcc ICEs and unresolved externs [...] \| This patch reintroduces unit-at-a-time for gcc >= 4.0, bringing back the \| possibility of Uli's crash. If that happens, we'll debug it. it's still default-off and thus opt-in. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 16:30:05 +02:00
Alexander van Heukelum	7dedcee394	x86: traps_xx: modify x86_64 to use _log_lvl variants i386 has show_trace_log_lvl and show_stack_log_lvl, allowing traces to be emitted with log-level annotations. This patch introduces them to x86_64, but log_lvl is only ever set to an empty string. Output of traces is unchanged. i386-chunk is whitespace-only. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 16:21:17 +02:00
Alexander van Heukelum	78cbac65fd	x86: traps_xx: refactor die() like in x86_64 Make the diff between the traps_32.c and traps_64.c a bit smaller. Change traps_32.c to look more like traps_64.c: - move lock information to file scope - split out oops_begin() and oops_end() from die() - increment nest counter in oops_begin Only whitespace change in traps_64.c No functional changes intended. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Acked-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 16:21:17 +02:00
Russ Anderson	7019cc2dd6	x86 BIOS interface for RTC on SGI UV Real-time code needs to know the number of cycles per second on SGI UV. The information is provided via a run time BIOS call. This patch provides the linux side of that interface. This is the first of several run time BIOS calls to be defined in uv/bios.h and bios_uv.c. Note that BIOS_CALL() is just a stub for now. The bios side is being worked on. Signed-off-by: Russ Anderson <rja@sgi.com> Cc: Jack Steiner <steiner@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 14:35:14 +02:00
Maciej W. Rozycki	49a66a0bce	x86: I/O APIC: Always report how the timer has been set up Following recent (and less so) issues with the 8254 timer when routed through the I/O or local APIC, always report which configurations have been tried and which one has been set up eventually. This is so that logs posted by people for some other reason can be used as a cross-reference when investigating any possible future problems. The change unifies messages printed on 32-bit and 64-bit platforms and adds trailing newlines (removes leading ones), so that proper log level annotation can be used and any possible interspersed output will not cause a mess. I have chosen to use apic_printk(APIC_QUIET, ...) rather than printk(...) so that the distinction of these messages is maintained making possible future decisions about changes in this area easier. A change posted separately making apic_verbosity unsigned removes any extra code that would otherwise be generated as a result of this design decision. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 14:27:47 +02:00
Maciej W. Rozycki	baa1318841	x86: APIC: Make apic_verbosity unsigned As a microoptimisation, make apic_verbosity unsigned. This will make apic_printk(APIC_QUIET, ...) expand into just printk(...) with the surrounding condition and a reference to apic_verbosity removed. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 14:27:43 +02:00
Maciej W. Rozycki	17c44697f2	x86: I/O APIC: Include <asm/i8259.h> required by some code Include <asm/i8259.h> for i8259A_lock used in print_PIC() -- #if-0-ed out by default. The 32-bit version gets it right already. The plan is to enable this code with "apic=debug" eventually. This will aid with debugging strange problems without the need to ask people to apply patches. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 14:27:38 +02:00
Cyrill Gorcunov	836c129de9	x86: apic_32 - introduce calibrate_APIC_clock Introduce calibrate_APIC_clock so it could help in further 32/64bit apic code merging. Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: macro@linux-mips.org Cc: yhlu.kernel@gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 14:17:30 +02:00
Cyrill Gorcunov	89b3b1f41b	x86: apic_64 - make calibrate_APIC_clock to return error code Make calibration_result to return error and check calibration_result to be sufficient inside calibrate_APIC_clock. Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: macro@linux-mips.org Cc: yhlu.kernel@gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 14:17:29 +02:00
Yinghai Lu	caadbdce24	x86: enable memory tester support on 32-bit only supports memory below max_low_pfn. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 14:11:58 +02:00
Yinghai Lu	1f067167a8	x86: seperate memtest from init_64.c it's separate functionality that deserves its own file. This also prepares 32-bit memtest support. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 14:10:27 +02:00
Hiroshi Shimamoto	fbdb7da91b	x86_64: ia32_signal.c: use macro instead of immediate Make and use macro FIX_EFLAGS, instead of immediate value 0x40DD5 in ia32_restore_sigcontext(). Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 13:54:08 +02:00
Ingo Molnar	cdbfc557c4	Merge branch 'linus' into x86/cleanups	2008-07-18 13:53:16 +02:00
Jeremy Fitzhardinge	95c7c23b06	xen: report hypervisor version Various versions of the hypervisor have differences in what ABIs and features they support. Print some details into the boot log to help with remote debugging. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 13:50:42 +02:00
Ingo Molnar	2fb5e1e101	Merge branch 'linus' into x86/paravirt-spinlocks Conflicts: arch/x86/kernel/Makefile Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 13:41:27 +02:00
Maciej W. Rozycki	593f4a788e	x86: APIC: remove apic_write_around(); use alternatives Use alternatives to select the workaround for the 11AP Pentium erratum for the affected steppings on the fly rather than build time. Remove the X86_GOOD_APIC configuration option and replace all the calls to apic_write_around() with plain apic_write(), protecting accesses to the ESR as appropriate due to the 3AP Pentium erratum. Remove apic_read_around() and all its invocations altogether as not needed. Remove apic_write_atomic() and all its implementing backends. The use of ASM_OUTPUT2() is not strictly needed for input constraints, but I have used it for readability's sake. I had the feeling no one else was brave enough to do it, so I went ahead and here it is. Verified by checking the generated assembly and tested with both a 32-bit and a 64-bit configuration, also with the 11AP "feature" forced on and verified with gdb on /proc/kcore to work as expected (as an 11AP machines are quite hard to get hands on these days). Some script complained about the use of "volatile", but apic_write() needs it for the same reason and is effectively a replacement for writel(), so I have disregarded it. I am not sure what the policy wrt defconfig files is, they are generated and there is risk of a conflict resulting from an unrelated change, so I have left changes to them out. The option will get removed from them at the next run. Some testing with machines other than mine will be needed to avoid some stupid mistake, but despite its volume, the change is not really that intrusive, so I am fairly confident that because it works for me, it will everywhere. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 12:51:21 +02:00
Yinghai Lu	29cbeb0e17	x86: use cpu_clear in remove_cpu_from_maps Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 12:20:28 +02:00
Ingo Molnar	cd569ef5d6	Merge branch 'linus' into x86/urgent	2008-07-18 12:20:23 +02:00
Ingo Molnar	48ae744434	Merge branch 'linus' into x86/step	2008-07-18 10:14:56 +02:00
Ingo Molnar	6879827f4e	x86: remove arch/x86/kernel/smpcommon_32.c Yinghai Lu noticed that arch/x86/kernel/smpcommon_32.c got renamed to arch/x86/kernel/smpcommon.c but the old almost-empty file stayed around. Zap it. Reported-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 01:21:53 +02:00
Ingo Molnar	64d206d896	x86: rename CONFIG_NONPROMISC_DEVMEM to CONFIG_PROMISC_DEVMEM Linus observed: > The real bug is that we shouldn't have "double negatives", and > certainly not negative config options. Making that "promiscuous > /dev/mem" option a negated thing as a config option was bad. right ... lets rename this option. There should never be a negation in config options. [ that reminds me of CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER, but that is for another commit ;-) ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 00:28:57 +02:00
Ingo Molnar	393d81aa02	Merge branch 'linus' into xen-64bit	2008-07-17 23:57:20 +02:00
H. Peter Anvin	4fdf08b5bf	x86: unify and correct the GDT_ENTRY() macro Merge the GDT_ENTRY() macro between arch/x86/boot/pm.c and arch/x86/kernel/acpi/sleep.c and put the new one in <asm-x86/segment.h>. While we're at it, correct the bitmasks for the limit and flags. The new version relies on using ULL constants in order to cause type promotion rather than explicit casts; this avoids having to include <linux/types.h> in <asm-x86/segments.h>. Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-07-17 11:29:24 -07:00
Linus Torvalds	2b04be7e8a	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: fix asm/e820.h for userspace inclusion x86: fix numaq_tsc_disable x86: fix kernel_physical_mapping_init() for large x86 systems	2008-07-17 10:38:59 -07:00
Linus Torvalds	bdec6cace4	Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: ftrace: do not trace library functions ftrace: do not trace scheduler functions ftrace: fix lockup with MAXSMP ftrace: fix merge buglet	2008-07-17 10:37:10 -07:00
Yinghai Lu	9354094a95	x86: fix numaq_tsc_disable fix: arch/x86/kernel/numaq_32.c: In function ‘numaq_tsc_disable’: arch/x86/kernel/numaq_32.c:99: warning: ‘return’ with a value, in function returning void Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-17 19:27:08 +02:00
Jeremy Fitzhardinge	93a0886e23	x86, xen, power: fix up config dependencies on PM Xen save/restore needs bits of code enabled by PM_SLEEP, and PM_SLEEP depends on PM. So make XEN_SAVE_RESTORE depend on PM and PM_SLEEP depend on XEN_SAVE_RESTORE. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-17 19:25:20 +02:00
Ingo Molnar	fab3b58d3b	x86 reboot quirks: add Dell Precision WorkStation T5400 as reported in: "reboot=bios is mandatory on Dell T5400 server." http://bugzilla.kernel.org/show_bug.cgi?id=11108 add a DMI reboot quirk. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org>	2008-07-17 13:56:15 +02:00
Ingo Molnar	8e9509c827	ftrace: fix merge buglet -tip testing found a bootup hang here: initcall anon_inode_init+0x0/0x130 returned 0 after 0 msecs calling acpi_event_init+0x0/0x57 the bootup should have continued with: initcall acpi_event_init+0x0/0x57 returned 0 after 45 msecs but it hung hard there instead. bisection led to this commit: \| commit `5806b81ac1` \| Merge: d14c8a6... 6712e29... \| Author: Ingo Molnar <mingo@elte.hu> \| Date: Mon Jul 14 16:11:52 2008 +0200 \| Merge branch 'auto-ftrace-next' into tracing/for-linus turns out that i made this mistake in the merge: ifdef CONFIG_FTRACE # Do not profile debug utilities CFLAGS_REMOVE_tsc_64.o = -pg CFLAGS_REMOVE_tsc_32.o = -pg those two files got unified meanwhile - so the dont-profile annotation got lost. The proper rule is: CFLAGS_REMOVE_tsc.o = -pg i guess this could have been caught sooner if the CFLAGS_REMOVE* kbuild rule aborted the build if it met a target that does not exist anymore? Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-17 13:26:50 +02:00
Linus Torvalds	dc7c65db28	Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (72 commits) Revert "x86/PCI: ACPI based PCI gap calculation" PCI: remove unnecessary volatile in PCIe hotplug struct controller x86/PCI: ACPI based PCI gap calculation PCI: include linux/pm_wakeup.h for device_set_wakeup_capable PCI PM: Fix pci_prepare_to_sleep x86/PCI: Fix PCI config space for domains > 0 Fix acpi_pm_device_sleep_wake() by providing a stub for CONFIG_PM_SLEEP=n PCI: Simplify PCI device PM code PCI PM: Introduce pci_prepare_to_sleep and pci_back_from_sleep PCI ACPI: Rework PCI handling of wake-up ACPI: Introduce new device wakeup flag 'prepared' ACPI: Introduce acpi_device_sleep_wake function PCI: rework pci_set_power_state function to call platform first PCI: Introduce platform_pci_power_manageable function ACPI: Introduce acpi_bus_power_manageable function PCI: make pci_name use dev_name PCI: handle pci_name() being const PCI: add stub for pci_set_consistent_dma_mask() PCI: remove unused arch pcibios_update_resource() functions PCI: fix pci_setup_device()'s sprinting into a const buffer ... Fixed up conflicts in various files (arch/x86/kernel/setup_64.c, arch/x86/pci/irq.c, arch/x86/pci/pci.h, drivers/acpi/sleep/main.c, drivers/pci/pci.c, drivers/pci/pci.h, include/acpi/acpi_bus.h) from x86 and ACPI updates manually.	2008-07-16 17:25:46 -07:00
Jesse Barnes	58b6e55384	Revert "x86/PCI: ACPI based PCI gap calculation" This reverts commit `809d9a8f93`. This one isn't quite ready for prime time. It needs more testing and additional feedback from the ACPI guys.	2008-07-16 16:21:47 -07:00
Zhao Yakui	da5e09a1b3	ACPI : Create "idle=nomwait" bootparam "idle=nomwait" disables the use of the MWAIT instruction from both C1 (C1_FFH) and deeper (C2C3_FFH) C-states. When MWAIT is unavailable, the BIOS and OS generally negotiate to use the HALT instruction for C1, and use IO accesses for deeper C-states. This option is useful for power and performance comparisons, and also to work around BIOS bugs where broken MWAIT support is advertised. http://bugzilla.kernel.org/show_bug.cgi?id=10807 http://bugzilla.kernel.org/show_bug.cgi?id=10914 Signed-off-by: Zhao Yakui <yakui.zhao@intel.com> Signed-off-by: Li Shaohua <shaohua.li@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2008-07-16 23:27:05 +02:00
Zhao Yakui	c1e3b377ad	ACPI: Create "idle=halt" bootparam "idle=halt" limits the idle loop to using the halt instruction. No MWAIT, no IO accesses, no C-states deeper than C1. If something is broken in the idle code, "idle=halt" is a less severe workaround than "idle=poll" which disables all power savings. Signed-off-by: Zhao Yakui <yakui.zhao@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2008-07-16 23:27:05 +02:00
Zhao Yakui	5b53496a5a	ACPI: Disable the C2C3_FFH access mode HW has no MWAIT support `991528d734` (ACPI: Processor native C-states using MWAIT) started passing C2C3_FFH to _PDC to tell the BIOS that Linux supports MWAIT for deep C-states. However, we should first double check with the hardware that it actually supports MWAIT before potentially exposing a BIOS bug of an MWAIT _CST on HW that doesn't support MWAIT. Signed-off-by: Zhao Yakui <yakui.zhao@intel.com> Signed-off-by: Li Shaohua <shaohua.li@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2008-07-16 23:27:04 +02:00
Bob Moore	19d0cfe9dd	ACPICA: Update DMAR and SRAT table definitions Synchronized tables with current specifications. Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2008-07-16 23:27:04 +02:00
Roland McGrath	380fdd7585	x86 ptrace: user-sets-TF nits This closes some arcane holes in single-step handling that can arise only when user programs set TF directly (via popf or sigreturn) and then use vDSO (syscall/sysenter) system call entry. In those entry paths, the clear_TF_reenable case hits and we must check TIF_SINGLESTEP to be sure our bookkeeping stays correct wrt the user's view of TF. Signed-off-by: Roland McGrath <roland@redhat.com>	2008-07-16 12:15:17 -07:00
Roland McGrath	d4d6715016	x86 ptrace: unify syscall tracing This unifies and cleans up the syscall tracing code on i386 and x86_64. Using a single function for entry and exit tracing on 32-bit made the do_syscall_trace() into some terrible spaghetti. The logic is clear and simple using separate syscall_trace_enter() and syscall_trace_leave() functions as on 64-bit. The unification adds PTRACE_SYSEMU and PTRACE_SYSEMU_SINGLESTEP support on x86_64, for 32-bit ptrace() callers and for 64-bit ptrace() callers tracing either 32-bit or 64-bit tasks. It behaves just like 32-bit. Changing syscall_trace_enter() to return the syscall number shortens all the assembly paths, while adding the SYSEMU feature in a simple way. Signed-off-by: Roland McGrath <roland@redhat.com>	2008-07-16 12:15:17 -07:00
Roland McGrath	64f0973319	x86 ptrace: unify TIF_SINGLESTEP This unifies the treatment of TIF_SINGLESTEP on i386 and x86_64. The bit is now excluded from _TIF_WORK_MASK on i386 as it has been on x86_64. This means the do_notify_resume() path using it is never used, so TIF_SINGLESTEP is not cleared on returning to user mode. Both now leave TIF_SINGLESTEP set when returning to user, so that it's already set on an int $0x80 system call entry. This removes the need for testing TF on the system_call path. Doing it this way fixes the regression for PTRACE_SINGLESTEP into a sigreturn syscall, introduced by commit `1e2e99f0e4`. The clear_TF_reenable case that sets TIF_SINGLESTEP can only happen on a non-exception kernel entry, i.e. sysenter/syscall instruction. That will always get to the syscall exit tracing path. Signed-off-by: Roland McGrath <roland@redhat.com>	2008-07-16 12:15:16 -07:00
Roland McGrath	6718d0d6da	x86 ptrace: block-step fix The enable_single_step() logic bails out early if TF is already set. That skips some of the bookkeeping that keeps things straight. This makes PTRACE_SINGLEBLOCK break the behavior of a user task that was already setting TF itself in user mode. Fix the bookkeeping to notice the old TF setting as it should. Test case at: http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/step-jump-cont-strict.c?cvsroot=systemtap Signed-off-by: Roland McGrath <roland@redhat.com>	2008-07-16 12:15:16 -07:00
Jack Steiner	e22146e610	x86: fix kernel_physical_mapping_init() for large x86 systems Fix bug in kernel_physical_mapping_init() that causes kernel page table to be built incorrectly for systems with greater than 512GB of memory. Signed-off-by: Jack Steiner <steiner@sgi.com> Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 18:27:36 +02:00
Ingo Molnar	77e442461c	Merge branch 'linus' into x86/kprobes	2008-07-16 13:11:29 +02:00
Ingo Molnar	34646bca47	x86, paravirt-spinlocks: fix boot hang the paravirt-spinlock patches caused a boot hang with this config: http://redhat.com/~mingo/misc/config-Wed_Jul__9_14_47_04_CEST_2008.bad i have bisected it down to: \| commit e17b58c2e85bc2ad2afc07fb8d898017c2b75ed1 \| Author: Jeremy Fitzhardinge <jeremy@goop.org> \| Date: Mon Jul 7 12:07:53 2008 -0700 \| \| xen: implement Xen-specific spinlocks i.e. applying that patch alone causes the hang. The hang happens in the ftrace self-test: initcall utsname_sysctl_init+0x0/0x19 returned 0 after 0 msecs calling init_sched_switch_trace+0x0/0x4c Testing tracer sched_switch: PASSED initcall init_sched_switch_trace+0x0/0x4c returned 0 after 167 msecs calling init_function_trace+0x0/0x12 Testing tracer ftrace: [hard hang] it should have continued like this: Testing tracer ftrace: PASSED initcall init_function_trace+0x0/0x12 returned 0 after 198 msecs calling init_irqsoff_tracer+0x0/0x14 Testing tracer irqsoff: PASSED initcall init_irqsoff_tracer+0x0/0x14 returned 0 after 3 msecs calling init_mmio_trace+0x0/0x12 initcall init_mmio_trace+0x0/0x12 returned 0 after 0 msecs the problem is that such lowlevel primitives as spinlocks should never be built with -pg (which ftrace does). Marking paravirt.o as non-pg and marking all spinlock ops as always-inline solve the hang. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:15:53 +02:00
Ingo Molnar	9af98578d6	x86: paravirt spinlocks, modular build fix fix: MODPOST 408 modules ERROR: "pv_lock_ops" [net/dccp/dccp.ko] undefined! ERROR: "pv_lock_ops" [fs/jbd2/jbd2.ko] undefined! ERROR: "pv_lock_ops" [drivers/media/common/saa7146_vv.ko] undefined! Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:15:53 +02:00
Ingo Molnar	4bb689eee1	x86: paravirt spinlocks, !CONFIG_SMP build fixes Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:15:53 +02:00
Jeremy Fitzhardinge	2d9e1e2f58	xen: implement Xen-specific spinlocks The standard ticket spinlocks are very expensive in a virtual environment, because their performance depends on Xen's scheduler giving vcpus time in the order that they're supposed to take the spinlock. This implements a Xen-specific spinlock, which should be much more efficient. The fast-path is essentially the old Linux-x86 locks, using a single lock byte. The locker decrements the byte; if the result is 0, then they have the lock. If the lock is negative, then locker must spin until the lock is positive again. When there's contention, the locker spin for 2^16[] iterations waiting to get the lock. If it fails to get the lock in that time, it adds itself to the contention count in the lock and blocks on a per-cpu event channel. When unlocking the spinlock, the locker looks to see if there's anyone blocked waiting for the lock by checking for a non-zero waiter count. If there's a waiter, it traverses the per-cpu "lock_spinners" variable, which contains which lock each CPU is waiting on. It picks one CPU waiting on the lock and sends it an event to wake it up. This allows efficient fast-path spinlock operation, while allowing spinning vcpus to give up their processor time while waiting for a contended lock. [] 2^16 iterations is threshold at which 98% locks have been taken according to Thomas Friebel's Xen Summit talk "Preventing Guests from Spinning Around". Therefore, we'd expect the lock and unlock slow paths will only be entered 2% of the time. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Christoph Lameter <clameter@linux-foundation.org> Cc: Petr Tesarik <ptesarik@suse.cz> Cc: Virtualization <virtualization@lists.linux-foundation.org> Cc: Xen devel <xen-devel@lists.xensource.com> Cc: Thomas Friebel <thomas.friebel@amd.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:15:53 +02:00
Jeremy Fitzhardinge	56397f8dad	xen: use lock-byte spinlock implementation Switch to using the lock-byte spinlock implementation, to avoid the worst of the performance hit from ticket locks. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Christoph Lameter <clameter@linux-foundation.org> Cc: Petr Tesarik <ptesarik@suse.cz> Cc: Virtualization <virtualization@lists.linux-foundation.org> Cc: Xen devel <xen-devel@lists.xensource.com> Cc: Thomas Friebel <thomas.friebel@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:15:53 +02:00
Jeremy Fitzhardinge	8efcbab674	paravirt: introduce a "lock-byte" spinlock implementation Implement a version of the old spinlock algorithm, in which everyone spins waiting for a lock byte. In order to be compatible with the ticket-lock's use of a zero initializer, this uses the convention of '0' for unlocked and '1' for locked. This algorithm is much better than ticket locks in a virtual envionment, because it doesn't interact badly with the vcpu scheduler. If there are multiple vcpus spinning on a lock and the lock is released, the next vcpu to be scheduled will take the lock, rather than cycling around until the next ticketed vcpu gets it. To use this, you must call paravirt_use_bytelocks() very early, before any spinlocks have been taken. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Christoph Lameter <clameter@linux-foundation.org> Cc: Petr Tesarik <ptesarik@suse.cz> Cc: Virtualization <virtualization@lists.linux-foundation.org> Cc: Xen devel <xen-devel@lists.xensource.com> Cc: Thomas Friebel <thomas.friebel@amd.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:15:53 +02:00
Jeremy Fitzhardinge	74d4affde8	x86/paravirt: add hooks for spinlock operations Ticket spinlocks have absolutely ghastly worst-case performance characteristics in a virtual environment. If there is any contention for physical CPUs (ie, there are more runnable vcpus than cpus), then ticket locks can cause the system to end up spending 90+% of its time spinning. The problem is that (v)cpus waiting on a ticket spinlock will be granted access to the lock in strict order they got their tickets. If the hypervisor scheduler doesn't give the vcpus time in that order, they will burn timeslices waiting for the scheduler to give the right vcpu some time. In the worst case it could take O(n^2) vcpu scheduler timeslices for everyone waiting on the lock to get it, not counting new cpus trying to take the lock while the log-jam is sorted out. These hooks allow a paravirt backend to replace the spinlock implementation. At the very least, this could revert the implementation back to the old lock algorithm, which allows the next scheduled vcpu to take the lock, and has basically fairly good performance. It also allows the spinlocks to take advantages of the hypervisor features to make locks more efficient (spin and block, for example). The cost to native execution is an extra direct call when using a spinlock function. There's no overhead if CONFIG_PARAVIRT is turned off. The lock structure is fixed at a single "unsigned int", initialized to zero, but the spinlock implementation can use it as it wishes. Thanks to Thomas Friebel's Xen Summit talk "Preventing Guests from Spinning Around" for pointing out this problem. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Christoph Lameter <clameter@linux-foundation.org> Cc: Petr Tesarik <ptesarik@suse.cz> Cc: Virtualization <virtualization@lists.linux-foundation.org> Cc: Xen devel <xen-devel@lists.xensource.com> Cc: Thomas Friebel <thomas.friebel@amd.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:15:52 +02:00
Jeremy Fitzhardinge	094029479b	x86_64: adjust exception frame on paranoid exceptions Exceptions using paranoidentry need to have their exception frames adjusted explicitly. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-07-16 11:08:58 +02:00
Jeremy Fitzhardinge	d5303b811b	x86: xen: no need to disable vdso32 Now that the vdso32 code can cope with both syscall and sysenter missing for 32-bit compat processes, just disable the features without disabling vdso altogether. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-07-16 11:08:44 +02:00
Jeremy Fitzhardinge	6a52e4b1cd	x86_64: further cleanup of 32-bit compat syscall mechanisms AMD only supports "syscall" from 32-bit compat usermode. Intel and Centaur(?) only support "sysenter" from 32-bit compat usermode. Set the X86 feature bits accordingly, and set up the vdso in accordance with those bits. On the offchance we run on in a 64-bit environment which supports neither syscall nor sysenter from 32-bit mode, then fall back to the int $0x80 vdso. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-07-16 11:08:27 +02:00
Ingo Molnar	71415c6a08	x86, xen, vdso: fix build error fix: arch/x86/xen/built-in.o: In function `xen_enable_syscall': (.cpuinit.text+0xdb): undefined reference to `sysctl_vsyscall32' Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:07:58 +02:00
Jeremy Fitzhardinge	62541c3766	xen64: disable 32-bit syscall/sysenter if not supported. Old versions of Xen (3.1 and before) don't support sysenter or syscall from 32-bit compat userspaces. If we can't set the appropriate syscall callback, then disable the corresponding feature bit, which will cause the vdso32 setup to fall back appropriately. Linux assumes that syscall is always available to 32-bit userspace, and installs it by default if sysenter isn't available. In that case, we just disable vdso altogether, forcing userspace libc to fall back to int $0x80. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:07:44 +02:00
Ingo Molnar	6596f24223	Revert "x86_64: there's no need to preallocate level1_fixmap_pgt" This reverts commit 033786969d1d1b5af12a32a19d3a760314d05329. Suresh Siddha reported that this broke booting on his 2GB testbox. Reported-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:07:30 +02:00
Ingo Molnar	b3fe124389	xen64: fix build error on 32-bit + !HIGHMEM fix: arch/x86/xen/enlighten.c: In function 'xen_set_fixmap': arch/x86/xen/enlighten.c:1127: error: 'FIX_KMAP_BEGIN' undeclared (first use in this function) arch/x86/xen/enlighten.c:1127: error: (Each undeclared identifier is reported only once arch/x86/xen/enlighten.c:1127: error: for each function it appears in.) arch/x86/xen/enlighten.c:1127: error: 'FIX_KMAP_END' undeclared (first use in this function) make[1]: * [arch/x86/xen/enlighten.o] Error 1 make: * [arch/x86/xen/enlighten.o] Error 2 FIX_KMAP_BEGIN is only available on HIGHMEM. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:07:02 +02:00
Jeremy Fitzhardinge	51dd660a2c	xen: update Kconfig to allow 64-bit Xen Allow Xen to be enabled on 64-bit. Also extend domain size limit from 8 GB (on 32-bit) to 32 GB on 64-bit. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:06:34 +02:00
Jeremy Fitzhardinge	1153968a48	xen: implement Xen write_msr operation 64-bit uses MSRs for important things like the base for fs and gs-prefixed addresses. It's more efficient to use a hypercall to update these, rather than go via the trap and emulate path. Other MSR writes are just passed through; in an unprivileged domain they do nothing, but it might be useful later. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:06:20 +02:00
Jeremy Fitzhardinge	bf18bf94dc	xen64: set up userspace syscall patch 64-bit userspace expects the vdso to be mapped at a specific fixed address, which happens to be in the middle of the kernel address space. Because we have split user and kernel pagetables, we need to make special arrangements for the vsyscall mapping to appear in the kernel part of the user pagetable. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:06:06 +02:00
Jeremy Fitzhardinge	6fcac6d305	xen64: set up syscall and sysenter entrypoints for 64-bit We set up entrypoints for syscall and sysenter. sysenter is only used for 32-bit compat processes, whereas syscall can be used in by both 32 and 64-bit processes. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:05:52 +02:00
Jeremy Fitzhardinge	d6182fbf04	xen64: allocate and manage user pagetables Because the x86_64 architecture does not enforce segment limits, Xen cannot protect itself with them as it does in 32-bit mode. Therefore, to protect itself, it runs the guest kernel in ring 3. Since it also runs the guest userspace in ring3, the guest kernel must maintain a second pagetable for its userspace, which does not map kernel space. Naturally, the guest kernel pagetables map both kernel and userspace. The userspace pagetable is attached to the corresponding kernel pagetable via the pgd's page->private field. It is allocated and freed at the same time as the kernel pgd via the paravirt_pgd_alloc/free hooks. Fortunately, the user pagetable is almost entirely shared with the kernel pagetable; the only difference is the pgd page itself. set_pgd will populate all entries in the kernel pagetable, and also set the corresponding user pgd entry if the address is less than STACK_TOP_MAX. The user pagetable must be pinned and unpinned with the kernel one, but because the pagetables are aliased, pgd_walk() only needs to be called on the kernel pagetable. The user pgd page is then pinned/unpinned along with the kernel pgd page. xen_write_cr3 must write both the kernel and user cr3s. The init_mm.pgd pagetable never has a user pagetable allocated for it, because it can never be used while running usermode. One awkward area is that early in boot the page structures are not available. No user pagetable can exist at that point, but it complicates the logic to avoid looking at the page structure. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:05:38 +02:00
Eduardo Habkost	8a95408e18	xen64: Clear %fs on xen_load_tls() We need to do this, otherwise we can get a GPF on hypercall return after TLS descriptor is cleared but %fs is still pointing to it. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:04:55 +02:00
Jeremy Fitzhardinge	4a5c3e77f7	xen64: implement failsafe callback Implement the failsafe callback, so that iret and segment register load exceptions are reported to the kernel. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:04:41 +02:00
Jeremy Fitzhardinge	b7c3c5c159	xen: make sure the kernel command line is right Point the boot params cmd_line_ptr to the domain-builder-provided command line. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:04:13 +02:00
Jeremy Fitzhardinge	5deb30d194	xen: rework pgd_walk to deal with 32/64 bit Rewrite pgd_walk to deal with 64-bit address spaces. There are two notible features of 64-bit workspaces: 1. The physical address is only 48 bits wide, with the upper 16 bits being sign extension; kernel addresses are negative, and userspace is positive. 2. The Xen hypervisor mapping is at the negative-most address, just above the sign-extension hole. 1. means that we can't easily use addresses when traversing the space, since we must deal with sign extension. This rewrite expresses everything in terms of pgd/pud/pmd indices, which means we don't need to worry about the exact configuration of the virtual memory space. This approach works equally well in 32-bit. To deal with 2, assume the hole is between the uppermost userspace address and PAGE_OFFSET. For 64-bit this skips the Xen mapping hole. For 32-bit, the hole is zero-sized. In all cases, the uppermost kernel address is FIXADDR_TOP. A side-effect of this patch is that the upper boundary is actually handled properly, exposing a long-standing bug in 32-bit, which failed to pin kernel pmd page. The kernel pmd is not shared, and so must be explicitly pinned, even though the kernel ptes are shared and don't need pinning. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:03:59 +02:00
Eduardo Habkost	a8fc1089e4	xen64: implement xen_load_gs_index() xen-64: implement xen_load_gs_index() Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:03:45 +02:00
Jeremy Fitzhardinge	0725cbb977	xen64: add identity irq->vector map The x86_64 interrupt subsystem is oriented towards vectors, as opposed to a flat irq space as it is in x86-32. This patch adds a simple identity irq->vector mapping so that we can continue to feed irqs into do_IRQ() and get a good result. Ideally x86_32 will unify with the 64-bit code and use vectors too. At that point we can move to mapping event channels to vectors, which will allow us to economise on irqs (so per-cpu event channels can share irqs, rather than having to allocte one per cpu, for example). Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:03:16 +02:00
Jeremy Fitzhardinge	88459d4c7e	xen64: register callbacks in arch-independent way Use callback_op hypercall to register callbacks in a 32/64-bit independent way (64-bit doesn't need a code segment, but that detail is hidden in XEN_CALLBACK). Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:03:01 +02:00
Jeremy Fitzhardinge	952d1d7055	xen64: add pvop for swapgs swapgs is a no-op under Xen, because the hypervisor makes sure the right version of %gs is current when switching between user and kernel modes. This means that the swapgs "implementation" can be inlined and used when the stack is unsafe (usermode). Unfortunately, it means that disabling patching will result in a non-booting kernel... Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:02:46 +02:00
Jeremy Fitzhardinge	997409d3d0	xen64: deal with extra words Xen pushes onto exception frames Xen pushes two extra words containing the values of rcx and r11. This pvop hook copies the words back into their appropriate registers, and cleans them off the stack. This leaves the stack in native form, so the normal handler can run unchanged. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:02:31 +02:00
Eduardo Habkost	e176d367d0	xen64: xen_write_idt_entry() and cvt_gate_to_trap() Changed to use the (to-be-)unified descriptor structs. Signed-off-by: Eduardo Habkost <ehabkost@Rawhide-64.localdomain> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:02:15 +02:00
Jeremy Fitzhardinge	836fe2f291	xen: use set_pte_vaddr Make Xen's set_pte_mfn() use set_pte_vaddr rather than copying it. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:02:01 +02:00
Jeremy Fitzhardinge	8745f8b0b9	xen64: defer setting pagetable alloc/release ops We need to wait until the page structure is available to use the proper pagetable page alloc/release operations, since they use struct page to determine if a pagetable is pinned. This happened to work in 32bit because nobody allocated new pagetable pages in the interim between xen_pagetable_setup_done and xen_post_allocator_init, but the 64-bit kenrel needs to allocate more pagetable levels. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:01:45 +02:00
Jeremy Fitzhardinge	4560a2947e	xen: set num_processors Someone's got to do it. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:01:31 +02:00
Jeremy Fitzhardinge	ce803e705f	xen64: use arbitrary_virt_to_machine for xen_set_pmd When building initial pagetables in 64-bit kernel the pud/pmd pointer may be in ioremap/fixmap space, so we need to walk the pagetable to look up the physical address. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:01:17 +02:00
Jeremy Fitzhardinge	ebd879e397	xen: fix truncation of machine address arbitrary_virt_to_machine can truncate a machine address if its above 4G. Cast the problem away. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:01:03 +02:00
Jeremy Fitzhardinge	39dbc5bd34	xen32: create initial mappings like 64-bit Rearrange the pagetable initialization to share code with the 64-bit kernel. Rather than deferring anything to pagetable_setup_start, just set up an initial pagetable in swapper_pg_dir early at startup, and create an additional 8MB of physical memory mappings. This matches the native head_32.S mappings to a large degree, and allows the rest of the pagetable setup to continue without much Xen vs. native difference. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:00:49 +02:00
Jeremy Fitzhardinge	d114e1981c	xen64: map an initial chunk of physical memory Early in boot, map a chunk of extra physical memory for use later on. We need a pool of mapped pages to allocate further pages to construct pagetables mapping all physical memory. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:00:35 +02:00
Jeremy Fitzhardinge	22911b3f1c	xen64: 64-bit starts using set_pte from very early It also doesn't need the 32-bit hack version of set_pte for initial pagetable construction, so just make it use the real thing. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:00:21 +02:00
Jeremy Fitzhardinge	084a2a4e76	xen64: early mapping setup Set up the initial pagetables to map the kernel mapping into the physical mapping space. This makes __va() usable, since it requires physical mappings. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:00:07 +02:00
Jeremy Fitzhardinge	3d75e1b8ef	xen64: add hypervisor callbacks for events, etc Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 10:59:52 +02:00
Jeremy Fitzhardinge	7d087b68d6	xen: cpu_detect is 32-bit only Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 10:59:38 +02:00
Jeremy Fitzhardinge	15664f968a	xen64: use set_fixmap for shared_info structure Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 10:59:24 +02:00
Jeremy Fitzhardinge	cdacc1278b	xen64: add 64-bit assembler Split xen-asm into 32- and 64-bit files, and implement the 64-bit variants. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 10:59:09 +02:00
Jeremy Fitzhardinge	555cf2b580	xen64: add asm-offsets Add Xen vcpu_info offsets to asm-offsets_64. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 10:58:55 +02:00
Jeremy Fitzhardinge	8c5e5ac32f	xen64: add xen-head code to head_64.S Add the Xen entrypoint and ELF notes to head_64.S. Adapts xen-head.S to compile either 32-bit or 64-bit. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 10:58:41 +02:00
Jeremy Fitzhardinge	c7b75947f8	xen64: smp.c compile hacking A number of random changes to make xen/smp.c compile in 64-bit mode. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>a Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 10:58:27 +02:00
Jeremy Fitzhardinge	5b09b2876e	x86_64: add workaround for no %gs-based percpu As a stopgap until Mike Travis's x86-64 gs-based percpu patches are ready, provide workaround functions for x86_read/write_percpu for Xen's use. Specifically, this means that we can't really make use of vcpu placement, because we can't use a single gs-based memory access to get to vcpu fields. So disable all that for now. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 10:58:13 +02:00
Jeremy Fitzhardinge	a9e7062d73	xen: move smp setup into smp.c Move all the smp_ops setup into smp.c, allowing a lot of things to become static. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 10:57:59 +02:00
Jeremy Fitzhardinge	ce87b3d326	xen64: get active_mm from the pda x86_64 stores the active_mm in the pda, so fetch it from there. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 10:57:45 +02:00
Jeremy Fitzhardinge	f5d36de069	xen64: random ifdefs to mask out 32-bit only code Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 10:57:30 +02:00
Jeremy Fitzhardinge	f6e587325b	xen64: add extra pv_mmu_ops We need extra pv_mmu_ops for 64-bit, to deal with the extra level of pagetable. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 10:57:16 +02:00
Jeremy Fitzhardinge	7077c33d81	xen: make ELF notes work for 32 and 64 bit Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 10:56:32 +02:00
Jeremy Fitzhardinge	48b5db2062	xen64: define asm/xen/interface for 64-bit Copy 64-bit definitions of various interface structures into place. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 10:56:18 +02:00
Jeremy Fitzhardinge	851fa3c4e7	xen: define set_pte from the outset We need set_pte to work from a relatively early point, so enable it from the start. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 10:56:04 +02:00
Isaku Yamahata	ad55db9fed	xen: add xen_arch_resume()/xen_timer_resume hook for ia64 support add xen_timer_resume() hook. Timer resume should be done after event channel is resumed. add xen_arch_resume() hook when ipi becomes usable after resume. After resume, some cpu specific resource must be reinitialized on ia64 that can't be set by another cpu. However available hooks is run once on only one cpu so that ipi has to be used. During stop_machine_run() ipi can't be used because interrupt is masked. So add another hook after stop_machine_run(). Another approach might be use resume hook which is run by device_resume(). However device_resume() may be executed on suspend error recovery path. So it is necessary to determine whether it is executed on real resume path or error recovery path. Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 10:55:50 +02:00
Jeremy Fitzhardinge	8ba6c2b095	xen: print backtrace on multicall failure Print a backtrace if a multicall fails, to help with debugging. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 10:55:21 +02:00
Jeremy Fitzhardinge	7c33b1e6ee	x86_64: unstatic get_local_pda This allows Xen's xen_cpu_up() to allocate a pda for the new CPU. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 10:55:07 +02:00
Jeremy Fitzhardinge	360c044eb1	x86_64: adjust exception frame in ia32entry The 32-bit compat int $0x80 entrypoint needs exception frame adjustment. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 10:54:53 +02:00
Jeremy Fitzhardinge	cbcd79c2e5	x86: use __page_aligned_data/bss Update arch/x86's use of page-aligned variables. The change to arch/x86/xen/mmu.c fixes an actual bug, but the rest are cleanups and to set a precedent. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 10:54:39 +02:00
Jeremy Fitzhardinge	87b935a0ef	x86: clean up formatting of __switch_to process_64.c:__switch_to has some very old strange formatting, some of it dating back to pre-git. Fix it up. No functional changes. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 10:54:25 +02:00
Jeremy Fitzhardinge	8840c0ccd7	x86_64: there's no need to preallocate level1_fixmap_pgt Early fixmap will allocate its own L1 pagetable page for fixmap mappings, so there's no need to preallocate one. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 10:54:11 +02:00
Eduardo Habkost	c1f2f09ef6	pvops-64: call paravirt_post_allocator_init() on setup_arch() Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 10:53:57 +02:00
Eduardo Habkost	a312b37b2a	x86/paravirt: call paravirt_pagetable_setup_{start, done} Call paravirt_pagetable_setup_{start,done} These paravirt_ops functions were not being called on x86_64. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 10:53:43 +02:00
Linus Torvalds	fafa3a3f16	Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: fix TSC build error on 32bit	2008-07-15 16:29:18 -07:00
Alok Kataria	809d9a8f93	x86/PCI: ACPI based PCI gap calculation Using ACPI to find free address space allows us to find a gap for the unallocated PCI resources or MMIO resources for hotplug devices within the BIOS allowed PCI regions. It works by evaluating the _CRS object under PCI0 looking for producer resources. Then searches the e820 memory space for a gap within these producer resources. Signed-off-by: Alok N Kataria <akataria@vmware.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Len Brown <lenb@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2008-07-15 15:49:03 -07:00
Ingo Molnar	82638844d9	Merge branch 'linus' into cpus4096 Conflicts: arch/x86/xen/smp.c kernel/sched_rt.c net/iucv/iucv.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 00:29:07 +02:00
Thomas Gleixner	431ceb83f7	x86: fix TSC build error on 32bit Dave Hansen reported a build error on 32bit which went unnoticed as newer gcc versions seem to optimize unused static functions away before compiling them. Make vread_tsc() depend on CONFIG_X86_64 Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-07-15 22:46:47 +02:00
Ingo Molnar	b3c9816b9f	generic-ipi: merge fix fix merge fallout: arch/x86/pci/amd_bus.c: In function ‘enable_pci_io_ecs': arch/x86/pci/amd_bus.c:581: error: too many arguments to function ‘on_each_cpu' Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-15 22:03:56 +02:00
Ingo Molnar	1a781a777b	Merge branch 'generic-ipi' into generic-ipi-for-linus Conflicts: arch/powerpc/Kconfig arch/s390/kernel/time.c arch/x86/kernel/apic_32.c arch/x86/kernel/cpu/perfctr-watchdog.c arch/x86/kernel/i8259_64.c arch/x86/kernel/ldt.c arch/x86/kernel/nmi_64.c arch/x86/kernel/smpboot.c arch/x86/xen/smp.c include/asm-x86/hw_irq_32.h include/asm-x86/hw_irq_64.h include/asm-x86/mach-default/irq_vectors.h include/asm-x86/mach-voyager/irq_vectors.h include/asm-x86/smp.h kernel/Makefile Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-15 21:55:59 +02:00
Linus Torvalds	da6e88f496	Merge branch 'timers/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: add PCI ID for 6300ESB force hpet x86: add another PCI ID for ICH6 force-hpet kernel-paramaters: document pmtmr= command line option acpi_pm clccksource: fix printk format warning nohz: don't stop idle tick if softirqs are pending. pmtmr: allow command line override of ioport nohz: reduce jiffies polling overhead hrtimer: Remove unused variables in ktime_divns() hrtimer: remove warning in hres_timers_resume posix-timers: print RT watchdog message	2008-07-15 10:39:57 -07:00
Linus Torvalds	af5329cdf5	Merge branch 'core/stacktrace' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core/stacktrace' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: generic-ipi: powerpc/generic-ipi tree build failure stacktrace: fix build failure on sparc64 stacktrace: export save_stack_trace[_tsk] stacktrace: fix modular build, export print_stack_trace and save_stack_trace backtrace: replace timer with tasklet + completions stacktrace: add saved stack traces to backtrace self-test stacktrace: print_stack_trace() cleanup debugging: make stacktrace independent from DEBUG_KERNEL stacktrace: don't crash on invalid stack trace structs	2008-07-15 10:31:35 -07:00
Thomas Gleixner	aba3728ce2	x86: sanitize Kconfig Set default n for MEMTEST and MTRR_SANITIZER and fix the help texts. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-07-15 15:30:28 +02:00
Ingo Molnar	91d0322bef	Merge branch 'linus' into x86/urgent	2008-07-15 13:45:59 +02:00
Peter Zijlstra	d54191b85e	Kprobe smoke test lockdep warning On Mon, 2008-04-21 at 18:54 -0400, Masami Hiramatsu wrote: > Thank you for reporting. > > Actually, kprobes tries to fixup thread's flags in post_kprobe_handler > (which is called from kprobe_exceptions_notify) by > trace_hardirqs_fixup_flags(pt_regs->flags). However, even the irq flag > is set in pt_regs->flags, true hardirq is still off until returning > from do_debug. Thus, lockdep assumes that hardirq is off without annotation. > > IMHO, one possible solution is that fixing hardirq flags right after > notify_die in do_debug instead of in post_kprobe_handler. My reply to BZ 10489: > [ 2.707509] Kprobe smoke test started > [ 2.709300] ------------[ cut here ]------------ > [ 2.709420] WARNING: at kernel/lockdep.c:2658 check_flags+0x4d/0x12c() > [ 2.709541] Modules linked in: > [ 2.709588] Pid: 1, comm: swapper Not tainted 2.6.25.jml.057 #1 > [ 2.709588] [<c0126acc>] warn_on_slowpath+0x41/0x51 > [ 2.709588] [<c010bafc>] ? save_stack_trace+0x1d/0x3b > [ 2.709588] [<c0140a83>] ? save_trace+0x37/0x89 > [ 2.709588] [<c011987d>] ? kernel_map_pages+0x103/0x11c > [ 2.709588] [<c0109803>] ? native_sched_clock+0xca/0xea > [ 2.709588] [<c0142958>] ? mark_held_locks+0x41/0x5c > [ 2.709588] [<c0382580>] ? kprobe_exceptions_notify+0x322/0x3af > [ 2.709588] [<c0142aff>] ? trace_hardirqs_on+0xf1/0x119 > [ 2.709588] [<c03825b3>] ? kprobe_exceptions_notify+0x355/0x3af > [ 2.709588] [<c0140823>] check_flags+0x4d/0x12c > [ 2.709588] [<c0143c9d>] lock_release+0x58/0x195 > [ 2.709588] [<c038347c>] ? __atomic_notifier_call_chain+0x0/0x80 > [ 2.709588] [<c03834d6>] __atomic_notifier_call_chain+0x5a/0x80 > [ 2.709588] [<c0383508>] atomic_notifier_call_chain+0xc/0xe > [ 2.709588] [<c013b6d4>] notify_die+0x2d/0x2f > [ 2.709588] [<c038168a>] do_debug+0x67/0xfe > [ 2.709588] [<c0381287>] debug_stack_correct+0x27/0x30 > [ 2.709588] [<c01564c0>] ? kprobe_target+0x1/0x34 > [ 2.709588] [<c0156572>] ? init_test_probes+0x50/0x186 > [ 2.709588] [<c04fae48>] init_kprobes+0x85/0x8c > [ 2.709588] [<c04e947b>] kernel_init+0x13d/0x298 > [ 2.709588] [<c04e933e>] ? kernel_init+0x0/0x298 > [ 2.709588] [<c04e933e>] ? kernel_init+0x0/0x298 > [ 2.709588] [<c0105ef7>] kernel_thread_helper+0x7/0x10 > [ 2.709588] ======================= > [ 2.709588] ---[ end trace 778e504de7e3b1e3 ]--- > [ 2.709588] possible reason: unannotated irqs-off. > [ 2.709588] irq event stamp: 370065 > [ 2.709588] hardirqs last enabled at (370065): [<c0382580>] kprobe_exceptions_notify+0x322/0x3af > [ 2.709588] hardirqs last disabled at (370064): [<c0381bb7>] do_int3+0x1d/0x7d > [ 2.709588] softirqs last enabled at (370050): [<c012b464>] __do_softirq+0xfa/0x100 > [ 2.709588] softirqs last disabled at (370045): [<c0107438>] do_softirq+0x74/0xd9 > [ 2.714751] Kprobe smoke test passed successfully how I love this stuff... Ok, do_debug() is a trap, this can happen at any time regardless of the machine's IRQ state. So the first thing we do is fix up the IRQ state. Then we call this die notifier stuff; and return with messed up IRQ state... YAY. So, kprobes fudges it.. notify_die(DIE_DEBUG) kprobe_exceptions_notify() post_kprobe_handler() modify regs->flags trace_hardirqs_fixup_flags(regs->flags); <--- must be it So what's the use of modifying flags if they're not meant to take effect at some point. /me tries to reproduce issue; enable kprobes test thingy && boot OK, that reproduces.. So the below makes it work - but I'm not getting this code; at the time I wrote that stuff I CC'ed each and every kprobe maintainer listed in the usual places but got no reposonse - can some please explain this stuff to me? Are the saved flags only for the TF bit or are they made in full effect later (and if so, where) ? Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Masami Hiramatsu <mhiramat@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-15 11:18:03 +02:00
Linus Torvalds	5a86102248	Merge branch 'for-2.6.27' of git://git.infradead.org/users/dwmw2/firmware-2.6 * 'for-2.6.27' of git://git.infradead.org/users/dwmw2/firmware-2.6: (64 commits) firmware: convert sb16_csp driver to use firmware loader exclusively dsp56k: use request_firmware edgeport-ti: use request_firmware() edgeport: use request_firmware() vicam: use request_firmware() dabusb: use request_firmware() cpia2: use request_firmware() ip2: use request_firmware() firmware: convert Ambassador ATM driver to request_firmware() whiteheat: use request_firmware() ti_usb_3410_5052: use request_firmware() emi62: use request_firmware() emi26: use request_firmware() keyspan_pda: use request_firmware() keyspan: use request_firmware() ttusb-budget: use request_firmware() kaweth: use request_firmware() smctr: use request_firmware() firmware: convert ymfpci driver to use firmware loader exclusively firmware: convert maestro3 driver to use firmware loader exclusively ... Fix up trivial conflicts with BKL removal in drivers/char/dsp56k.c and drivers/char/ip2/ip2main.c manually.	2008-07-14 16:54:07 -07:00
David Woodhouse	751851af7a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git Conflicts: sound/pci/Kconfig	2008-07-14 15:51:11 -07:00
Linus Torvalds	d18bb9a548	Merge branch 'core/rodata' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core/rodata' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: move BUG_TABLE into RODATA	2008-07-14 15:28:10 -07:00
Linus Torvalds	4bb0057f99	Merge branch 'core/printk' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core/printk' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, generic: mark early_printk as asmlinkage printk: export console_drivers printk: remember the message level for multi-line output printk: refactor processing of line severity tokens printk: don't prefer unsuited consoles on registration printk: clean up recursion check related static variables namespacecheck: more kernel/printk.c fixes namespacecheck: fix kernel printk.c	2008-07-14 15:27:43 -07:00
Linus Torvalds	116a9fb3ed	x86: MMIOTRACE should not default to on Even the help-text makes it clear that normal people shouldn't enable it. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-14 15:03:25 -07:00
Linus Torvalds	e18425a0ab	Merge branch 'tracing/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (228 commits) ftrace: build fix for ftraced_suspend ftrace: separate out the function enabled variable ftrace: add ftrace_kill_atomic ftrace: use current CPU for function startup ftrace: start wakeup tracing after setting function tracer ftrace: check proper config for preempt type ftrace: trace schedule ftrace: define function trace nop ftrace: move sched_switch enable after markers ftrace: prevent ftrace modifications while being kprobe'd, v2 fix "ftrace: store mcount address in rec->ip" mmiotrace broken in linux-next (8-bit writes only) ftrace: avoid modifying kprobe'd records ftrace: freeze kprobe'd records kprobes: enable clean usage of get_kprobe ftrace: store mcount address in rec->ip ftrace: build fix with gcc 4.3 namespacecheck: fixes ftrace: fix "notrace" filtering priority ftrace: fix printout ...	2008-07-14 14:49:54 -07:00
Linus Torvalds	d1794f2c5b	Merge branch 'bkl-removal' of git://git.lwn.net/linux-2.6 * 'bkl-removal' of git://git.lwn.net/linux-2.6: (146 commits) IB/umad: BKL is not needed for ib_umad_open() IB/uverbs: BKL is not needed for ib_uverbs_open() bf561-coreb: BKL unneeded for open() Call fasync() functions without the BKL snd/PCM: fasync BKL pushdown ipmi: fasync BKL pushdown ecryptfs: fasync BKL pushdown Bluetooth VHCI: fasync BKL pushdown tty_io: fasync BKL pushdown tun: fasync BKL pushdown i2o: fasync BKL pushdown mpt: fasync BKL pushdown Remove BKL from remote_llseek v2 Make FAT users happier by not deadlocking x86-mce: BKL pushdown vmwatchdog: BKL pushdown vmcp: BKL pushdown via-pmu: BKL pushdown uml-random: BKL pushdown uml-mmapper: BKL pushdown ...	2008-07-14 14:48:31 -07:00
Jonathan Corbet	2fceef397f	Merge commit 'v2.6.26' into bkl-removal	2008-07-14 15:29:34 -06:00
Matthew Wilcox	beef3129b3	x86/PCI: Fix PCI config space for domains > 0 John Keller reports that PCI config space access is broken on machines with more than one domain. conf1 accesses only work for domain 0, so make sure we check the domain number in the raw routines before trying conf1. Reported-by: John Keller <jpk@sgi.com> Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2008-07-14 14:23:28 -07:00
Linus Torvalds	a3da5bf84a	Merge branch 'x86/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (821 commits) x86: make 64bit hpet_set_mapping to use ioremap too, v2 x86: get x86_phys_bits early x86: max_low_pfn_mapped fix #4 x86: change _node_to_cpumask_ptr to return const ptr x86: I/O APIC: remove an IRQ2-mask hack x86: fix numaq_tsc_disable calling x86, e820: remove end_user_pfn x86: max_low_pfn_mapped fix, #3 x86: max_low_pfn_mapped fix, #2 x86: max_low_pfn_mapped fix, #1 x86_64: fix delayed signals x86: remove conflicting nx6325 and nx6125 quirks x86: Recover timer_ack lost in the merge of the NMI watchdog x86: I/O APIC: Never configure IRQ2 x86: L-APIC: Always fully configure IRQ0 x86: L-APIC: Set IRQ0 as edge-triggered x86: merge dwarf2 headers x86: use AS_CFI instead of UNWIND_INFO x86: use ignore macro instead of hash comment x86: use matching CFI_ENDPROC ...	2008-07-14 13:43:24 -07:00
Linus Torvalds	7daf705f36	Start using the new '%pS' infrastructure to print symbols This simplifies the code significantly, and was the whole point of the exercise. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-14 12:12:53 -07:00
H. Peter Anvin	065cb3dfe2	x86, suspend, acpi: correct and add comments about Big Real Mode Explain that we set up the descriptors for Big Real Mode, and why we do so. In particular, one system that is known to fail without it is the Lenovo X61. Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-07-14 11:44:26 -07:00
H. Peter Anvin	3bf2e77453	x86, suspend, acpi: enter Big Real Mode The explanation for recent video BIOS suspend quirk failures is that the VESA BIOS expects to be entered in Big Real Mode (.limit = 0xffffffff) instead of ordinary Real Mode (.limit = 0xffff). This patch changes the segment descriptors to Big Real Mode instead. The segment descriptor registers (what Intel calls "segment cache") is always active. The only thing that changes based on CR0.PE is how it is loaded and the interpretation of the CS flags. The segment descriptor registers contain of the following sub-registers: selector (the "visible" part), base, limit and flags. In protected mode or long mode, they are loaded from descriptors (or fs.base or gs.base can be manipulated directly in long mode.) In real mode, the only thing changed by a segment register load is the selector and the base, where the base <- selector << 4. In particular, the limit and the flags are not changed. As far as the handling of the CS flags: a code segment cannot be writable in protected mode, whereas it is "just another segment" in real mode, so there is some kind of quirk that kicks in for this when CR0.PE <- 0. I'm not sure if this is accomplished by actually changing the cs.flags register or just changing the interpretation; it might be something that is CPU-specific. In particular, the Transmeta CPUs had an explicit "CS is writable if you're in real mode" override, so even if you had loaded CS with an execute-only segment it'd be writable (but not readable!) on return to real mode. I'm not at all sure if that is how other CPUs behave. Signed-off-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-14 18:16:09 +02:00

... 2 3 4 5 6 ...

3605 Commits