linux

Author	SHA1	Message	Date
Suresh Siddha	5734f62b66	x86, cpu: Enumerate xsaveopt Enumerate the xsaveopt feature. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20100719230205.604014179@sbs-t61.sc.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-07-19 16:50:40 -07:00
Suresh Siddha	40e1d7a4ff	x86, cpu: Add xsaveopt cpufeature Add cpu feature bit support for the XSAVEOPT instruction. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20100719230205.523204988@sbs-t61.sc.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-07-19 16:48:06 -07:00
Suresh Siddha	edb18f8ab0	x86, cpu: Make init_scattered_cpuid_features() consider cpuid subleaves Some cpuid features (like xsaveopt) are enumerated using cpuid subleaves. Extend init_scattered_cpuid_features() to take subleaf into account. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20100719230205.439900717@sbs-t61.sc.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-07-19 16:47:59 -07:00
Linus Torvalds	d0c6f62584	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, pci, mrst: Add extra sanity check in walking the PCI extended cap chain x86: Fix x2apic preenabled system with kexec x86: Force HPET readback_cmp for all ATI chipsets	2010-07-19 13:19:32 -07:00
Kulikov Vasiliy	6c54aabd5e	x86/amd-iommu: Use for_each_pci_dev() Use for_each_pci_dev() to simplify the code. Signed-off-by: Kulikov Vasiliy <segooon@gmail.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2010-07-19 15:41:13 +02:00
Pavel Machek	a2531293db	update email address pavel@suse.cz no longer works, replace it with working address. Signed-off-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-07-19 10:56:54 +02:00
Dave Chinner	7f8275d0d6	mm: add context argument to shrinker callback The current shrinker implementation requires the registered callback to have global state to work from. This makes it difficult to shrink caches that are not global (e.g. per-filesystem caches). Pass the shrinker structure to the callback so that users can embed the shrinker structure in the context the shrinker needs to operate on and get back to it in the callback via container_of(). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-07-19 14:56:17 +10:00
Linus Torvalds	a9f7f2e74a	Merge branch 'x86/kprobes' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland * 'x86/kprobes' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland: x86: kprobes: fix swapped segment registers in kretprobe	2010-07-18 15:13:30 -07:00
Roland McGrath	a197479848	x86: kprobes: fix swapped segment registers in kretprobe In commit `f007ea26`, the order of the %es and %ds segment registers got accidentally swapped, so synthesized 'struct pt_regs' frames have the two values inverted. It's almost sure that these values never matter, and that they also never differ. But wrong is wrong. Signed-off-by: Roland McGrath <roland@redhat.com>	2010-07-18 15:05:34 -07:00
Cliff Wickman	93a7ca0c3e	x86, UV: Initialize BAU MMRs only on hubs with cpus Remove the initialization of MMRs UVH_LB_BAU_SB_ACTIVATION_CONTROL and UVH_BAU_DATA_BROADCAST on UV hubs that have no active cpus. Such initialization on hubs with no active cpus would result in a kernel page fault. This is not of real high priority, because we don't have any such systems (with UV hubs that have no active cpus). But they will be coming. Signed-off-by: Cliff Wickman <cpw@sgi.com> LKML-Reference: <E1OZmZN-0006cW-RC@eag09.americas.sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-07-17 12:11:48 +02:00
Jacob Pan	f82c3d71d6	x86, pci, mrst: Add extra sanity check in walking the PCI extended cap chain The fixed bar capability structure is searched in PCI extended configuration space. We need to make sure there is a valid capability ID to begin with otherwise, the search code may stuck in a infinite loop which results in boot hang. This patch adds additional check for cap ID 0, which is also invalid, and indicates end of chain. End of chain is supposed to have all fields zero, but that doesn't seem to always be the case in the field. Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> LKML-Reference: <1279306706-27087-1-git-send-email-jacob.jun.pan@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-07-16 16:52:15 -07:00
Yinghai Lu	fd19dce7ac	x86: Fix x2apic preenabled system with kexec Found one x2apic system kexec loop test failed when CONFIG_NMI_WATCHDOG=y (old) or CONFIG_LOCKUP_DETECTOR=y (current tip) first kernel can kexec second kernel, but second kernel can not kexec third one. it can be duplicated on another system with BIOS preenabled x2apic. First kernel can not kexec second kernel. It turns out, when kernel boot with pre-enabled x2apic, it will not execute disable_local_APIC on shutdown path. when init_apic_mappings() is called in setup_arch, it will skip setting of apic_phys when x2apic_mode is set. ( x2apic_mode is much early check_x2apic()) Then later, disable_local_APIC() will bail out early because !apic_phys. So check !x2apic_mode in x2apic_mode in disable_local_APIC with !apic_phys. another solution could be updating init_apic_mappings() to set apic_phys even for preenabled x2apic system. Actually even for x2apic system, that lapic address is mapped already in early stage. BTW: is there any x2apic preenabled system with apicid of boot cpu > 255? Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4C3EB22B.3000701@kernel.org> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: stable@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-07-16 16:49:41 -07:00
Bjorn Helgaas	58c84eda07	PCI: fall back to original BIOS BAR addresses If we fail to assign resources to a PCI BAR, this patch makes us try the original address from BIOS rather than leaving it disabled. Linux tries to make sure all PCI device BARs are inside the upstream PCI host bridge or P2P bridge apertures, reassigning BARs if necessary. Windows does similar reassignment. Before this patch, if we could not move a BAR into an aperture, we left the resource unassigned, i.e., at address zero. Windows leaves such BARs at the original BIOS addresses, and this patch makes Linux do the same. This is a bit ugly because we disable the resource long before we try to reassign it, so we have to keep track of the BIOS BAR address somewhere. For lack of a better place, I put it in the struct pci_dev. I think it would be cleaner to attempt the assignment immediately when the claim fails, so we could easily remember the original address. But we currently claim motherboard resources in the middle, after attempting to claim PCI resources and before assigning new PCI resources, and changing that is a fairly big job. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16263 Reported-by: Andrew <nitr0@seti.kr.ua> Tested-by: Andrew <nitr0@seti.kr.ua> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-07-16 11:39:48 -07:00
Jiri Slaby	f33ebbe9da	unistd: add __NR_prlimit64 syscall numbers Add __NR_prlimit64 syscall numbers to asm-generic. Add them also to asm-x86, both 32 and 64-bit. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com>	2010-07-16 09:52:32 +02:00
Joe Perches	b2691085d1	x86: Clean up arch/x86/kernel/cpu/mtrr/cleanup.c: use ";" not "," to terminate statements Also needed if pr_<level> becomes a bit more space efficient. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> LKML-Reference: <1277768808.29157.280.camel@Joe-Laptop.home> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-07-15 18:21:22 +02:00
Thomas Gleixner	08be97962b	x86: Force HPET readback_cmp for all ATI chipsets commit `30a564be` (x86, hpet: Restrict read back to affected ATI chipset) restricted the workaround for the HPET bug to SMX00 chipsets. This was reasonable as those were the only ones against which we ever got a bug report. Stephan Wolf reported now that this patch breaks his IXP400 based machine. Though it's confirmed to work on other IXP400 based systems. To error out on the safe side, we force the HPET readback workaround for all ATI SMbus class chipsets. Reported-by: Stephan Wolf <stephan@letzte-bankreihe.de> LKML-Reference: <alpine.LFD.2.00.1007142134140.3321@localhost.localdomain> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Stephan Wolf <stephan@letzte-bankreihe.de> Acked-by: Borislav Petkov <borislav.petkov@amd.com>	2010-07-15 17:10:02 +02:00
Yinghai Lu	70b0d22d58	x86, setup: Only set early_serial_base after port is initialized putchar is using early_serial_base to check if port is initialized. So we only assign it after early_serial_init() is called, in case we need use VGA to debug early serial console. Also add display for port addr and baud. -v2: update to current tip Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4C3E0171.6050008@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-07-14 12:07:16 -07:00
Linus Torvalds	bcefc8d0d3	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: input: i8042 - add runtime check in x86's i8042_platform_init Revert "Input: fixup X86_MRST selects" Revert "Input: do not force selecting i8042 on Moorestown" x86, mrst: Add i8042_detect API for Moorestwon platform x86: Add i8042 pre-detection hook to x86_platform_ops x86, platform: Export x86_platform to modules	2010-07-13 17:31:11 -07:00
Linus Torvalds	177dd7e1eb	Merge branch 'kvm-updates/2.6.35' of git://git.kernel.org/pub/scm/virt/kvm/kvm * 'kvm-updates/2.6.35' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: MMU: flush remote tlbs when overwriting spte with different pfn KVM: VMX: Fix host MSR_KERNEL_GS_BASE corruption	2010-07-13 17:30:49 -07:00
H. Peter Anvin	3b770a2128	x86, alternatives: BUG on encountering an invalid CPU feature number Make the alternatives-patching code BUG on encountering an invalid CPU feature number. Should have done this a long time ago. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Yinghai Lu <yinhai@kernel.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <tip-df378ccfc4dd04e263426ad805516915874774aa@git.kernel.org>	2010-07-13 14:57:50 -07:00
H. Peter Anvin	df378ccfc4	x86, alternatives: Fix one more open-coded 8-bit alternative number Fix a missing case of an 8-bit alternative number, buried inside an assembly macro. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Reported-by: Yinghai Lu <yinhai@kernel.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <4C3BDDA3.2060900@kernel.org>	2010-07-13 14:56:16 -07:00
Yinghai Lu	ce0aa5dd20	x86, setup: Make the setup code also accept console=uart8250 Make the boot code also accept the console=uart8250,io,0x2f8,115200n form of early console. Also add back simple_guess_base(), otherwise those simple_strtoull(,,0) are not going to work. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4C3CCE05.4090505@kernel.org> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-07-13 14:08:12 -07:00
Pekka Enberg	fa97bdf927	x86, setup: Early-boot serial I/O support This patch adds serial I/O support to the real-mode setup (very early boot) printf(). It's useful for debugging boot code when running Linux under KVM, for example. The actual code was lifted from early printk. Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> LKML-Reference: <1278835617-11368-1-git-send-email-penberg@cs.helsinki.fi> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-07-12 14:46:00 -07:00
Xiao Guangrong	91546356d0	KVM: MMU: flush remote tlbs when overwriting spte with different pfn After remove a rmap, we should flush all vcpu's tlb Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-07-12 14:05:56 -03:00
Kenji Kaneshige	35be1b716a	x86, ioremap: Fix normal ram range check Check for normal RAM in x86 ioremap() code seems to not work for the last page frame in the specified physical address range. Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> LKML-Reference: <4C1AE6CD.1080704@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-07-09 11:42:11 -07:00
Kenji Kaneshige	ffa71f33a8	x86, ioremap: Fix incorrect physical address handling in PAE mode Current x86 ioremap() doesn't handle physical address higher than 32-bit properly in X86_32 PAE mode. When physical address higher than 32-bit is passed to ioremap(), higher 32-bits in physical address is cleared wrongly. Due to this bug, ioremap() can map wrong address to linear address space. In my case, 64-bit MMIO region was assigned to a PCI device (ioat device) on my system. Because of the ioremap()'s bug, wrong physical address (instead of MMIO region) was mapped to linear address space. Because of this, loading ioatdma driver caused unexpected behavior (kernel panic, kernel hangup, ...). Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> LKML-Reference: <4C1AE680.7090408@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-07-09 11:42:03 -07:00
Ky Srinivasan	9279aa5506	x86: Export the symbol ms_hyperv This is needed so that the staging hyperv can properly access this symbol. Signed-off-by: K. Y. Srinivasan <ksrinivasan@novell.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-07-08 14:12:03 -07:00
Javier Martinez Canillas	5bbd4a336c	x86/apic/es7000_32: Remove unused variable In today's linux-next I got this compile warning: arch/x86/kernel/apic/es7000_32.c:132: warning: 'base' defined but not used Current patch solves the issue removing the unused variable. Signed-off-by: Javier Martinez Canillas <martinez.javier@gmail.com> Cc: Rakib Mullick <rakib.mullick@gmail.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Tejun Heo <tj@kernel.org> LKML-Reference: <1278546719.9020.4.camel@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-07-08 09:14:37 +02:00
H. Peter Anvin	bdc802dcca	x86, cpu: Support the features flags in new CPUID leaf 7 Intel has defined CPUID leaf 7 as the next set of feature flags (see the AVX specification, version 007). Add support for this new feature flags word. Signed-off-by: H. Peter Anvin <hpa@zytor.com> LKML-Reference: <tip-*@vger.kernel.org>	2010-07-07 17:29:18 -07:00
Feng Tang	6d2cce6201	x86, mrst: Add i8042_detect API for Moorestwon platform It will just return 0 as there is no i8042 controller Signed-off-by: Feng Tang <feng.tang@intel.com> LKML-Reference: <1278342202-10973-3-git-send-email-feng.tang@intel.com> Acked-by: Dmitry Torokhov <dtor@mail.ru> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-07-07 17:05:06 -07:00
Feng Tang	c516ac5839	x86: Add i8042 pre-detection hook to x86_platform_ops Some x86 platforms like Intel MID platforms don't have i8042 controllers, and i8042 driver's probe to some legacy IO ports may hang the MID processor. With this hook, i8042 driver can runtime check and skip the probe when the pretection fail which also saves some probe time [ hpa note: this is currently a compile-time check, which breaks the i386 allyesconfig build. This patch series thus does fix a regression. ] Signed-off-by: Feng Tang <feng.tang@intel.com> LKML-Reference: <1278342202-10973-2-git-send-email-feng.tang@intel.com> Acked-by: Dmitry Torokhov <dtor@mail.ru> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-07-07 17:05:06 -07:00
H. Peter Anvin	72550b3ae5	x86, platform: Export x86_platform to modules Export x86_platform to modules in preparation of using it for i8042 discovery control. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> LKML-Reference: <1278342202-10973-1-git-send-email-feng.tang@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Feng Tang <feng.tang@intel.com> Cc: Dmitry Torokhov <dtor@mail.ru>	2010-07-07 17:05:05 -07:00
H. Peter Anvin	83a7a2ad2a	x86, alternatives: Use 16-bit numbers for cpufeature index We already have cpufeature indicies above 255, so use a 16-bit number for the alternatives index. This consumes a padding field and so doesn't add any size, but it means that abusing the padding field to create assembly errors on overflow no longer works. We can retain the test simply by redirecting it to the .discard section, however. [ v3: updated to include open-coded locations ] Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> LKML-Reference: <tip-f88731e3068f9d1392ba71cc9f50f035d26a0d4f@git.kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-07-07 10:36:28 -07:00
H. Peter Anvin	24da9c26f3	x86, cpu: Add CPU flags for F16C and RDRND Add support for the newly documented F16C (16-bit floating point conversions) and RDRND (RDRAND instruction) CPU feature flags. Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-07-07 10:15:12 -07:00
Linus Torvalds	47a716cf0c	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: rbtree: Undo augmented trees performance damage and regression x86, Calgary: Limit the max PHB number to 256	2010-07-06 17:16:09 -07:00
Suresh Siddha	8e221b6db4	x86: Avoid unnecessary __clear_user() and xrstor in signal handling fxsave/xsave doesn't touch all the bytes in the memory layout used by these instructions. Specifically SW reserved (bytes 464..511) fields in the fxsave frame and the reserved fields in the xsave header. To present a clean context for the signal handling, just clear these fields instead of clearing the complete fxsave/xsave memory layout, when we dump these registers directly to the user signal frame. Also avoid the call to second xrstor (which inits the state not passed in the signal frame) in restore_user_xstate() if all the state has already been restored by the first xrstor. These changes improve the performance of signal handling(by ~3-5% as measured by the lat_sig). Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1277249017.2847.85.camel@sbs-t61.sc.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-07-06 16:31:04 -07:00
Avi Kivity	da38f43859	KVM: VMX: Fix host MSR_KERNEL_GS_BASE corruption enter_lmode() and exit_lmode() modify the guest's EFER.LMA before calling vmx_set_efer(). However, the latter function depends on the value of EFER.LMA to determine whether MSR_KERNEL_GS_BASE needs reloading, via vmx_load_host_state(). With EFER.LMA changing under its feet, it took the wrong choice and corrupted userspace's %gs. This causes 32-on-64 host userspace to fault. Fix not touching EFER.LMA; instead ask vmx_set_efer() to change it. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-07-06 11:41:31 +03:00
Peter Zijlstra	b945d6b255	rbtree: Undo augmented trees performance damage and regression Reimplement augmented RB-trees without sprinkling extra branches all over the RB-tree code (which lives in the scheduler hot path). This approach is 'borrowed' from Fabio's BFQ implementation and relies on traversing the rebalance path after the RB-tree-op to correct the heap property for insertion/removal and make up for the damage done by the tree rotations. For insertion the rebalance path is trivially that from the new node upwards to the root, for removal it is that from the deepest node in the path from the to be removed node that will still be around after the removal. [ This patch also fixes a video driver regression reported by Ali Gholami Rudi - the memtype->subtree_max_end was updated incorrectly. ] Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Venkatesh Pallipadi <venki@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Ali Gholami Rudi <ali@rudi.ir> Cc: Fabio Checconi <fabio@gandalf.sssup.it> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <1275414172.27810.27961.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-07-05 14:43:50 +02:00
Cyrill Gorcunov	39ef13a4ac	perf, x86: P4 PMU -- redesign cache events To support cache events we have reserved the low 6 bits in hw_perf_event::config (which is a part of CCCR register configuration actually). These bits represent Replay Event mertic enumerated in enum P4_PEBS_METRIC. The caller should not care about which exact bits should be set and how -- the caller just chooses one P4_PEBS_METRIC entity and puts it into the config. The kernel will track it and set appropriate additional MSR registers (metrics) when needed. The reason for this redesign was the PEBS enable bit, which should not be set until DS (and PEBS sampling) support will be implemented properly. TODO ==== - PEBS sampling (note it's tricky and works with _one_ counter only so for HT machines it will be not that easy to handle both threads) - tracking of PEBS registers state, a user might need to turn PEBS off completely (ie no PEBS enable, no UOP_tag) but some other event may need it, such events clashes and should not run simultaneously, at moment we just don't support such events - eventually export user space bits in separate header which will allow user apps to configure raw events more conveniently. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1278295769.9540.15.camel@minggr.sh.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-07-05 08:34:36 +02:00
Ingo Molnar	08f8ba0799	Merge commit 'v2.6.35-rc4' into perf/core Merge reason: Pick up the latest perf fixes Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-07-05 08:30:58 +02:00
Linus Torvalds	3f7d7b4bde	Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf, x86: Fix incorrect branches event on AMD CPUs perf tools: Fix find tids routine by excluding "." and ".." x86: Send a SIGTRAP for user icebp traps	2010-07-04 20:20:53 -07:00
Vince Weaver	f287d332ce	perf, x86: Fix incorrect branches event on AMD CPUs While doing some performance counter validation tests on some assembly language programs I noticed that the "branches:u" count was very wrong on AMD machines. It looks like the wrong event was selected. Signed-off-by: Vince Weaver <vweaver1@eecs.utk.edu> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: <stable@kernel.org> LKML-Reference: <alpine.DEB.2.00.1007011526010.23160@cl320.eecs.utk.edu> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-07-03 15:19:34 +02:00
Alexander Duyck	7475271004	x86: Drop CONFIG_MCORE2 check around setting of NET_IP_ALIGN This patch removes the CONFIG_MCORE2 check from around NET_IP_ALIGN. It is based on a suggestion from Andi Kleen. The assumption is that there are not any x86 cores where unaligned access is really slow, and this change would allow for a performance improvement to still exist on configurations that are not necessarily optimized for Core 2. Cc: Andi Kleen <ak@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-07-01 22:45:54 -07:00
Ingo Molnar	0a54cec0c2	Merge branch 'linus' into core/rcu Conflicts: fs/fs-writeback.c Merge reason: Resolve the conflict Note, i picked the version from Linus's tree, which effectively reverts the fs-writeback.c bits of: `b97181f`: fs: remove all rcu head initializations, except on_stack initializations As the upstream changes to this file changed this code heavily and the first attempt to resolve the conflict resulted in a non-booting kernel. It's safer to re-try this portion of the commit cleanly. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-07-01 09:31:25 +02:00
Darrick J. Wong	d596043d71	x86, Calgary: Limit the max PHB number to 256 The x3950 family can have as many as 256 PCI buses in a single system, so change the limits to the maximum. Since there can only be 256 PCI buses in one domain, we no longer need the BUG_ON check. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> LKML-Reference: <20100701004519.GQ15515@tux1.beaverton.ibm.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-06-30 22:41:42 -07:00
Alexander Duyck	ea812ca1b0	x86: Align skb w/ start of cacheline on newer core 2/Xeon Arch x86 architectures can handle unaligned accesses in hardware, and it has been shown that unaligned DMA accesses can be expensive on Nehalem architectures. As such we should overwrite NET_IP_ALIGN to resolve this issue. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-30 14:34:09 -07:00
Frederic Weisbecker	a1e80fafc9	x86: Send a SIGTRAP for user icebp traps Before we had a generic breakpoint layer, x86 used to send a sigtrap for any debug event that happened in userspace, except if it was caused by lazy dr7 switches. Currently we only send such signal for single step or breakpoint events. However, there are three other kind of debug exceptions: - debug register access detected: trigger an exception if the next instruction touches the debug registers. We don't use it. - task switch, but we don't use tss. - icebp/int01 trap. This instruction (0xf1) is undocumented and generates an int 1 exception. Unlike single step through TF flag, it doesn't set the single step origin of the exception in dr6. icebp then used to be reported in userspace using trap signals but this have been incidentally broken with the new breakpoint code. Reenable this. Since this is the only debug event that doesn't set anything in dr6, this is all we have to check. This fixes a regression in Wine where World Of Warcraft got broken as it uses this for software protection checks purposes. And probably other apps do. Reported-and-tested-by: Alexandre Julliard <julliard@winehq.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Prasad <prasad@linux.vnet.ibm.com> Cc: 2.6.33.x 2.6.34.x <stable@kernel.org>	2010-06-30 16:16:20 +02:00
Masami Hiramatsu	567a9fd867	kprobes/x86: Fix kprobes to skip prefixes correctly Fix resume_execution() and is_IF_modifier() to skip x86 instruction prefixes correctly by using x86 instruction attribute. Without this fix, resume_execution() can't handle instructions which have non-REX prefixes (REX prefixes are skipped). This will cause unexpected kernel panic by hitting bad address when a kprobe hits on two-byte ret (e.g. "repz ret" generated for Athlon/K8 optimization), because it just checks "repz" and can't recognize the "ret" instruction. These prefixes can be found easily with x86 instruction attribute. This patch introduces skip_prefixes() and uses it in resume_execution() and is_IF_modifier() to skip prefixes. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> LKML-Reference: <4C298A6E.8070609@hitachi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-06-29 10:43:41 +02:00
Tejun Heo	b71ab8c202	workqueue: increase max_active of keventd and kill current_is_keventd() Define WQ_MAX_ACTIVE and create keventd with max_active set to half of it which means that keventd now can process upto WQ_MAX_ACTIVE / 2 - 1 works concurrently. Unless some combination can result in dependency loop longer than max_active, deadlock won't happen and thus it's unnecessary to check whether current_is_keventd() before trying to schedule a work. Kill current_is_keventd(). (Lockdep annotations are broken. We need lock_map_acquire_read_norecurse()) Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Oleg Nesterov <oleg@redhat.com>	2010-06-29 10:07:14 +02:00
Thomas Gleixner	f384c954c9	Merge branch 'linus' into perf/core Reason: Further changes conflict with upstream fixes Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-06-28 22:33:24 +02:00
Linus Torvalds	5904b3b81d	Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: tracing: Fix undeclared ENOSYS in include/linux/tracepoint.h perf record: prevent kill(0, SIGTERM); perf session: Remove threads from tree on PERF_RECORD_EXIT perf/tracing: Fix regression of perf losing kprobe events perf_events: Fix Intel Westmere event constraints perf record: Don't call newt functions when not initialized	2010-06-28 12:24:43 -07:00
Linus Torvalds	ab8aadbda7	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, Calgary: Increase max PHB number x86: Fix rebooting on Dell Precision WorkStation T7400 x86: Fix vsyscall on gcc 4.5 with -Os x86, pat: Proper init of memtype subtree_max_end um, hweight: Fix UML boot crash due to x86 optimized hweight x86, setup: Set ax register in boot vga query percpu, x86: Avoid warnings of unused variables in per cpu x86, irq: Rename gsi_end gsi_top, and fix off by one errors x86: use __ASSEMBLY__ rather than __ASSEMBLER__	2010-06-28 12:06:25 -07:00
Darrick J. Wong	499a00e92d	x86, Calgary: Increase max PHB number Newer systems (x3950M2) can have 48 PHBs per chassis and 8 chassis, so bump the limits up and provide an explanation of the requirements for each class. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Acked-by: Muli Ben-Yehuda <muli@il.ibm.com> Cc: Corinna Schultz <cschultz@linux.vnet.ibm.com> Cc: <stable@kernel.org> LKML-Reference: <20100624212647.GI15515@tux1.beaverton.ibm.com> [ v2: Fixed build bug, added back PHBS_PER_CALGARY == 4 ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-06-25 16:14:58 +02:00
Frederic Weisbecker	f7809daf64	x86: Support for instruction breakpoints Instruction breakpoints need to have a specific length of 0 to be working. Bring this support but also take care the user is not trying to set an unsupported length, like a range breakpoint for example. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Prasad <prasad@linux.vnet.ibm.com> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Jason Wessel <jason.wessel@windriver.com>	2010-06-24 23:35:27 +02:00
Frederic Weisbecker	0c4519e825	x86: Set resume bit before returning from breakpoint exception Instruction breakpoints trigger before the instruction executes, and returning back from the breakpoint handler brings us again to the instruction that breakpointed. This naturally bring to a breakpoint recursion. To solve this, x86 has the Resume Bit trick. When the cpu flags have the RF flag set, the next instruction won't trigger any instruction breakpoint, and once this instruction is executed, RF is cleared back. This let's us jump back to the instruction that triggered the breakpoint without recursion. Use this when an instruction breakpoint triggers. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Prasad <prasad@linux.vnet.ibm.com> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Jason Wessel <jason.wessel@windriver.com>	2010-06-24 23:35:15 +02:00
Andres Salomon	75a9cac430	x86, olpc: Add comment about implicit optimization barrier Signed-off-by: Andres Salomon <dilinger@queued.net> Cc: H. Peter Anvin <hpa@linux.intel.com> LKML-Reference: <20100618174653.7755a39a@dev.queued.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-06-24 11:40:47 +02:00
Thomas Backlund	890ffedc7c	x86: Fix rebooting on Dell Precision WorkStation T7400 Dell Precision WorkStation T7400 freezes on reboot unless reboot=b is used. Reference: https://qa.mandriva.com/show_bug.cgi?id=58017 Signed-off-by: Thomas Backlund <tmb@mandriva.org> LKML-Reference: <4C1CC6E9.6000701@mandriva.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-06-20 09:24:13 +02:00
Andres Salomon	fd699c7655	x86, olpc: Add support for calling into OpenFirmware Add support for saving OFW's cif, and later calling into it to run OFW commands. OFW remains resident in memory, living within virtual range 0xff800000 - 0xffc00000. A single page directory entry points to the pgdir that OFW actually uses, so rather than saving the entire page table, we grab and install that one entry permanently in the kernel's page table. This is currently only used by the OLPC XO. Note that this particular calling convention breaks PAE and PAT, and so cannot be used on newer x86 hardware. Signed-off-by: Andres Salomon <dilinger@queued.net> LKML-Reference: <20100618174653.7755a39a@dev.queued.net> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-06-18 14:54:36 -07:00
H. Peter Anvin	05d0b0889c	x86, vdso: Error out if the vdso contains external references The vdso is a piece of userspace code which is supposed to be fully self-contained. Any external (undefined) reference is an error, and should be caught at compile time. This was giving us trouble when compiling with -Os on gcc 4.5.0, for example (failed inline). The need to do a buildtime check was pointed out by Andi Kleen. Reported-by: Andi Kleen <andi@firstfloor.org> LKML-Reference: <tip-*@vger.kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-06-18 14:46:21 -07:00
Andi Kleen	124482935f	x86: Fix vsyscall on gcc 4.5 with -Os This fixes the -Os breaks with gcc 4.5 bug. rdtsc_barrier needs to be force inlined, otherwise user space will jump into kernel space and kill init. This also addresses http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44129 I believe. Signed-off-by: Andi Kleen <ak@linux.intel.com> LKML-Reference: <20100618210859.GA10913@basil.fritz.box> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@kernel.org>	2010-06-18 14:16:31 -07:00
Jiri Slaby	d7a0380dc3	x86-64, mm: Initialize VDSO earlier on 64 bits When initrd is in use and a driver does request_module() in its module_init (i.e. __initcall or device_initcall), a modprobe process is created with VDSO mapping. But VDSO is inited even in __initcall, i.e. on the same level (at the same time), so it may not be inited yet (link order matters). Move the VDSO initialization code earlier by switching to something before rootfs_initcall where initrd is loaded as rootfs. Specifically to subsys_initcall. Do it for standard 64-bit path (init_vdso_vars) and for compat (sysenter_setup), just in case people have 32-bit initrd and ia32 emulation built-in. i386 (pure 32-bit) is not affected, since sysenter_setup() is called from check_bugs()->identify_boot_cpu() in start_kernel() before rest_init()->kernel_thread(kernel_init) where even kernel_init() calls do_basic_setup()->do_initcalls(). What this patch fixes are early modprobe crashes such as: Unpacking initramfs... Freeing initrd memory: 9324k freed modprobe[368]: segfault at 7fff4429c020 ip 00007fef397e160c \ sp 00007fff442795c0 error 4 in ld-2.11.2.so[7fef397df000+1f000] Signed-off-by: Jiri Slaby <jslaby@suse.cz> LKML-Reference: <1276720242-13365-1-git-send-email-jslaby@suse.cz> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-06-18 13:48:14 -07:00
Robert Schöne	c882e0feb9	x86, perf: Add power_end event to process_*.c cpu_idle routine Systems using the idle thread from process_32.c and process_64.c do not generate power_end events which could be traced using perf. This patch adds the event generation for such systems. Signed-off-by: Robert Schoene <robert.schoene@tu-dresden.de> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <1276515440.5441.45.camel@localhost> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-06-18 11:35:10 +02:00
Marcin Slusarz	8b8f79b927	x86, kmmio/mmiotrace: Fix double free of kmmio_fault_pages After every iounmap mmiotrace has to free kmmio_fault_pages, but it can't do it directly, so it defers freeing by RCU. It usually works, but when mmiotraced code calls ioremap-iounmap multiple times without sleeping between (so RCU won't kick in and start freeing) it can be given the same virtual address, so at every iounmap mmiotrace will schedule the same pages for release. Obviously it will explode on second free. Fix it by marking kmmio_fault_pages which are scheduled for release and not adding them second time. Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Tested-by: Marcin Kocielnicki <koriakin@0x04.net> Tested-by: Shinpei KATO <shinpei@il.is.s.u-tokyo.ac.jp> Acked-by: Pekka Paalanen <pq@iki.fi> Cc: Stuart Bennett <stuart@freedesktop.org> Cc: Marcin Kocielnicki <koriakin@0x04.net> Cc: nouveau@lists.freedesktop.org Cc: <stable@kernel.org> LKML-Reference: <20100613215654.GA3829@joi.lan> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-06-18 11:30:09 +02:00
Ingo Molnar	646b1db495	Merge commit 'v2.6.35-rc3' into perf/core Merge reason: Go from -rc1 base to -rc3 base, merge in fixes.	2010-06-18 10:53:19 +02:00
Venkatesh Pallipadi	23016bf0d2	x86: Look for IA32_ENERGY_PERF_BIAS support The new IA32_ENERGY_PERF_BIAS MSR allows system software to give hardware a hint whether OS policy favors more power saving, or more performance. This allows the OS to have some influence on internal hardware power/performance tradeoffs where the OS has previously had no influence. The support for this feature is indicated by CPUID.06H.ECX.bit3, as documented in the Intel Architectures Software Developer's Manual. This patch discovers support of this feature and displays it as "epb" in /proc/cpuinfo. Signed-off-by: Venkatesh Pallipadi <venki@google.com> LKML-Reference: <alpine.LFD.2.00.1006032310160.6669@localhost.localdomain> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-06-16 13:37:32 -07:00
Jiri Kosina	f1bbbb6912	Merge branch 'master' into for-next	2010-06-16 18:08:13 +02:00
Uwe Kleine-König	421f91d21a	fix typos concerning "initiali[zs]e" Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-06-16 18:05:05 +02:00
Paul E. McKenney	ec8c27e04f	mce: convert to rcu_dereference_index_check() The mce processing applies rcu_dereference_check() to integers used as array indices. This patch therefore moves mce to the new RCU API rcu_dereference_index_check() that avoids the sparse processing that would otherwise result in compiler errors. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com>	2010-06-14 16:37:28 -07:00
Len Brown	42de5532f4	Merge branch 'bugzilla-13931-sleep-nvs' into release Conflicts: drivers/acpi/sleep.c Signed-off-by: Len Brown <len.brown@intel.com>	2010-06-12 01:15:40 -04:00
Linus Torvalds	7ae1277a52	Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: PM / x86: Save/restore MISC_ENABLE register	2010-06-11 14:19:45 -07:00
Linus Torvalds	eda054770e	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: PCI: clear bridge resource range if BIOS assigned bad one PCI: hotplug/cpqphp, fix NULL dereference Revert "PCI: create function symlinks in /sys/bus/pci/slots/N/" PCI: change resource collision messages from KERN_ERR to KERN_INFO	2010-06-11 14:15:44 -07:00
Venkatesh Pallipadi	6a4f3b5237	x86, pat: Proper init of memtype subtree_max_end subtree_max_end that was recently added to struct memtype was not getting properly initialized resulting in WARNING: kmemcheck: Caught 64-bit read from uninitialized memory in memtype_rb_augment_cb() reported here https://bugzilla.kernel.org/show_bug.cgi?id=16092 This change fixes the problem. Reported-by: Christian Casteyde <casteyde.christian@free.fr> Tested-by: Christian Casteyde <casteyde.christian@free.fr> Signed-off-by: Venkatesh Pallipadi <venki@google.com> LKML-Reference: <1276217101-11515-1-git-send-email-venki@google.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com>	2010-06-11 14:12:22 -07:00
Yinghai Lu	837c4ef13c	PCI: clear bridge resource range if BIOS assigned bad one Yannick found that video does not work with 2.6.34. The cause of this bug was that the BIOS had assigned the wrong range to the PCI bridge above the video device. Before 2.6.34 the kernel would have shrunk the size of the bridge window, but since `d65245c` PCI: don't shrink bridge resources the kernel will avoid shrinking BIOS ranges. So zero out the old range if we fail to claim it at boot time; this will cause us to allocate a new range at startup, restoring the 2.6.34 behavior. Fixes regression https://bugzilla.kernel.org/show_bug.cgi?id=16009. Reported-by: Yannick <yannick.roehlly@free.fr> Acked-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-06-11 13:24:51 -07:00
Huang Ying	a2d7b0d485	x86, mce: Use HW_ERR in MCE handler Use HW_ERR printk prefix in MCE handler. To make it more explicit that this is hardware error instead of software error. Signed-off-by: Huang Ying <ying.huang@intel.com> LKML-Reference: <1275978939.3444.668.camel@yhuang-dev.sh.intel.com> Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-06-10 21:28:49 -07:00
Huang Ying	3c41758860	x86, mce: Fix MSR_IA32_MCI_CTL2 CMCI threshold setup It is reported that CMCI is not raised when number of corrected error reaches preset threshold. After inspection, it is found that MSR_IA32_MCI_CTL2 threshold field is not setup properly. This patch fixed it. Value of MCI_CTL2_CMCI_THRESHOLD_MASK is fixed according to x86_64 Software Developer's Manual too. Reported-by: Shaohui Zheng <shaohui.zheng@intel.com> Signed-off-by: Huang Ying <ying.huang@intel.com> LKML-Reference: <1275977350.3444.660.camel@yhuang-dev.sh.intel.com> Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-06-10 21:27:36 -07:00
Huang Ying	1f9a0bd498	x86, mce: Rename MSR_IA32_MCx_CTL2 value Rename CMCI_EN to MCI_CTL2_CMCI_EN and CMCI_THRESHOLD_MASK to MCI_CTL2_CMCI_THRESHOLD_MASK to make naming consistent. Signed-off-by: Huang Ying <ying.huang@intel.com> LKML-Reference: <1275977348.3444.659.camel@yhuang-dev.sh.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-06-10 21:27:26 -07:00
Andi Kleen	cf3bdc29fc	x86, setup: Set ax register in boot vga query Catch missing conversion to the register structure "glove box" scheme. Found by gcc 4.6's new warnings. Signed-off-by: Andi Kleen <ak@linux.intel.com> LKML-Reference: <20100610111040.F1781B1A2B@basil.firstfloor.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-06-10 15:24:29 -07:00
Andi Kleen	23b764d056	percpu, x86: Avoid warnings of unused variables in per cpu Avoid hundreds of warnings with a gcc 4.6 -Wall build. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-06-11 00:03:45 +02:00
Matthew Garrett	dd4c4f17d7	suspend: Move NVS save/restore code to generic suspend functionality Saving platform non-volatile state may be required for suspend to RAM as well as hibernation. Move it to more generic code. Signed-off-by: Matthew Garrett <mjg@redhat.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Tested-by: Maxim Levitsky <maximlevitsky@gmail.com> Signed-off-by: Len Brown <len.brown@intel.com>	2010-06-10 11:02:34 -04:00
Stephane Eranian	d11007703c	perf_events: Fix Intel Westmere event constraints Based on Intel Vol3b (March 2010), the event SNOOPQ_REQUEST_OUTSTANDING is restricted to counters 0,1 so update the event table for Intel Westmere accordingly. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: peterz@infradead.org Cc: paulus@samba.org Cc: davem@davemloft.net Cc: fweisbec@gmail.com Cc: perfmon2-devel@lists.sf.net Cc: eranian@gmail.com Cc: <stable@kernel.org> # .34.x LKML-Reference: <4c10cb56.5120e30a.2eb4.ffffc3de@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-06-10 14:16:32 +02:00
Borislav Petkov	12d8a96128	x86, AMD: Extend support to future families Extend support to future families, and in particular: * extend direct mapping split of Tseg SMM area. * extend K8 flavored alternatives (NOPS). * rep movs* prefix is fast in ucode. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <20100602182921.GA21557@aftab> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-06-09 15:57:47 -07:00
Borislav Petkov	8cc1176e5d	x86, cacheinfo: Carve out L3 cache slot accessors This is in preparation for disabling L3 cache indices after having received correctable ECCs in the L3 cache. Now we allow for initial setting of a disabled index slot (write once) and deny writing new indices to it after it has been disabled. Also, we deny using both slots to disable one and the same index. Userspace can restore the previously disabled indices by rewriting those sysfs entries when booting. Cleanup and reorganize code while at it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <20100602161840.GI18327@aftab> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-06-09 15:57:41 -07:00
Dan Carpenter	d6d4d4205c	x86, xsave: Cleanup return codes in check_for_xstate() The places which call check_for_xstate() only care about zero or non-zero so this patch doesn't change how the code runs, but it's a cleanup. The main reason for this patch is that I'm looking for places which don't return -EFAULT for copy_from_user() failures. Signed-off-by: Dan Carpenter <error27@gmail.com> LKML-Reference: <20100603100746.GU5483@bicker> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com>	2010-06-09 15:57:36 -07:00
Eric W. Biederman	a4384df3e2	x86, irq: Rename gsi_end gsi_top, and fix off by one errors When I introduced the global variable gsi_end I thought gsi_end on io_apics was one past the end of the gsi range for the io_apic. After it was pointed out the the range on io_apics was inclusive I changed my global variable to match. That was a big mistake. Inclusive semantics without a range start cannot describe the case when no gsi's are allocated. Describing the case where no gsi's are allocated is important in sfi.c and mpparse.c so that we can assign gsi numbers instead of blindly copying the gsi assignments the BIOS has done as we do in the acpi case. To keep from getting the global variable confused with the gsi range end rename it gsi_top. To allow describing the case where no gsi's are allocated have gsi_top be one place the highest gsi number seen in the system. This fixes an off by one bug in sfi.c: Reported-by: jacob pan <jacob.jun.pan@linux.intel.com> This fixes the same off by one bug in mpparse.c: This fixes an off unreachable by one bug in acpi/boot.c:irq_to_gsi Reported-by: Yinghai <yinghai.lu@oracle.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> LKML-Reference: <m17hm9jre7.fsf_-_@fess.ebiederm.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-06-09 13:34:06 -07:00
Ingo Molnar	c726b61c6a	Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/core	2010-06-09 18:55:57 +02:00
Avi Kivity	69325a1225	KVM: MMU: Remove user access when allowing kernel access to gpte.w=0 page If cr0.wp=0, we have to allow the guest kernel access to a page with pte.w=0. We do that by setting spte.w=1, since the host cr0.wp must remain set so the host can write protect pages. Once we allow write access, we must remove user access otherwise we mistakenly allow the user to write the page. Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-06-09 18:48:37 +03:00
Marcelo Tosatti	3be2264be3	KVM: MMU: invalidate and flush on spte small->large page size change Always invalidate spte and flush TLBs when changing page size, to make sure different sized translations for the same address are never cached in a CPU's TLB. Currently the only case where this occurs is when a non-leaf spte pointer is overwritten by a leaf, large spte entry. This can happen after dirty logging is disabled on a memslot, for example. Noticed by Andrea. KVM-Stable-Tag Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-06-09 18:48:36 +03:00
Joerg Roedel	67ec660777	KVM: SVM: Implement workaround for Erratum 383 This patch implements a workaround for AMD erratum 383 into KVM. Without this erratum fix it is possible for a guest to kill the host machine. This patch implements the suggested workaround for hypervisors which will be published by the next revision guide update. [jan: fix overflow warning on i386] [xiao: fix unused variable warning] Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-06-09 18:47:20 +03:00
Joerg Roedel	fe5913e4e1	KVM: SVM: Handle MCEs early in the vmexit process This patch moves handling of the MC vmexits to an earlier point in the vmexit. The handle_exit function is too late because the vcpu might alreadry have changed its physical cpu. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-06-09 18:39:10 +03:00
Oleg Nesterov	018378c55b	x86: Unify save_stack_address() and save_stack_address_nosched() Cleanup. Factor the common code in save_stack_address() and save_stack_address_nosched(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Vegard Nossum <vegard.nossum@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> LKML-Reference: <20100603193243.GA31534@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2010-06-09 17:32:19 +02:00
Oleg Nesterov	147ec4d236	x86: Make save_stack_address() !CONFIG_FRAME_POINTER friendly If CONFIG_FRAME_POINTER=n, print_context_stack() shouldn't neglect the non-reliable addresses on stack, this is all we have if dump_trace(bp) is called with the wrong or zero bp. For example, /proc/pid/stack doesn't work if CONFIG_FRAME_POINTER=n. This patch obviously has no effect if CONFIG_FRAME_POINTER=y, otherwise it reverts `1650743c` "x86: don't save unreliable stack trace entries". Also, remove the unnecessary type-cast. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Vegard Nossum <vegard.nossum@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <20100603193239.GA31530@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2010-06-09 17:32:15 +02:00
Peter Zijlstra	e78505958c	perf: Convert perf_event to local_t Since now all modification to event->count (and ->prev_count and ->period_left) are local to a cpu, change then to local64_t so we avoid the LOCK'ed ops. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-06-09 11:12:37 +02:00
Peter Zijlstra	1996bda2a4	arch: Implement local64_t On 64bit, local_t is of size long, and thus we make local64_t an alias. On 32bit, we fall back to atomic64_t. (architecture can provide optimized 32-bit version) (This new facility is to be used by perf events optimizations.) Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: linux-arch@vger.kernel.org Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-06-09 11:12:36 +02:00
Cyrill Gorcunov	68aa00ac0a	perf, x86: Make a second write to performance counter if needed On Netburst PMU we need a second write to a performance counter due to cpu erratum. A simple flag test instead of alternative instructions was choosen because wrmsrl is already a macro and if virtualization is turned on will need an additional wrapper call which is more expencise. nb: we should propably switch to jump-labels as only this facility reach the mainline. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20100602212304.GC5264@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-06-09 11:12:35 +02:00
Peter Zijlstra	8d2cacbbb8	perf: Cleanup {start,commit,cancel}_txn details Clarify some of the transactional group scheduling API details and change it so that a successfull ->commit_txn also closes the transaction. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1274803086.5882.1752.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-06-09 11:12:34 +02:00
Frederic Weisbecker	b0f82b81fe	perf: Drop the skip argument from perf_arch_fetch_regs_caller Drop this argument now that we always want to rewind only to the state of the first caller. It means frame pointers are not necessary anymore to reliably get the source of an event. But this also means we need this helper to be a macro now, as an inline function is not an option since we need to know when to provide a default implentation. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: David Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>	2010-06-08 23:31:27 +02:00
Frederic Weisbecker	c9cf4dbb4d	x86: Unify dumpstack.h and stacktrace.h arch/x86/include/asm/stacktrace.h and arch/x86/kernel/dumpstack.h declare headers of objects that deal with the same topic. Actually most of the files that include stacktrace.h also include dumpstack.h Although dumpstack.h seems more reserved for internals of stack traces, those are quite often needed to define specialized stack trace operations. And perf event arch headers are going to need access to such low level operations anyway. So don't continue to bother with dumpstack.h as it's not anymore about isolated deep internals. v2: fix struct stack_frame definition conflict in sysprof Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Soeren Sandmann <sandmann@daimi.au.dk>	2010-06-08 23:29:52 +02:00
Cliff Wickman	f6d8a56693	x86, UV: Modularize BAU send and wait Streamline the large uv_flush_send_and_wait() function by use of a couple of helper functions. And remove some excess comments. Signed-off-by: Cliff Wickman <cpw@sgi.com> Cc: gregkh@suse.de LKML-Reference: <E1OJvNy-0004ay-IH@eag09.americas.sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-06-08 21:13:48 +02:00
Cliff Wickman	450a007eeb	x86, UV: BAU broadcast to the local hub Make the Broadcast Assist Unit driver use the BAU for TLB shootdowns of cpu's on the local uvhub. It was previously thought that IPI might be faster to the cpu's on the local hub. But the IPI operation would have to follow the completion of the BAU broadcast anyway. So we broadcast to the local uvhub in all cases except when the current cpu was the only local cpu in the mask. This simplifies uv_flush_send_and_wait() in that it returns either all shootdowns complete, or none. Adjust the statistics to account for shootdowns on the local uvhub. Signed-off-by: Cliff Wickman <cpw@sgi.com> Cc: gregkh@suse.de LKML-Reference: <E1OJvNy-0004aq-G7@eag09.americas.sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-06-08 21:13:48 +02:00
Cliff Wickman	7fba1bcd48	x86, UV: Correct BAU regular message type The Broadcast Assist Unit messages have a regular or retry message type. The regular type was not being set, but needs to be, because the lack of a message type is sometimes used to identify an unused entry in the message queue. Also removing some excess comments. Signed-off-by: Cliff Wickman <cpw@sgi.com> Cc: gregkh@suse.de LKML-Reference: <E1OJvNy-0004ak-Dy@eag09.americas.sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-06-08 21:13:47 +02:00
Cliff Wickman	90cc7d9449	x86, UV: Remove BAU check for stay-busy Remove a faulty assumption that a long running BAU request has encountered a hardware problem and will never finish. Numalink congestion can make a request appear to have encountered such a problem, but it is not safe to cancel the request. If such a cancel is done but a reply is later received we can miss a TLB shootdown. We depend upon the max_bau_concurrent 'throttle' to prevent the stay-busy case from happening. Signed-off-by: Cliff Wickman <cpw@sgi.com> Cc: gregkh@suse.de LKML-Reference: <E1OJvNy-0004ad-BV@eag09.americas.sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-06-08 21:13:47 +02:00
Cliff Wickman	a8328ee58c	x86, UV: Correct BAU discovery of hubs and sockets Correct the initialization-time assumption of contigous blade numbers and of sockets numbered from zero. There may be hubs present with no cpu's enabled. There may be disabled sockets such that the active socket is not number zero. And assign a 'socket master' by assuming that a socket is a node. (it is not safe to extract socket number from an apicid) Signed-off-by: Cliff Wickman <cpw@sgi.com> Cc: gregkh@suse.de LKML-Reference: <E1OJvNy-0004aW-9S@eag09.americas.sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-06-08 21:13:46 +02:00
Cliff Wickman	39847e7f3c	x86, UV: Correct BAU software acknowledge Correct the acknowledgment and the reset of a BAU software-acknowledged message. A retry message should be testing only for timed-out resources (mask << 8). (And we delete a log message that might cause unnecessary concern) The acknowledge MMR is \|--timed-out--\|---pending--\|, each is 8 bits. The IPI-driven reset of software acknowledge resources frees both timed out and pending resources. Signed-off-by: Cliff Wickman <cpw@sgi.com> Cc: gregkh@suse.de LKML-Reference: <E1OJvNy-0004aP-7O@eag09.americas.sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-06-08 21:13:46 +02:00
Cliff Wickman	4faca15508	x86, UV: BAU structure rearranging Move some structure definitions from the C code to the BAU header file, and change the organization of that header file a little. Signed-off-by: Cliff Wickman <cpw@sgi.com> Cc: gregkh@suse.de LKML-Reference: <E1OJvNy-0004aI-54@eag09.americas.sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-06-08 21:13:45 +02:00
Cliff Wickman	712157aa70	x86, UV: Shorten access to BAU statistics structure Use a pointer from the per-cpu BAU control structure to the per-cpu BAU statistics structure. We nearly always know the first before needing the second. Signed-off-by: Cliff Wickman <cpw@sgi.com> Cc: gregkh@suse.de LKML-Reference: <E1OJvNy-0004aB-2k@eag09.americas.sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-06-08 21:13:45 +02:00
Cliff Wickman	50fb55acc5	x86, UV: Disable BAU on network congestion The numalink network can become so congested that TLB shootdown using the Broadcast Assist Unit becomes slower than using IPI's. In that case, disable the use of the BAU for a period of time. The period is tunable. When the period expires the use of the BAU is re-enabled. A count of these actions is added to the statistics file. Signed-off-by: Cliff Wickman <cpw@sgi.com> Cc: gregkh@suse.de LKML-Reference: <E1OJvNy-0004a4-0a@eag09.americas.sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-06-08 21:13:45 +02:00
Cliff Wickman	e8e5e8a804	x86, UV: BAU tunables into a debugfs file Make the Broadcast Assist Unit driver's nine tuning values variable by making them accessible through a read/write debugfs file. The file will normally be mounted as /sys/kernel/debug/sgi_uv/bau_tunables. The tunables are kept in each cpu's per-cpu BAU structure. The patch also does a little name improvement, and corrects the reset of two destination timeout counters. Signed-off-by: Cliff Wickman <cpw@sgi.com> Cc: gregkh@suse.de LKML-Reference: <E1OJvNx-0004Zx-Uo@eag09.americas.sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-06-08 21:13:44 +02:00
Cliff Wickman	12a6611fa1	x86, UV: Calculate BAU destination timeout Calculate the Broadcast Assist Unit's destination timeout period from the values in the relevant MMR's. Store it in each cpu's per-cpu BAU structure so that a destination timeout can be differentiated from a 'plugged' situation in which all software ack resources are already allocated and a timeout is pending. That case returns an immediate destination error. Signed-off-by: Cliff Wickman <cpw@sgi.com> Cc: gregkh@suse.de LKML-Reference: <E1OJvNx-0004Zq-RK@eag09.americas.sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-06-08 21:13:44 +02:00
Livio Soares	e768aee89c	perf, x86: Small fix to cpuid10_edx Fixes to 'cpuid10_edx' to comply with Intel documentation. According to the Intel Manual, Volume 2A, Table 3-12, the cpuid for architecture performance monitoring returns, in EDX, two pieces of information: 1) Number of fixed-function counters (5 bits, not 4) 2) Width of fixed-function counters (8 bits) Signed-off-by: Livio Soares <livio@eecg.toronto.edu> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-06-08 20:27:04 +02:00
Andres Salomon	fc4ac7a5f5	x86: use __ASSEMBLY__ rather than __ASSEMBLER__ As Ingo pointed out in a separate patch, we should be using __ASSEMBLY__. Make that the case in pgtable headers. Signed-off-by: Andres Salomon <dilinger@queued.net> LKML-Reference: <20100605114042.35ac69c1@dev.queued.net> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-06-07 17:27:11 -07:00
Ondrej Zary	85a0e75397	PM / x86: Save/restore MISC_ENABLE register Save/restore MISC_ENABLE register on suspend/resume. This fixes OOPS (invalid opcode) on resume from STR on Asus P4P800-VM, which wakes up with MWAIT disabled. Fixes https://bugzilla.kernel.org/show_bug.cgi?id=15385 Signed-off-by: Ondrej Zary <linux@rainbow-software.org> Tested-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2010-06-08 00:32:49 +02:00
Alex Nixon	08bbc9da92	xen: Add xen_create_contiguous_region A memory region must be physically contiguous in order to be accessed through DMA. This patch adds xen_create_contiguous_region, which ensures a region of contiguous virtual memory is also physically contiguous. Based on Stephen Tweedie's port of the 2.6.18-xen version. Remove contiguous_bitmap[] as it's no longer needed. Ported from linux-2.6.18-xen.hg 707:e410857fd83c [ Impact: add Xen-internal API to make pages phys-contig ] Signed-off-by: Alex Nixon <alex.nixon@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-06-07 15:37:53 -04:00
Alex Nixon	19001c8c5b	xen: Rename the balloon lock * xen_create_contiguous_region needs access to the balloon lock to ensure memory doesn't change under its feet, so expose the balloon lock * Change the name of the lock to xen_reservation_lock, to imply it's now less-specific usage. [ Impact: cleanup ] Signed-off-by: Alex Nixon <alex.nixon@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-06-07 14:34:07 -04:00
Alex Nixon	7347b4082e	xen: Allow unprivileged Xen domains to create iomap pages PV DomU domains are allowed to map hardware MFNs for PCI passthrough, but are not generally allowed to map raw machine pages. In particular, various pieces of code try to map DMI and ACPI tables in the ISA ROM range. We disallow _PAGE_IOMAP for those mappings, so that they are redirected to a set of local zeroed pages we reserve for that purpose. [ Impact: prevent passthrough of ISA space, as we only allow PCI ] Signed-off-by: Alex Nixon <alex.nixon@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-06-07 14:33:13 -04:00
Jeremy Fitzhardinge	c0011dbfce	xen: use _PAGE_IOMAP in ioremap to do machine mappings In a Xen domain, ioremap operates on machine addresses, not pseudo-physical addresses. We use _PAGE_IOMAP to determine whether a mapping is intended for machine addresses. [ Impact: allow Xen domain to map real hardware ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-06-07 14:32:33 -04:00
Linus Torvalds	9a9620db07	Merge branch 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/i7core * 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/i7core: (83 commits) i7core_edac: Better describe the supported devices Add support for Westmere to i7core_edac driver i7core_edac: don't free on success i7core_edac: Add support for X5670 Always call i7core_[ur]dimm_check_mc_ecc_err i7core_edac: fix memory leak of i7core_dev EDAC: add __init to i7core_xeon_pci_fixup i7core_edac: Fix wrong device id for channel 1 devices i7core: add support for Lynnfield alternate address i7core_edac: Add initial support for Lynnfield i7core_edac: do not export static functions edac: fix i7core build edac: i7core_edac produces undefined behaviour on 32bit i7core_edac: Use a more generic approach for probing PCI devices i7core_edac: PCI device is called NONCORE, instead of NOCORE i7core_edac: Fix ringbuffer maxsize i7core_edac: First store, then increment i7core_edac: Better parse "any" addrmask i7core_edac: Use a lockless ringbuffer edac: Create an unique instance for each kobj ...	2010-06-04 15:39:54 -07:00
Robert Richter	d8a382d266	Merge remote branch 'tip/perf/urgent' into oprofile/urgent	2010-06-04 11:33:10 +02:00
Linus Torvalds	167b712904	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, smpboot: Fix cores per node printing on boot x86/amd-iommu: Fall back to GART if initialization fails x86/amd-iommu: Fix crash when request_mem_region fails x86/mm: Remove unused DBG() macro arch/x86/kernel: Add missing spin_unlock	2010-06-03 15:47:22 -07:00
Linus Torvalds	b01b7dc283	Merge branch 'for-linus/bugfixes' of git://xenbits.xensource.com/people/ianc/linux-2.6 * 'for-linus/bugfixes' of git://xenbits.xensource.com/people/ianc/linux-2.6: xen: avoid allocation causing potential swap activity on the resume path xen: ensure timer tick is resumed even on CPU driving the resume	2010-06-03 15:46:09 -07:00
Linus Torvalds	f150dba6d4	Merge branch 'perf-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf: Fix crash in swevents perf buildid-list: Fix --with-hits event processing perf scripts python: Give field dict to unhandled callback perf hist: fix objdump output parsing perf-record: Check correct pid when forking perf: Do the comm inheritance per thread in event__process_task perf: Use event__process_task from perf sched perf: Process comm events by tid blktrace: Fix new kernel-doc warnings perf_events: Fix unincremented buffer base on partial copy perf_events: Fix event scheduling issues introduced by transactional API perf_events, trace: Fix perf_trace_destroy(), mutex went missing perf_events, trace: Fix probe unregister race perf_events: Fix races in group composition perf_events: Fix races and clean up perf_event and perf_mmap_data interaction	2010-06-03 15:45:26 -07:00
Ian Campbell	cd52e17ea8	xen: ensure timer tick is resumed even on CPU driving the resume The core suspend/resume code is run from stop_machine on CPU0 but parts of the suspend/resume machinery (including xen_arch_resume) are run on whichever CPU happened to schedule the xenwatch kernel thread. As part of the non-core resume code xen_arch_resume is called in order to restart the timer tick on non-boot processors. The boot processor itself is taken care of by core timekeeping code. xen_arch_resume uses smp_call_function which does not call the given function on the current processor. This means that we can end up with one CPU not receiving timer ticks if the xenwatch thread happened to be scheduled on CPU > 0. Use on_each_cpu instead of smp_call_function to ensure the timer tick is resumed everywhere. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Stable Kernel <stable@kernel.org> # .32.x	2010-06-03 09:34:04 +01:00
Borislav Petkov	4adc8b71cc	x86, smpboot: Fix cores per node printing on boot Percpu initialization happens now after booting the cores on the machine and this causes them all to be displayed as belonging to node 0: Jun 8 05:57:21 kepek kernel: [ 0.106999] Booting Node 0, Processors #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23 Ok. Use early_cpu_to_node() to get the correct node of each core instead. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Mike Travis <travis@sgi.com> LKML-Reference: <20100601190455.GA14237@aftab> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-06-02 09:38:53 +02:00
Linus Torvalds	1f73897861	Merge branch 'for-35' of git://repo.or.cz/linux-kbuild * 'for-35' of git://repo.or.cz/linux-kbuild: (81 commits) kbuild: Revert part of `e8d400a` to resolve a conflict kbuild: Fix checking of scm-identifier variable gconfig: add support to show hidden options that have prompts menuconfig: add support to show hidden options which have prompts gconfig: remove show_debug option gconfig: remove dbg_print_ptype() and dbg_print_stype() kconfig: fix zconfdump() kconfig: some small fixes add random binaries to .gitignore kbuild: Include gen_initramfs_list.sh and the file list in the .d file kconfig: recalc symbol value before showing search results .gitignore: ignore *.lzo files headerdep: perlcritic warning scripts/Makefile.lib: Align the output of LZO kbuild: Generate modules.builtin in make modules_install Revert "kbuild: specify absolute paths for cscope" kbuild: Do not unnecessarily regenerate modules.builtin headers_install: use local file handles headers_check: fix perl warnings export_report: fix perl warnings ...	2010-06-01 08:55:52 -07:00
Ingo Molnar	c8fcb14fec	Merge branch 'amd-iommu/2.6.35' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent	2010-06-01 11:45:45 +02:00
Joerg Roedel	d7f0776975	x86/amd-iommu: Fall back to GART if initialization fails This patch implements a fallback to the GART IOMMU if this is possible and the AMD IOMMU initialization failed. Otherwise the fallback would be nommu which is very problematic on machines with more than 4GB of memory or swiotlb which hurts io-performance. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2010-06-01 10:20:15 +02:00
Joerg Roedel	e82752d8b5	x86/amd-iommu: Fix crash when request_mem_region fails When request_mem_region fails the error path tries to disable the IOMMUs. This accesses the mmio-region which was not allocated leading to a kernel crash. This patch fixes the issue. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2010-06-01 10:03:08 +02:00
Joerg Roedel	1d61e73ab4	Merge commit 'v2.6.35-rc1' into amd-iommu/2.6.35	2010-06-01 09:57:49 +02:00
Akinobu Mita	e565813ab9	x86/mm: Remove unused DBG() macro DBG() macro for CONFIG_DEBUG_PER_CPU_MAPS is unused. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> LKML-Reference: <1274706291-13554-1-git-send-email-akinobu.mita@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-05-31 10:01:53 +02:00
Stephane Eranian	90151c35b1	perf_events: Fix event scheduling issues introduced by transactional API The transactional API patch between the generic and model-specific code introduced several important bugs with event scheduling, at least on X86. If you had pinned events, e.g., watchdog, and were over-committing the PMU, you would get bogus counts. The bug was showing up on Intel CPU because events would move around more often that on AMD. But the problem also existed on AMD, though harder to expose. The issues were: - group_sched_in() was missing a cancel_txn() in the error path - cpuc->n_added was not properly maintained, leading to missing actions in hw_perf_enable(), i.e., n_running being 0. You cannot update n_added until you know the transaction has succeeded. In case of failed transaction n_added was not adjusted back. - in case of failed transactions, event_sched_out() was called and eventually invoked x86_disable_event() to touch the HW reg. But with transactions, on X86, event_sched_in() does not touch HW registers, it simply collects events into a list. Thus, you could end up calling x86_disable_event() on a counter which did not correspond to the current event when idx != -1. The patch modifies the generic and X86 code to avoid all those problems. First, we keep track of the number of events added last. In case the transaction fails, we substract them from n_added. This approach is necessary (as opposed to delaying updates to n_added) because not all event updates use the transaction API, e.g., single events. Second, we encapsulate the event_sched_in() and event_sched_out() in group_sched_in() inside the transaction. That makes the operations symmetrical and you can also detect that you are inside a transaction and skip the HW reg access by checking cpuc->group_flag. With this patch, you can now overcommit the PMU even with pinned system-wide events present and still get valid counts. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1274796225.5882.1389.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-05-31 08:46:10 +02:00
Linus Torvalds	17d30ac077	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6 * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6: (47 commits) mfd: Rename twl5031 sih modules mfd: Storage class for timberdale should be before const qualifier mfd: Remove unneeded and dangerous clearing of clientdata mfd: New AB8500 driver gpio: Fix inverted rdc321x gpio data out registers mfd: Change rdc321x resources flags to IORESOURCE_IO mfd: Move pcf50633 irq related functions to its own file. mfd: Use threaded irq for pcf50633 mfd: pcf50633-adc: Fix potential race in pcf50633_adc_sync_read mfd: Fix pcf50633 bitfield logic in interrupt handler gpio: rdc321x needs to select MFD_CORE mfd: Use menuconfig for quicker config editing ARM: AB3550 board configuration and irq for U300 mfd: AB3550 core driver mfd: AB3100 register access change to abx500 API mfd: Renamed ab3100.h to abx500.h gpio: Add TC35892 GPIO driver mfd: Add Toshiba's TC35892 MFD core mfd: Delay to mask tsc irq in max8925 mfd: Remove incorrect wm8350 kfree ...	2010-05-30 09:13:08 -07:00
Linus Torvalds	021fad8b70	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, cpufeature: Unbreak compile with gcc 3.x x86, pat: Fix memory leak in free_memtype x86, k8: Fix section mismatch for powernowk8_exit() lib/atomic64_test: fix missing include of linux/kernel.h x86: remove last traces of quicklist usage x86, setup: Phoenix BIOS fixup is needed on Dell Inspiron Mini 1012 x86: "nosmp" command line option should force the system into UP mode arch/x86/pci: use kasprintf x86, apic: ack all pending irqs when crashed/on kexec	2010-05-30 09:06:13 -07:00
Linus Torvalds	35926ff5fb	Revert "cpusets: randomize node rotor used in cpuset_mem_spread_node()" This reverts commit `0ac0c0d0f8`, which caused cross-architecture build problems for all the wrong reasons. IA64 already added its own version of __node_random(), but the fact is, there is nothing architectural about the function, and the original commit was just badly done. Revert it, since no fix is forthcoming. Requested-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-30 09:00:03 -07:00
Linus Torvalds	e4f2e5eaac	Merge branch 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6 * 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6: intel_idle: native hardware cpuidle driver for latest Intel processors ACPI: acpi_idle: touch TS_POLLING only in the non-MWAIT case acpi_pad: uses MONITOR/MWAIT, so it doesn't need to clear TS_POLLING sched: clarify commment for TS_POLLING ACPI: allow a native cpuidle driver to displace ACPI cpuidle: make cpuidle_curr_driver static cpuidle: add cpuidle_unregister_driver() error check cpuidle: fail to register if !CONFIG_CPU_IDLE	2010-05-28 16:14:17 -07:00
Linus Torvalds	9a90e09854	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (27 commits) ACPI: Don't let acpi_pad needlessly mark TSC unstable drivers/acpi/sleep.h: Checkpatch cleanup ACPI: Minor cleanup eliminating redundant PMTIMER_TICKS to NS conversion ACPI: delete unused c-state promotion/demotion data strucutures ACPI: video: fix acpi_backlight=video ACPI: EC: Use kmemdup drivers/acpi: use kasprintf ACPI, APEI, EINJ injection parameters support Add x64 support to debugfs ACPI, APEI, Use ERST for persistent storage of MCE ACPI, APEI, Error Record Serialization Table (ERST) support ACPI, APEI, Generic Hardware Error Source memory error support ACPI, APEI, UEFI Common Platform Error Record (CPER) header Unified UUID/GUID definition ACPI Hardware Error Device (PNP0C33) support ACPI, APEI, PCIE AER, use general HEST table parsing in AER firmware_first setup ACPI, APEI, Document for APEI ACPI, APEI, EINJ support ACPI, APEI, HEST table parsing ACPI, APEI, APEI supporting infrastructure ...	2010-05-28 14:42:18 -07:00
Len Brown	d3b383338f	Merge branch 'ht-delete-2.6.35' into release	2010-05-28 16:20:35 -04:00
Len Brown	91dd696439	Merge branch 'acpi_enable' into release	2010-05-28 16:17:27 -04:00
Len Brown	dc1544ea5d	Merge branch 'bjorn-pci-root-v4-2.6.35' into release	2010-05-28 16:17:16 -04:00
Len Brown	e45b7fa230	sched: clarify commment for TS_POLLING TS_POLLING set tells the scheduler an idle_task will poll need_resched() to look for work. TS_POLLING clear tells resched_task() and wake_up_idle_cpu() that the remote CPU's idle_task is now sleeping in idle, and thus requires a reschedule interrupt notice work. Update the description of TS_POLLING to reflect how it works. "idle task polling need_resched, skip sending interrupt" Wordsmithing-by: Milton Miller <miltonm@bga.com> Signed-off-by: Len Brown <len.brown@intel.com> Acked-by: Peter Zijlstra <peterz@infradead.org>	2010-05-27 21:07:05 -04:00
Florian Fainelli	f52e62dcc1	x86: remove rdc321x_defs.h This file is replaced by a cleaner version with the adding of a MFD driver for the southbridge. Signed-off-by: Florian Fainelli <florian@openwrt.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2010-05-28 01:37:30 +02:00
Linus Torvalds	c5617b200a	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (61 commits) tracing: Add __used annotation to event variable perf, trace: Fix !x86 build bug perf report: Support multiple events on the TUI perf annotate: Fix up usage of the build id cache x86/mmiotrace: Remove redundant instruction prefix checks perf annotate: Add TUI interface perf tui: Remove annotate from popup menu after failure perf report: Don't start the TUI if -D is used perf: Fix getline undeclared perf: Optimize perf_tp_event_match() perf: Remove more code from the fastpath perf: Optimize the !vmalloc backed buffer perf: Optimize perf_output_copy() perf: Fix wakeup storm for RO mmap()s perf-record: Share per-cpu buffers perf-record: Remove -M perf: Ensure that IOC_OUTPUT isn't used to create multi-writer buffers perf, trace: Optimize tracepoints by using per-tracepoint-per-cpu hlist to track events perf, trace: Optimize tracepoints by removing IRQ-disable from perf/tracepoint interaction perf tui: Allow disabling the TUI on a per command basis in ~/.perfconfig ...	2010-05-27 15:23:47 -07:00
H. Peter Anvin	1ba4f22c42	x86, cpufeature: Unbreak compile with gcc 3.x gcc 3 is too braindamaged to be able to compile static_cpu_has() -- apparently it can't tell that a constant passed to an inline function is still a constant -- so if we're using gcc 3, just use the dynamic test. This is bad for performance, but if you care about performance, don't use an ancient, known-to-optimize-poorly compiler. Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com> LKML-Reference: <4BF2FF82.7090005@zytor.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-05-27 12:02:00 -07:00
Lee Schermerhorn	e534c7c5f8	numa: x86_64: use generic percpu var numa_node_id() implementation x86 arch specific changes to use generic numa_node_id() based on generic percpu variable infrastructure. Back out x86's custom version of numa_node_id() Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Tejun Heo <tj@kernel.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Nick Piggin <npiggin@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:57 -07:00
Lee Schermerhorn	7281201922	numa: add generic percpu var numa_node_id() implementation Rework the generic version of the numa_node_id() function to use the new generic percpu variable infrastructure. Guard the new implementation with a new config option: CONFIG_USE_PERCPU_NUMA_NODE_ID. Archs which support this new implemention will default this option to 'y' when NUMA is configured. This config option could be removed if/when all archs switch over to the generic percpu implementation of numa_node_id(). Arch support involves: 1) converting any existing per cpu variable implementations to use this implementation. x86_64 is an instance of such an arch. 2) archs that don't use a per cpu variable for numa_node_id() will need to initialize the new per cpu variable "numa_node" as cpus are brought on-line. ia64 is an example. 3) Defining USE_PERCPU_NUMA_NODE_ID in arch dependent Kconfig--e.g., when NUMA is configured. This is required because I have retained the old implementation by default to allow archs to be modified incrementally, as desired. Subsequent patches will convert x86_64 and ia64 to use this implemenation. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Tejun Heo <tj@kernel.org> Cc: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Cc: Nick Piggin <npiggin@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:57 -07:00
FUJITA Tomonori	1ef04370d8	asm-generic: remove ARCH_HAS_SG_CHAIN in scatterlist.h There are more architectures that don't support ARCH_HAS_SG_CHAIN than those that support it. This removes removes ARCH_HAS_SG_CHAIN in asm-generic/scatterlist.h and lets arhictectures to define it. It's clearer than defining ARCH_HAS_SG_CHAIN asm-generic/scatterlist.h and undefing it in arhictectures that don't support it. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:54 -07:00
Andrew Morton	4a14d84ea2	x86_32: use asm-generic/scatterlist.h Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:54 -07:00
FUJITA Tomonori	18e98307de	asm-generic: add NEED_SG_DMA_LENGTH to define sg_dma_len() There are only two ways to define sg_dma_len(); use sg->dma_length or sg->length. This patch introduces NEED_SG_DMA_LENGTH that enables architectures to choose sg->dma_length or sg->length. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:54 -07:00
FUJITA Tomonori	de006a071c	x86: remove unnecessary sync_single_range_* in swiotlb_dma_ops sync_single_range_for_cpu and sync_single_range_for_device hooks in swiotlb_dma_ops are unnecessary because sync_single_for_cpu and sync_single_for_device are used there. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:52 -07:00
Akinobu Mita	a94247e7fb	x86: convert cpu notifier to return encapsulate errno value By the previous modification, the cpu notifier can return encapsulate errno value. This converts the cpu notifiers for msr, cpuid, and therm_throt. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:48 -07:00
Jack Steiner	0ac0c0d0f8	cpusets: randomize node rotor used in cpuset_mem_spread_node() Some workloads that create a large number of small files tend to assign too many pages to node 0 (multi-node systems). Part of the reason is that the rotor (in cpuset_mem_spread_node()) used to assign nodes starts at node 0 for newly created tasks. This patch changes the rotor to be initialized to a random node number of the cpuset. [akpm@linux-foundation.org: fix layout] [Lee.Schermerhorn@hp.com: Define stub numa_random() for !NUMA configuration] Signed-off-by: Jack Steiner <steiner@sgi.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Paul Menage <menage@google.com> Cc: Jack Steiner <steiner@sgi.com> Cc: Robin Holt <holt@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:44 -07:00
Julia Lawall	84fe6c19e4	arch/x86/kernel: Add missing spin_unlock Add a spin_unlock missing on the error path. The locks and unlocks are balanced in other functions, so it seems that the same should be the case here. The semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression E1; @@ * spin_lock(E1,...); <+... when != E1 if (...) { ... when != E1 * return ...; } ...+> * spin_unlock(E1,...); // </smpl> Cc: stable@kernel.org Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2010-05-27 12:40:11 +02:00
Xiaotian Feng	20413f2716	x86, pat: Fix memory leak in free_memtype Reserve_memtype will allocate memory for new memtype, but in free_memtype, after the memtype erased from rbtree, the memory is not freed. Changes since V1: make rbt_memtype_erase return erased memtype so that it can be freed in free_memtype. [ hpa: not for -stable: 2.6.34 and earlier not affected ] Signed-off-by: Xiaotian Feng <dfeng@redhat.com> LKML-Reference: <1274838670-8731-1-git-send-email-dfeng@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Jack Steiner <steiner@sgi.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-05-26 11:26:04 -07:00
Linus Torvalds	13da9e200f	Revert "endian: #define __BYTE_ORDER" This reverts commit `b3b77c8cae`, which was also totally broken (see commit `0d2daf5cc8` that reverted the crc32 version of it). As reported by Stephen Rothwell, it causes problems on big-endian machines: > In file included from fs/jfs/jfs_types.h:33, > from fs/jfs/jfs_incore.h:26, > from fs/jfs/file.c:22: > fs/jfs/endian24.h:36:101: warning: "__LITTLE_ENDIAN" is not defined The kernel has never had that crazy "__BYTE_ORDER == __LITTLE_ENDIAN" model. It's not how we do things, and it isn't how we _should_ do things. So don't go there. Requested-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-26 08:30:15 -07:00
Borislav Petkov	fe501f1e89	x86, k8: Fix section mismatch for powernowk8_exit() Fix the following warning: "WARNING: arch/x86/kernel/built-in.o(.exit.text+0x72): Section mismatch in reference from the function powernowk8_exit() to the variable .cpuinit.data:cpb_nb The function __exit powernowk8_exit() references a variable __cpuinitdata cpb_nb. This is often seen when error handling in the exit function uses functionality in the init path. The fix is often to remove the __cpuinitdata annotation of cpb_nb so it may be used outside an init section." Cc: <stable@kernel.org> Reported-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <20100525152858.GA24836@aftab> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-05-25 15:42:21 -07:00
Kay Sievers	578454ff7e	driver core: add devname module aliases to allow module on-demand auto-loading This adds: alias: devname:<name> to some common kernel modules, which will allow the on-demand loading of the kernel module when the device node is accessed. Ideally all these modules would be compiled-in, but distros seems too much in love with their modularization that we need to cover the common cases with this new facility. It will allow us to remove a bunch of pretty useless init scripts and modprobes from init scripts. The static device node aliases will be carried in the module itself. The program depmod will extract this information to a file in the module directory: $ cat /lib/modules/2.6.34-00650-g537b60d-dirty/modules.devname # Device nodes to trigger on-demand module loading. microcode cpu/microcode c10:184 fuse fuse c10:229 ppp_generic ppp c108:0 tun net/tun c10:200 dm_mod mapper/control c10:235 Udev will pick up the depmod created file on startup and create all the static device nodes which the kernel modules specify, so that these modules get automatically loaded when the device node is accessed: $ /sbin/udevd --debug ... static_dev_create_from_modules: mknod '/dev/cpu/microcode' c10:184 static_dev_create_from_modules: mknod '/dev/fuse' c10:229 static_dev_create_from_modules: mknod '/dev/ppp' c108:0 static_dev_create_from_modules: mknod '/dev/net/tun' c10:200 static_dev_create_from_modules: mknod '/dev/mapper/control' c10:235 udev_rules_apply_static_dev_perms: chmod '/dev/net/tun' 0666 udev_rules_apply_static_dev_perms: chmod '/dev/fuse' 0666 A few device nodes are switched to statically allocated numbers, to allow the static nodes to work. This might also useful for systems which still run a plain static /dev, which is completely unsafe to use with any dynamic minor numbers. Note: The devname aliases must be limited to the common and singleinstance* device nodes, like the misc devices, and never be used for conceptually limited systems like the loop devices, which should rather get fixed properly and get a control node for losetup to talk to, instead of creating a random number of device nodes in advance, regardless if they are ever used. This facility is to hide the mess distros are creating with too modualized kernels, and just to hide that these modules are not compiled-in, and not to paper-over broken concepts. Thanks! :) Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: David S. Miller <davem@davemloft.net> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Chris Mason <chris.mason@oracle.com> Cc: Alasdair G Kergon <agk@redhat.com> Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk> Cc: Ian Kent <raven@themaw.net> Signed-Off-By: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-05-25 15:08:26 -07:00
Carsten Emde	a321cedb12	drivers/hwmon/coretemp.c: get TjMax value from MSR The MSR IA32_TEMPERATURE_TARGET contains the TjMax value in the newer Intel processors. Signed-off-by: Huaxu Wan <huaxu.wan@linux.intel.com> Signed-off-by: Carsten Emde <C.Emde@osadl.org> Cc: Jean Delvare <khali@linux-fr.org> Cc: Valdis Kletnieks <valdis.kletnieks@vt.edu> Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: Yong Wang <yong.y.wang@linux.intel.com> Cc: Rudolf Marek <r.marek@assembler.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-25 08:07:07 -07:00
Joakim Tjernlund	b3b77c8cae	endian: #define __BYTE_ORDER Linux does not define __BYTE_ORDER in its endian header files which makes some header files bend backwards to get at the current endian. Lets #define __BYTE_ORDER in big_endian.h/litte_endian.h to make it easier for header files that are used in user space too. In userspace the convention is that 1. _both_ __LITTLE_ENDIAN and __BIG_ENDIAN are defined, 2. you have to test for e.g. __BYTE_ORDER == __BIG_ENDIAN. Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-25 08:07:02 -07:00
Peter Zijlstra	87f44bbc24	perf, trace: Fix !x86 build bug Patch `b7e2ecef92` (perf, trace: Optimize tracepoints by removing IRQ-disable from perf/tracepoint interaction) made the unfortunate mistake of assuming the world is x86 only, correct this. The problem was that perf_fetch_caller_regs() did local_save_flags() into regs->flags, and I re-used that to remove another local_save_flags(), forgetting !x86 doesn't have regs->flags. Do the reverse, remove the local_save_flags() from perf_fetch_caller_regs() and let the ftrace site do the local_save_flags() instead. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Paul Mackerras <paulus@samba.org> Cc: acme@redhat.com Cc: efault@gmx.de Cc: fweisbec@gmail.com Cc: rostedt@goodmis.org LKML-Reference: <1274778175.5882.623.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-05-25 11:28:49 +02:00
Peter Zijlstra	48691ff86d	x86: remove last traces of quicklist usage We still have a stray quicklist header included even though we axed quicklist usage quite a while back. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <201005241913.o4OJDJe9010881@imap1.linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-05-24 13:33:31 -07:00
Gabor Gombas	3d6e77a3dd	x86, setup: Phoenix BIOS fixup is needed on Dell Inspiron Mini 1012 The low-memory corruption checker triggers during suspend/resume, so we need to reserve the low 64k. Don't be fooled that the BIOS identifies itself as "Dell Inc.", it's still Phoenix BIOS. [ hpa: I think we blacklist almost every BIOS in existence. We should either change this to a whitelist or just make it unconditional. ] Signed-off-by: Gabor Gombas <gombasg@digikabel.hu> LKML-Reference: <201005241913.o4OJDIMM010877@imap1.linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@kernel.org>	2010-05-24 13:33:14 -07:00
Jan Beulich	5f2eb55026	x86: "nosmp" command line option should force the system into UP mode Bits set in cpu_possible_mask prior to the execution of prefill_possible_map() (i.e. when parsing ACPI or MPS tables) would prevent the SMP alternatives logic from switching to UP mode, plus unnecessary setup of per-CPU data for CPUs that can never come online. Additionally, without CONFIG_HOTPLUG_CPU disabled CPUs can never come online, and hence setting cpu_possible_mask bits for them is again a simple waste of resources. Signed-off-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <201005241913.o4OJDH3Z010874@imap1.linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-05-24 13:31:52 -07:00
Julia Lawall	b46fc5f235	arch/x86/pci: use kasprintf kasprintf combines kmalloc and sprintf, and takes care of the size calculation itself. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression a,flag; expression list args; statement S; @@ a = - $kmalloc\\|kzalloc$(...,flag) + kasprintf(flag,args) <... when != a if (a == NULL \|\| ...) S ...> - sprintf(a,args); // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> LKML-Reference: <201005241913.o4OJDG3R010871@imap1.linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-05-24 13:31:45 -07:00
Kerstin Jonsson	8c3ba8d049	x86, apic: ack all pending irqs when crashed/on kexec When the SMP kernel decides to crash_kexec() the local APICs may have pending interrupts in their vector tables. The setup routine for the local APIC has a deficient mechanism for clearing these interrupts, it only handles interrupts that has already been dispatched to the local core for servicing (the ISR register) safely, it doesn't consider lower prioritized queued interrupts stored in the IRR register. If you have more than one pending interrupt within the same 32 bit word in the LAPIC vector table registers you may find yourself entering the IO APIC setup with pending interrupts left in the LAPIC. This is a situation for wich the IO APIC setup is not prepared. Depending of what/which interrupt vector/vectors are stuck in the APIC tables your system may show various degrees of malfunctioning. That was the reason why the check_timer() failed in our system, the timer interrupts was blocked by pending interrupts from the old kernel when routed trough the IO APIC. Additional comment from Jiri Bohac: ============== If this should go into stable release, I'd add some kind of limit on the number of iterations, just to be safe from hard to debug lock-ups: +if (loops++ > MAX_LOOPS) { + printk("LAPIC pending clean-up") + break; +} while (queued); with MAX_LOOPS something like 1E9 this would leave plenty of time for the pending IRQs to be cleared and would and still cause at most a second of delay if the loop were to lock-up for whatever reason. [trenn@suse.de: V2: Use tsc if avail to bail out after 1 sec due to possible virtual apic_read calls which may take rather long (suggested by: Avi Kivity <avi@redhat.com>) If no tsc is available bail out quickly after cpu_khz, if we broke out too early and still have irqs pending (which should never happen?) we still get a WARN_ON... V3: - Fixed indentation -> checkpatch clean - max_loops must be signed V4: - Fix typo, mixed up tsc and ntsc in first rdtscll() call V5: Adjust WARN_ON() condition to also catch error in cpu_has_tsc case] Cc: <jbohac@novell.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Kerstin Jonsson <kerstin.jonsson@ericsson.com> Cc: Avi Kivity <avi@redhat.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Tested-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Thomas Renninger <trenn@suse.de> LKML-Reference: <201005241913.o4OJDGWM010865@imap1.linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-05-24 13:31:35 -07:00
Akinobu Mita	841bca1393	x86/mmiotrace: Remove redundant instruction prefix checks Get rid of the duplicated entries in prefix_codes[] to eliminate redundant checks by skip_prefix(). Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Pekka Paalanen <pq@iki.fi> LKML-Reference: <1274140110-5841-1-git-send-email-akinobu.mita@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-05-23 11:02:43 +02:00
Linus Torvalds	6109e2ce26	Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (36 commits) PCI: hotplug: pciehp: Removed check for hotplug of display devices PCI: read memory ranges out of Broadcom CNB20LE host bridge PCI: Allow manual resource allocation for PCI hotplug bridges x86/PCI: make ACPI MCFG reserved error messages ACPI specific PCI hotplug: Use kmemdup PM/PCI: Update PCI power management documentation PCI: output FW warning in pci_read/write_vpd PCI: fix typos pci_device_dis/enable to pci_dis/enable_device in comments PCI quirks: disable msi on AMD rs4xx internal gfx bridges PCI: Disable MSI for MCP55 on P5N32-E SLI x86/PCI: irq and pci_ids patch for additional Intel Cougar Point DeviceIDs PCI: aerdrv: trivial cleanup for aerdrv_core.c PCI: aerdrv: trivial cleanup for aerdrv.c PCI: aerdrv: introduce default_downstream_reset_link PCI: aerdrv: rework find_aer_service PCI: aerdrv: remove is_downstream PCI: aerdrv: remove magical ROOT_ERR_STATUS_MASKS PCI: aerdrv: redefine PCI_ERR_ROOT_*_SRC PCI: aerdrv: rework do_recovery PCI: aerdrv: rework get_e_source() ...	2010-05-21 18:58:52 -07:00
Linus Torvalds	98edb6ca41	Merge branch 'kvm-updates/2.6.35' of git://git.kernel.org/pub/scm/virt/kvm/kvm * 'kvm-updates/2.6.35' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (269 commits) KVM: x86: Add missing locking to arch specific vcpu ioctls KVM: PPC: Add missing vcpu_load()/vcpu_put() in vcpu ioctls KVM: MMU: Segregate shadow pages with different cr0.wp KVM: x86: Check LMA bit before set_efer KVM: Don't allow lmsw to clear cr0.pe KVM: Add cpuid.txt file KVM: x86: Tell the guest we'll warn it about tsc stability x86, paravirt: don't compute pvclock adjustments if we trust the tsc x86: KVM guest: Try using new kvm clock msrs KVM: x86: export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID KVM: x86: add new KVMCLOCK cpuid feature KVM: x86: change msr numbers for kvmclock x86, paravirt: Add a global synchronization point for pvclock x86, paravirt: Enable pvclock flags in vcpu_time_info structure KVM: x86: Inject #GP with the right rip on efer writes KVM: SVM: Don't allow nested guest to VMMCALL into host KVM: x86: Fix exception reinjection forced to true KVM: Fix wallclock version writing race KVM: MMU: Don't read pdptrs with mmu spinlock held in mmu_alloc_roots KVM: VMX: enable VMXON check with SMX enabled (Intel TXT) ...	2010-05-21 17:16:21 -07:00
Linus Torvalds	27a3353a45	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86: (32 commits) Move N014, N051 and CR620 dmi information to load scm dmi table drivers/platform/x86/eeepc-wmi.c: fix build warning X86 platfrom wmi: Add debug facility to dump WMI data in a readable way X86 platform wmi: Also log GUID string when an event happens and debug is set X86 platform wmi: Introduce debug param to log all WMI events Clean up all objects used by scm model when driver initial fail or exit msi-laptop: fix up some coding style issues found by checkpatch msi-laptop: Add i8042 filter to sync sw state with BIOS when function key pressed msi-laptop: Set rfkill init state when msi-laptop intiial msi-laptop: Add MSI CR620 notebook dmi information to scm models table msi-laptop: Add N014 N051 dmi information to scm models table drivers/platform/x86: Use kmemdup drivers/platform/x86: Use kzalloc drivers/platform/x86: Clarify the MRST IPC driver description slightly eeepc-wmi: depends on BACKLIGHT_CLASS_DEVICE IPC driver for Intel Mobile Internet Device (MID) platforms classmate-laptop: Add RFKILL support. thinkpad-acpi: document backlight level writeback at driver init thinkpad-acpi: clean up ACPI handles handling thinkpad-acpi: don't depend on led_path for led firmware type (v2) ...	2010-05-21 17:13:24 -07:00
Linus Torvalds	2a8ba8f032	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (46 commits) random: simplify fips mode crypto: authenc - Fix cryptlen calculation crypto: talitos - add support for sha224 crypto: talitos - add hash algorithms crypto: talitos - second prepare step for adding ahash algorithms crypto: talitos - prepare for adding ahash algorithms crypto: n2 - Add Niagara2 crypto driver crypto: skcipher - Add ablkcipher_walk interfaces crypto: testmgr - Add testing for async hashing and update/final crypto: tcrypt - Add speed tests for async hashing crypto: scatterwalk - Fix scatterwalk_done() test crypto: hifn_795x - Rename ablkcipher_walk to hifn_cipher_walk padata: Use get_online_cpus/put_online_cpus in padata_free padata: Add some code comments padata: Flush the padata queues actively padata: Use a timer to handle remaining objects in the reorder queues crypto: shash - Remove usage of CRYPTO_MINALIGN crypto: mv_cesa - Use resource_size crypto: omap - OMAP macros corrected padata: Use get_online_cpus/put_online_cpus ... Fix up conflicts in arch/arm/mach-omap2/devices.c	2010-05-21 14:46:51 -07:00
Ira W. Snyder	3f6ea84a30	PCI: read memory ranges out of Broadcom CNB20LE host bridge Read the memory ranges behind the Broadcom CNB20LE host bridge out of the hardware. This allows PCI hotplugging to work, since we know which memory range to allocate PCI BAR's from. The x86 PCI code automatically prefers the ACPI _CRS information when it is available. In that case, this information is not used. Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-05-21 14:43:46 -07:00
Linus Torvalds	59534f7298	Merge branch 'drm-for-2.6.35' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6 * 'drm-for-2.6.35' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (207 commits) drm/radeon/kms/pm/r600: select the mid clock mode for single head low profile drm/radeon: fix power supply kconfig interaction. drm/radeon/kms: record object that have been list reserved drm/radeon: AGP memory is only I/O if the aperture can be mapped by the CPU. drm/radeon/kms: don't default display priority to high on rs4xx drm/edid: fix typo in 1600x1200@75 mode drm/nouveau: fix i2c-related init table handlers drm/nouveau: support init table i2c device identifier 0x81 drm/nouveau: ensure we've parsed i2c table entry for INIT_I2C handlers drm/nouveau: display error message for any failed init table opcode drm/nouveau: fix init table handlers to return proper error codes drm/nv50: support fractional feedback divider on newer chips drm/nv50: fix monitor detection on certain chipsets drm/nv50: store full dcb i2c entry from vbios drm/nv50: fix suspend/resume with DP outputs drm/nv50: output calculated crtc pll when debugging on drm/nouveau: dump pll limits entries when debugging is on drm/nouveau: bios parser fixes for eDP boards drm/nouveau: fix a nouveau_bo dereference after it's been destroyed drm/nv40: remove some completed ctxprog TODOs ...	2010-05-21 11:14:52 -07:00
Jason Wessel	61eaf539b9	earlyprintk,vga,kdb: Fix \b and \r for earlyprintk=vga with kdb Allow kdb to work properly with with earlyprintk=vga by interpreting the backspace and carriage return output characters. These interpretation of these characters is used for simple line editing provided in the kdb shell. CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@redhat.com> CC: H. Peter Anvin <hpa@zytor.com> CC: x86@kernel.org Signed-off-by: Jason Wessel <jason.wessel@windriver.com>	2010-05-20 21:04:31 -05:00
Jason Wessel	0bb9fef913	x86,early dr regs,kgdb: Allow kernel debugger early dr register access If the kernel debugger was configured, attached and started with kgdbwait, the hardware breakpoint registers should get restored by the kgdb code which is managing the dr registers. CC: x86@kernel.org CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@redhat.com> CC: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>	2010-05-20 21:04:30 -05:00
Jason Wessel	031acd8c42	x86,kgdb: Implement early hardware breakpoint debugging It is not possible to use the hw_breakpoint.c API prior to mm_init(), but it is possible to use hardware breakpoints with the kernel debugger. Prior to smp_init() it is possible to simply write to the dr registers of the boot cpu directly. This can be used up until the kgdb_arch_late() is invoked, at which point the standard hw_breakpoint.c API will get used. CC: Frederic Weisbecker <fweisbec@gmail.com> CC: Ingo Molnar <mingo@elte.hu> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>	2010-05-20 21:04:30 -05:00
Jason Wessel	0b4b3827db	x86, kgdb, init: Add early and late debug states The kernel debugger can operate well before mm_init(), but the x86 hardware breakpoint code which uses the perf api requires that the kernel allocators are initialized. This means the kernel debug core needs to provide an optional arch specific call back to allow the initialization functions to run after the kernel has been further initialized. The kdb shell already had a similar restriction with an early initialization and late initialization. The kdb_init() was moved into the debug core's version of the late init which is called dbg_late_init(); CC: kgdb-bugreport@lists.sourceforge.net Signed-off-by: Jason Wessel <jason.wessel@windriver.com>	2010-05-20 21:04:29 -05:00
Jan Kiszka	29c843912a	x86, kgdb: early trap init for early debug Allow the x86 arch to have early exception processing for the purpose of debugging via the kgdb. Signed-off-by: Jan Kiszka <jan.kiszka@web.de> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>	2010-05-20 21:04:29 -05:00
Jason Wessel	f503b5ae53	x86,kgdb: Add low level debug hook The only way the debugger can handle a trap in inside rcu_lock, notify_die, or atomic_notifier_call_chain without a triple fault is to have a low level "first opportunity handler" in the int3 exception handler. Generally this will be something the vast majority of folks will not need, but for those who need it, it is added as a kernel .config option called KGDB_LOW_LEVEL_TRAP. CC: Ingo Molnar <mingo@elte.hu> CC: Thomas Gleixner <tglx@linutronix.de> CC: H. Peter Anvin <hpa@zytor.com> CC: x86@kernel.org Signed-off-by: Jason Wessel <jason.wessel@windriver.com>	2010-05-20 21:04:25 -05:00
Jason Wessel	98ec1878ca	kgdb: remove post_primary_code references Remove all the references to the kgdb_post_primary_code. This function serves no useful purpose because you can obtain the same information from the "struct kgdb_state *ks" from with in the debugger, if for some reason you want the data. Also remove the unintentional duplicate assignment for ks->ex_vector. Signed-off-by: Jason Wessel <jason.wessel@windriver.com>	2010-05-20 21:04:25 -05:00
Jason Wessel	dcc7871128	kgdb: core changes to support kdb These are the minimum changes to the kgdb core in order to enable an API to connect a new front end (kdb) to the debug core. This patch introduces the dbg_kdb_mode variable controls where the user level I/O is routed. It will be routed to the gdbstub (kgdb) or to the kdb front end which is a simple shell available over the kgdboc connection. You can switch back and forth between kdb or the gdb stub mode of operation dynamically. From gdb stub mode you can blindly type "$3#33", or from the kdb mode you can enter "kgdb" to switch to the gdb stub. The logic in the debug core depends on kdb to look for the typical gdb connection sequences and return immediately with KGDB_PASS_EVENT if a gdb serial command sequence is detected. That should allow a reasonably seamless transition between kdb -> gdb without leaving the kernel exception state. The two gdb serial queries that kdb is responsible for detecting are the "?" and "qSupported" packets. CC: Ingo Molnar <mingo@elte.hu> Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Acked-by: Martin Hicks <mort@sgi.com>	2010-05-20 21:04:21 -05:00
Linus Torvalds	04afb40593	Merge branch 'acpica' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 * 'acpica' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (22 commits) ACPI: fix early DSDT dmi check warnings on ia64 ACPICA: Update version to 20100428. ACPICA: Update/clarify some parameter names associated with acpi_handle ACPICA: Rename acpi_ex_system_do_suspend->acpi_ex_system_do_sleep ACPICA: Prevent possible allocation overrun during object copy ACPICA: Split large file, evgpeblk ACPICA: Add GPE support for dynamically loaded ACPI tables ACPICA: Clarify/rename some root table descriptor fields ACPICA: Update version to 20100331. ACPICA: Minimize the differences between linux GPE code and ACPICA code base ACPI: add boot option acpi=copy_dsdt to fix corrupt DSDT ACPICA: Update DSDT copy/detection. ACPICA: Add subsystem option to force copy of DSDT to local memory ACPICA: Add detection of corrupted/replaced DSDT ACPICA: Add write support for DataTable operation regions ACPICA: Fix for acpi_reallocate_root_table for incorrect root table copy ACPICA: Update comments/headers, no functional change ACPICA: Update version to 20100304 ACPICA: Fix for possible fault in acpi_ex_release_mutex ACPICA: Standardize integer output for ACPICA warnings/errors ...	2010-05-20 09:45:38 -07:00
Linus Torvalds	f39d01be4c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (44 commits) vlynq: make whole Kconfig-menu dependant on architecture add descriptive comment for TIF_MEMDIE task flag declaration. EEPROM: max6875: Header file cleanup EEPROM: 93cx6: Header file cleanup EEPROM: Header file cleanup agp: use NULL instead of 0 when pointer is needed rtc-v3020: make bitfield unsigned PCI: make bitfield unsigned jbd2: use NULL instead of 0 when pointer is needed cciss: fix shadows sparse warning doc: inode uses a mutex instead of a semaphore. uml: i386: Avoid redefinition of NR_syscalls fix "seperate" typos in comments cocbalt_lcdfb: correct sections doc: Change urls for sparse Powerpc: wii: Fix typo in comment i2o: cleanup some exit paths Documentation/: it's -> its where appropriate UML: Fix compiler warning due to missing task_struct declaration UML: add kernel.h include to signal.c ...	2010-05-20 09:20:59 -07:00
Ingo Molnar	dfacc4d6c9	Merge branch 'perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/core	2010-05-20 14:38:55 +02:00
Huang Ying	482908b49e	ACPI, APEI, Use ERST for persistent storage of MCE Traditionally, fatal MCE will cause Linux print error log to console then reboot. Because MCE registers will preserve their content after warm reboot, the hardware error can be logged to disk or network after reboot. But system may fail to warm reboot, then you may lose the hardware error log. ERST can help here. Through saving the hardware error log into flash via ERST before go panic, the hardware error log can be gotten from the flash after system boot successful again. The fatal MCE processing procedure with ERST involved is as follow: - Hardware detect error, MCE raised - MCE read MCE registers, check error severity (fatal), prepare error record - Write MCE error record into flash via ERST - Go panic, then trigger system reboot - System reboot, /sbin/mcelog run, it reads /dev/mcelog to check flash for error record of previous boot via ERST, and output and clear them if available - /sbin/mcelog logs error records into disk or network ERST only accepts CPER record format, but there is no pre-defined CPER section can accommodate all information in struct mce, so a customized section type is defined to hold struct mce inside a CPER record as an error section. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2010-05-19 22:41:40 -04:00
Huang Ying	d334a49113	ACPI, APEI, Generic Hardware Error Source memory error support Generic Hardware Error Source provides a way to report platform hardware errors (such as that from chipset). It works in so called "Firmware First" mode, that is, hardware errors are reported to firmware firstly, then reported to Linux by firmware. This way, some non-standard hardware error registers or non-standard hardware link can be checked by firmware to produce more valuable hardware error information for Linux. Now, only SCI notification type and memory errors are supported. More notification type and hardware error type will be added later. These memory errors are reported to user space through /dev/mcelog via faking a corrected Machine Check, so that the error memory page can be offlined by /sbin/mcelog if the error count for one page is beyond the threshold. On some machines, Machine Check can not report physical address for some corrected memory errors, but GHES can do that. So this simplified GHES is implemented firstly. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2010-05-19 22:41:16 -04:00
Linus Torvalds	6fa0fddd5f	Merge branch 'timers-for-linus-hpet' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-for-linus-hpet' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, hpet: Add reference to chipset erratum documentation for disable-hpet-msi-quirk x86, hpet: Restrict read back to affected ATI chipsets	2010-05-19 17:10:24 -07:00
Linus Torvalds	7d02093e29	Merge branch 'timers-for-linus-cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-for-linus-cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: avr32: Fix typo in read_persistent_clock() sparc: Convert sparc to use read/update_persistent_clock cris: Convert cris to use read/update_persistent_clock m68k: Convert m68k to use read/update_persistent_clock m32r: Convert m32r to use read/update_peristent_clock blackfin: Convert blackfin to use read/update_persistent_clock ia64: Convert ia64 to use read/update_persistent_clock avr32: Convert avr32 to use read/update_persistent_clock h8300: Convert h8300 to use read/update_persistent_clock frv: Convert frv to use read/update_persistent_clock mn10300: Convert mn10300 to use read/update_persistent_clock alpha: Convert alpha to use read/update_persistent_clock xtensa: Fix unnecessary setting of xtime time: Clean up direct xtime usage in xen	2010-05-19 17:10:06 -07:00
H. Peter Anvin	14671386dc	x86, mrst: make mrst_timer_options an enum We have an enum mrst_timer_options, use it so that the kernel knows if we're missing something from a switch statement or equivalent. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> LKML-Reference: <1274295685-6774-4-git-send-email-jacob.jun.pan@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>	2010-05-19 14:37:40 -07:00
H. Peter Anvin	a75af580bb	x86, mrst: make mrst_identify_cpu() an inline returning enum We have an enum, might as well use it. While we're at it, make it an inline... there is really no point in calling a function for this stuff. LKML-Reference: <1274295685-6774-3-git-send-email-jacob.jun.pan@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>	2010-05-19 13:47:11 -07:00
Jacob Pan	a875c01944	x86, mrst: add more timer config options Always-on local APIC timer (ARAT) has been introduced to Medfield, along with the platform APB timers we have more timer configuration options between Moorestown and Medfield. This patch adds run-time detection of avaiable timer features so that we can treat Medfield as a variant of Moorestown and set up the optimal timer options for each platform. i.e. Medfield: per cpu always-on local APIC timer Moorestown: per cpu APB timer Manual override is possible via cmdline option x86_mrst_timer. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> LKML-Reference: <1274295685-6774-4-git-send-email-jacob.jun.pan@linux.intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-05-19 13:45:39 -07:00
Jacob Pan	a0c173bd8a	x86, mrst: add cpu type detection Medfield is the follow-up of Moorestown, it is treated under the same HW sub-architecture. However, we do need to know the CPU type in order for some of the driver to act accordingly. We also have different optimal clock configuration for each CPU type. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> LKML-Reference: <1274295685-6774-3-git-send-email-jacob.jun.pan@linux.intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-05-19 13:32:29 -07:00
Jacob Pan	1dedefd1a0	x86: detect scattered cpuid features earlier Some extra CPU features such as ARAT is needed in early boot so that x86_init function pointers can be set up properly. http://lkml.org/lkml/2010/5/18/519 At start_kernel() level, this patch moves init_scattered_cpuid_features() from check_bugs() to setup_arch() -> early_cpu_init() which is earlier than platform specific x86_init layer setup. Suggested by HPA. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> LKML-Reference: <1274295685-6774-2-git-send-email-jacob.jun.pan@linux.intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-05-19 13:32:12 -07:00
Avi Kivity	8fbf065d62	KVM: x86: Add missing locking to arch specific vcpu ioctls Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-19 11:41:11 +03:00
Avi Kivity	3dbe141595	KVM: MMU: Segregate shadow pages with different cr0.wp When cr0.wp=0, we may shadow a gpte having u/s=1 and r/w=0 with an spte having u/s=0 and r/w=1. This allows excessive access if the guest sets cr0.wp=1 and accesses through this spte. Fix by making cr0.wp part of the base role; we'll have different sptes for the two cases and the problem disappears. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-19 11:41:09 +03:00
Sheng Yang	a3d204e285	KVM: x86: Check LMA bit before set_efer kvm_x86_ops->set_efer() would execute vcpu->arch.efer = efer, so the checking of LMA bit didn't work. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-19 11:41:09 +03:00
Avi Kivity	f78e917688	KVM: Don't allow lmsw to clear cr0.pe The current lmsw implementation allows the guest to clear cr0.pe, contrary to the manual, which breaks EMM386.EXE. Fix by ORing the old cr0.pe with lmsw's operand. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-19 11:41:08 +03:00
Glauber Costa	371bcf646d	KVM: x86: Tell the guest we'll warn it about tsc stability This patch puts up the flag that tells the guest that we'll warn it about the tsc being trustworthy or not. By now, we also say it is not. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-19 11:41:06 +03:00
Glauber Costa	3a0d7256a6	x86, paravirt: don't compute pvclock adjustments if we trust the tsc If the HV told us we can fully trust the TSC, skip any correction Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-19 11:41:05 +03:00
Glauber Costa	838815a787	x86: KVM guest: Try using new kvm clock msrs We now added a new set of clock-related msrs in replacement of the old ones. In theory, we could just try to use them and get a return value indicating they do not exist, due to our use of kvm_write_msr_save. However, kvm clock registration happens very early, and if we ever try to write to a non-existant MSR, we raise a lethal #GP, since our idt handlers are not in place yet. So this patch tests for a cpuid feature exported by the host to decide which set of msrs are supported. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-19 11:41:04 +03:00
Glauber Costa	84478c829d	KVM: x86: export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID Right now, we were using individual KVM_CAP entities to communicate userspace about which cpuids we support. This is suboptimal, since it generates a delay between the feature arriving in the host, and being available at the guest. A much better mechanism is to list para features in KVM_GET_SUPPORTED_CPUID. This makes userspace automatically aware of what we provide. And if we ever add a new cpuid bit in the future, we have to do that again, which create some complexity and delay in feature adoption. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-19 11:41:03 +03:00
Glauber Costa	0e6ac58acb	KVM: x86: add new KVMCLOCK cpuid feature This cpuid, KVM_CPUID_CLOCKSOURCE2, will indicate to the guest that kvmclock is available through a new set of MSRs. The old ones are deprecated. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-19 11:41:02 +03:00
Glauber Costa	11c6bffa42	KVM: x86: change msr numbers for kvmclock Avi pointed out a while ago that those MSRs falls into the pentium PMU range. So the idea here is to add new ones, and after a while, deprecate the old ones. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-19 11:41:01 +03:00
Glauber Costa	489fb490db	x86, paravirt: Add a global synchronization point for pvclock In recent stress tests, it was found that pvclock-based systems could seriously warp in smp systems. Using ingo's time-warp-test.c, I could trigger a scenario as bad as 1.5mi warps a minute in some systems. (to be fair, it wasn't that bad in most of them). Investigating further, I found out that such warps were caused by the very offset-based calculation pvclock is based on. This happens even on some machines that report constant_tsc in its tsc flags, specially on multi-socket ones. Two reads of the same kernel timestamp at approx the same time, will likely have tsc timestamped in different occasions too. This means the delta we calculate is unpredictable at best, and can probably be smaller in a cpu that is legitimately reading clock in a forward ocasion. Some adjustments on the host could make this window less likely to happen, but still, it pretty much poses as an intrinsic problem of the mechanism. A while ago, I though about using a shared variable anyway, to hold clock last state, but gave up due to the high contention locking was likely to introduce, possibly rendering the thing useless on big machines. I argue, however, that locking is not necessary. We do a read-and-return sequence in pvclock, and between read and return, the global value can have changed. However, it can only have changed by means of an addition of a positive value. So if we detected that our clock timestamp is less than the current global, we know that we need to return a higher one, even though it is not exactly the one we compared to. OTOH, if we detect we're greater than the current time source, we atomically replace the value with our new readings. This do causes contention on big boxes (but big here means BIG), but it seems like a good trade off, since it provide us with a time source guaranteed to be stable wrt time warps. After this patch is applied, I don't see a single warp in time during 5 days of execution, in any of the machines I saw them before. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Zachary Amsden <zamsden@redhat.com> CC: Jeremy Fitzhardinge <jeremy@goop.org> CC: Avi Kivity <avi@redhat.com> CC: Marcelo Tosatti <mtosatti@redhat.com> CC: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-19 11:41:00 +03:00
Glauber Costa	424c32f1aa	x86, paravirt: Enable pvclock flags in vcpu_time_info structure This patch removes one padding byte and transform it into a flags field. New versions of guests using pvclock will query these flags upon each read. Flags, however, will only be interpreted when the guest decides to. It uses the pvclock_valid_flags function to signal that a specific set of flags should be taken into consideration. Which flags are valid are usually devised via HV negotiation. Signed-off-by: Glauber Costa <glommer@redhat.com> CC: Jeremy Fitzhardinge <jeremy@goop.org> Acked-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-19 11:40:59 +03:00
Roedel, Joerg	b69e8caef5	KVM: x86: Inject #GP with the right rip on efer writes This patch fixes a bug in the KVM efer-msr write path. If a guest writes to a reserved efer bit the set_efer function injects the #GP directly. The architecture dependent wrmsr function does not see this, assumes success and advances the rip. This results in a #GP in the guest with the wrong rip. This patch fixes this by reporting efer write errors back to the architectural wrmsr function. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-19 11:36:39 +03:00
Joerg Roedel	0d945bd935	KVM: SVM: Don't allow nested guest to VMMCALL into host This patch disables the possibility for a l2-guest to do a VMMCALL directly into the host. This would happen if the l1-hypervisor doesn't intercept VMMCALL and the l2-guest executes this instruction. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-19 11:36:38 +03:00
Joerg Roedel	3f0fd2927b	KVM: x86: Fix exception reinjection forced to true The patch merged recently which allowed to mark an exception as reinjected has a bug as it always marks the exception as reinjected. This breaks nested-svm shadow-on-shadow implementation. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-19 11:36:37 +03:00
Avi Kivity	9ed3c444ab	KVM: Fix wallclock version writing race Wallclock writing uses an unprotected global variable to hold the version; this can cause one guest to interfere with another if both write their wallclock at the same time. Acked-by: Glauber Costa <glommer@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-19 11:36:36 +03:00
Avi Kivity	8facbbff07	KVM: MMU: Don't read pdptrs with mmu spinlock held in mmu_alloc_roots On svm, kvm_read_pdptr() may require reading guest memory, which can sleep. Push the spinlock into mmu_alloc_roots(), and only take it after we've read the pdptr. Tested-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-19 11:36:35 +03:00
Shane Wang	cafd66595d	KVM: VMX: enable VMXON check with SMX enabled (Intel TXT) Per document, for feature control MSR: Bit 1 enables VMXON in SMX operation. If the bit is clear, execution of VMXON in SMX operation causes a general-protection exception. Bit 2 enables VMXON outside SMX operation. If the bit is clear, execution of VMXON outside SMX operation causes a general-protection exception. This patch is to enable this kind of check with SMX for VMXON in KVM. Signed-off-by: Shane Wang <shane.wang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-19 11:36:34 +03:00
Marcelo Tosatti	f1d86e469b	KVM: x86: properly update ready_for_interrupt_injection The recent changes to emulate string instructions without entering guest mode exposed a bug where pending interrupts are not properly reflected in ready_for_interrupt_injection. The result is that userspace overwrites a previously queued interrupt, when irqchip's are emulated in userspace. Fix by always updating state before returning to userspace. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-19 11:36:33 +03:00
Avi Kivity	84ad33ef5d	KVM: VMX: Atomically switch efer if EPT && !EFER.NX When EPT is enabled, we cannot emulate EFER.NX=0 through the shadow page tables. This causes accesses through ptes with bit 63 set to succeed instead of failing a reserved bit check. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-19 11:36:32 +03:00
Avi Kivity	61d2ef2ce3	KVM: VMX: Add facility to atomically switch MSRs on guest entry/exit Some guest msr values cannot be used on the host (for example. EFER.NX=0), so we need to switch them atomically during guest entry or exit. Add a facility to program the vmx msr autoload registers accordingly. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-19 11:36:31 +03:00
Avi Kivity	5dfa3d170e	KVM: VMX: Add definitions for guest and host EFER autoswitch vmcs entries Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-19 11:36:31 +03:00
Avi Kivity	19b95dba03	KVM: VMX: Add definition for msr autoload entry Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-19 11:36:30 +03:00
Avi Kivity	0ee75bead8	KVM: Let vcpu structure alignment be determined at runtime vmx and svm vcpus have different contents and therefore may have different alignmment requirements. Let each specify its required alignment. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-19 11:36:29 +03:00
Xiao Guangrong	884a0ff0b6	KVM: MMU: cleanup invlpg code Using is_last_spte() to cleanup invlpg code Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-19 11:36:28 +03:00
Xiao Guangrong	5e1b3ddbf2	KVM: MMU: move unsync/sync tracpoints to proper place Move unsync/sync tracepoints to the proper place, it's good for us to obtain unsync page live time Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-19 11:36:27 +03:00
Xiao Guangrong	85f2067c31	KVM: MMU: convert mmu tracepoints Convert mmu tracepoints by using DECLARE_EVENT_CLASS Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-19 11:36:26 +03:00
Xiao Guangrong	22c9b2d166	KVM: MMU: fix for calculating gpa in invlpg code If the guest is 32-bit, we should use 'quadrant' to adjust gpa offset Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-19 11:36:25 +03:00
Gui Jianfeng	d35b8dd935	KVM: Fix mmu shrinker error kvm_mmu_remove_one_alloc_mmu_page() assumes kvm_mmu_zap_page() only reclaims only one sp, but that's not the case. This will cause mmu shrinker returns a wrong number. This patch fix the counting error. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-19 11:36:23 +03:00
Eric Northup	5a7388c2d2	KVM: MMU: fix hashing for TDP and non-paging modes For TDP mode, avoid creating multiple page table roots for the single guest-to-host physical address map by fixing the inputs used for the shadow page table hash in mmu_alloc_roots(). Signed-off-by: Eric Northup <digitaleric@google.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-19 11:36:22 +03:00
Cyrill Gorcunov	ce7f15452c	perf, x86: P4 PMU -- add missing bit in CCCR mask Should be there for the sake of RAW events. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> CC: Lin Ming <ming.m.lin@intel.com> CC: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20100518212439.354345151@openvz.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-05-19 09:41:06 +02:00
Cyrill Gorcunov	9d36dfcf21	perf, x86: P4_pmu_schedule_events -- use smp_processor_id instead of raw_ This snippet somehow escaped the commit: \| commit `137351e0fe` \| Author: Cyrill Gorcunov <gorcunov@openvz.org> \| Date: Sat May 8 15:25:52 2010 +0400 \| \| x86, perf: P4 PMU -- protect sensible procedures from preemption so bring it eventually back. It helps to catch preemption issue (if there will be, rule of thumb -- don't use raw_ if you can). Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20100518212439.167259349@openvz.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-05-19 09:41:05 +02:00
Cyrill Gorcunov	623aab896e	perf, x86: P4 PMU -- do a real check for ESCR address being in hash To prevent from clashes in future code modifications do a real check for ESCR address being in hash. At moment the callers are known to pass sane values but better to be on a safe side. And comment fix. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> CC: Lin Ming <ming.m.lin@intel.com> CC: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20100518212439.004503600@openvz.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-05-19 09:41:05 +02:00
Feng Tang	a02ce953a1	x86/PCI: make ACPI MCFG reserved error messages ACPI specific Both ACPI and SFI firmwares will have MCFG space, but the error message isn't valid on SFI, so don't print the message in that case. Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-05-18 15:03:27 -07:00
Linus Torvalds	537b60d178	Merge branch 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, UV: uv_irq.c: Fix all sparse warnings x86, UV: Improve BAU performance and error recovery x86, UV: Delete unneeded boot messages x86, UV: Clean up UV headers for MMR definitions	2010-05-18 09:46:35 -07:00
Linus Torvalds	3ae684e1c4	Merge branch 'x86-txt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-txt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, tboot: Add support for S3 memory integrity protection	2010-05-18 09:28:24 -07:00
Linus Torvalds	c4fd308ed6	Merge branch 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, pat: Update the page flags for memtype atomically instead of using memtype_lock x86, pat: In rbt_memtype_check_insert(), update new->type only if valid x86, pat: Migrate to rbtree only backend for pat memtype management x86, pat: Preparatory changes in pat.c for bigger rbtree change rbtree: Add support for augmented rbtrees	2010-05-18 09:28:04 -07:00
Linus Torvalds	96fbeb973a	Merge branch 'x86-mrst-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mrst-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, mrst: add nop functions to x86_init mpparse functions x86, mrst, pci: return 0 for non-present pci bars x86: Avoid check hlt for newer cpus	2010-05-18 09:27:49 -07:00
Linus Torvalds	1f8caa986a	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-64: Combine SRAT regions when possible	2010-05-18 09:17:50 -07:00
Linus Torvalds	d6f3875252	Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86/microcode: Use nonseekable_open() x86: Improve Intel microcode loader performance	2010-05-18 09:17:17 -07:00
Linus Torvalds	cb41838bbc	Merge branch 'core-hweight-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-hweight-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, hweight: Use a 32-bit popcnt for __arch_hweight32() arch, hweight: Fix compilation errors x86: Add optimized popcnt variants bitops: Optimize hweight() by making use of compile-time evaluation	2010-05-18 09:17:01 -07:00
Linus Torvalds	98f01720cb	Merge branch 'x86-irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, acpi/irq: Define gsi_end when X86_IO_APIC is undefined x86, irq: Kill io_apic_renumber_irq x86, acpi/irq: Handle isa irqs that are not identity mapped to gsi's. x86, ioapic: Simplify probe_nr_irqs_gsi. x86, ioapic: Optimize pin_2_irq x86, ioapic: Move nr_ioapic_registers calculation to mp_register_ioapic. x86, ioapic: In mpparse use mp_register_ioapic x86, ioapic: Teach mp_register_ioapic to compute a global gsi_end x86, ioapic: Fix the types of gsi values x86, ioapic: Fix io_apic_redir_entries to return the number of entries. x86, ioapic: Only export mp_find_ioapic and mp_find_ioapic_pin in io_apic.h x86, acpi/irq: Generalize mp_config_acpi_legacy_irqs x86, acpi/irq: Fix acpi_sci_ioapic_setup so it has both bus_irq and gsi x86, acpi/irq: pci device dev->irq is an isa irq not a gsi x86, acpi/irq: Teach acpi_get_override_irq to take a gsi not an isa_irq x86, acpi/irq: Introduce apci_isa_irq_to_gsi	2010-05-18 09:15:57 -07:00
Linus Torvalds	41d59102e1	Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, fpu: Use static_cpu_has() to implement use_xsave() x86: Add new static_cpu_has() function using alternatives x86, fpu: Use the proper asm constraint in use_xsave() x86, fpu: Unbreak FPU emulation x86: Introduce 'struct fpu' and related API x86: Eliminate TS_XSAVE x86-32: Don't set ignore_fpu_irq in simd exception x86: Merge kernel_math_error() into math_error() x86: Merge simd_math_error() into math_error() x86-32: Rework cache flush denied handler Fix trivial conflict in arch/x86/kernel/process.c	2010-05-18 08:58:16 -07:00
Linus Torvalds	07d77759c9	Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, hypervisor: add missing <linux/module.h> Modify the VMware balloon driver for the new x86_hyper API x86, hypervisor: Export the x86_hyper* symbols x86: Clean up the hypervisor layer x86, HyperV: fix up the license to mshyperv.c x86: Detect running on a Microsoft HyperV system x86, cpu: Make APERF/MPERF a normal table-driven flag x86, k8: Fix build error when K8_NB is disabled x86, cacheinfo: Disable index in all four subcaches x86, cacheinfo: Make L3 cache info per node x86, cacheinfo: Reorganize AMD L3 cache structure x86, cacheinfo: Turn off L3 cache index disable feature in virtualized environments x86, cacheinfo: Unify AMD L3 cache index disable checking cpufreq: Unify sysfs attribute definition macros powernow-k8: Fix frequency reporting x86, cpufreq: Add APERF/MPERF support for AMD processors x86: Unify APERF/MPERF support powernow-k8: Add core performance boost support x86, cpu: Add AMD core boosting feature flag to /proc/cpuinfo Fix up trivial conflicts in arch/x86/kernel/cpu/intel_cacheinfo.c and drivers/cpufreq/cpufreq_ondemand.c	2010-05-18 08:49:13 -07:00
Linus Torvalds	b7723f9d21	Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Clean up arch/x86/Kconfig* x86-64: Don't export init_level4_pgt	2010-05-18 08:40:21 -07:00
Linus Torvalds	93c9d7f60c	Merge branch 'x86-atomic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-atomic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Fix LOCK_PREFIX_HERE for uniprocessor build x86, atomic64: In selftest, distinguish x86-64 from 586+ x86-32: Fix atomic64_inc_not_zero return value convention lib: Fix atomic64_inc_not_zero test lib: Fix atomic64_add_unless return value convention x86-32: Fix atomic64_add_unless return value convention lib: Fix atomic64_add_unless test x86: Implement atomic[64]_dec_if_positive() lib: Only test atomic64_dec_if_positive on archs having it x86-32: Rewrite 32-bit atomic64 functions in assembly lib: Add self-test for atomic64_t x86-32: Allow UP/SMP lock replacement in cmpxchg64 x86: Add support for lock prefix in alternatives	2010-05-18 08:40:05 -07:00
Linus Torvalds	7421a10de7	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Use .cfi_sections for assembly code x86-64: Reduce SMP locks table size x86, asm: Introduce and use percpu_inc()	2010-05-18 08:35:37 -07:00
Linus Torvalds	4d7b4ac22f	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (311 commits) perf tools: Add mode to build without newt support perf symbols: symbol inconsistency message should be done only at verbose=1 perf tui: Add explicit -lslang option perf options: Type check all the remaining OPT_ variants perf options: Type check OPT_BOOLEAN and fix the offenders perf options: Check v type in OPT_U?INTEGER perf options: Introduce OPT_UINTEGER perf tui: Add workaround for slang < 2.1.4 perf record: Fix bug mismatch with -c option definition perf options: Introduce OPT_U64 perf tui: Add help window to show key associations perf tui: Make <- exit menus too perf newt: Add single key shortcuts for zoom into DSO and threads perf newt: Exit browser unconditionally when CTRL+C, q or Q is pressed perf newt: Fix the 'A'/'a' shortcut for annotate perf newt: Make <- exit the ui_browser x86, perf: P4 PMU - fix counters management logic perf newt: Make <- zoom out filters perf report: Report number of events, not samples perf hist: Clarify events_stats fields usage ... Fix up trivial conflicts in kernel/fork.c and tools/perf/builtin-record.c	2010-05-18 08:19:03 -07:00
Linus Torvalds	3aaf51ace5	Merge branch 'oprofile-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'oprofile-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (24 commits) oprofile/x86: make AMD IBS hotplug capable oprofile/x86: notify cpus only when daemon is running oprofile/x86: reordering some functions oprofile/x86: stop disabled counters in nmi handler oprofile/x86: protect cpu hotplug sections oprofile/x86: remove CONFIG_SMP macros oprofile/x86: fix uninitialized counter usage during cpu hotplug oprofile/x86: remove duplicate IBS capability check oprofile/x86: move IBS code oprofile/x86: return -EBUSY if counters are already reserved oprofile/x86: moving shutdown functions oprofile/x86: reserve counter msrs pairwise oprofile/x86: rework error handler in nmi_setup() oprofile: update file list in MAINTAINERS file oprofile: protect from not being in an IRQ context oprofile: remove double ring buffering ring-buffer: Add lost event count to end of sub buffer tracing: Show the lost events in the trace_pipe output ring-buffer: Add place holder recording of dropped events tracing: Fix compile error in module tracepoints when MODULE_UNLOAD not set ...	2010-05-18 08:18:07 -07:00
Linus Torvalds	1014cfe2fb	Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: lockdep: Reduce stack_trace usage lockdep: No need to disable preemption in debug atomic ops lockdep: Actually _dec_ in debug_atomic_dec lockdep: Provide off case for redundant_hardirqs_on increment lockdep: Simplify debug atomic ops lockdep: Fix redundant_hardirqs_on incremented with irqs enabled lockstat: Make lockstat counting per cpu i8253: Convert i8253_lock to raw_spinlock	2010-05-18 08:17:35 -07:00
Linus Torvalds	8123d8f17d	Merge branch 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86/amd-iommu: Add amd_iommu=off command line option iommu-api: Remove iommu_{un}map_range functions x86/amd-iommu: Implement ->{un}map callbacks for iommu-api x86/amd-iommu: Make amd_iommu_iova_to_phys aware of multiple page sizes x86/amd-iommu: Make iommu_unmap_page and fetch_pte aware of page sizes x86/amd-iommu: Make iommu_map_page and alloc_pte aware of page sizes kvm: Change kvm_iommu_map_pages to map large pages VT-d: Change {un}map_range functions to implement {un}map interface iommu-api: Add ->{un}map callbacks to iommu_ops iommu-api: Add iommu_map and iommu_unmap functions iommu-api: Rename ->{un}map function pointers to ->{un}map_range	2010-05-18 07:22:37 -07:00
Cyrill Gorcunov	ef4f30f54e	perf, x86: P4 PMU -- fix typo in unflagged NMI handling Tested-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> LKML-Reference: <1274174954.22793.17.camel@minggr.sh.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-05-18 12:05:20 +02:00
Cyrill Gorcunov	0db1a7bc00	perf, x86: P4 PMU -- handle unflagged events It might happen that an event can overflow without the proper overflow flag set. Check the sign bit in the raw counter value to solve this problem. Tested-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: fweisbec@gmail.com Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1274083984.6540.15.camel@minggr.sh.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-05-18 08:25:34 +02:00
H. Peter Anvin	c59bd56882	x86, hweight: Use a 32-bit popcnt for __arch_hweight32() Use a 32-bit popcnt instruction for __arch_hweight32(), even on x86-64. Even though the input register will usually be zero-extended due to the standard operation of the hardware, it isn't necessarily so if the input value was the result of truncating a 64-bit operation. Note: the POPCNT32 variant used on x86-64 has a technically unnecessary REX prefix to make it five bytes long, the same as a CALL instruction, therefore avoiding an unnecessary NOP. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <alpine.LFD.2.00.1005171443060.4195@i5.linux-foundation.org>	2010-05-17 15:17:16 -07:00
Andreas Herrmann	fec84e3307	x86, hpet: Add reference to chipset erratum documentation for disable-hpet-msi-quirk (At the moment the "SB700 Family Product Errata" document is available at http://support.amd.com/us/Embedded_TechDocs/46837.pdf) Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> LKML-Reference: <20100517164324.GB10254@alberich.amd.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-05-17 10:04:43 -07:00
Sreedhara DS	9a58a33339	IPC driver for Intel Mobile Internet Device (MID) platforms The IPC (inter processor communications) is used to provide the communications between kernel and system control units on some embedded Intel x86 platforms. (Various bits of clean up and restructuring by Alan Cox) Signed-off-by: Sreedhara DS <sreedhara.ds@intel.com> Signed-off-by: Alan Cox <alan@linux.intel.com>	2010-05-17 12:06:07 -04:00
Anton Blanchard	f3d46f9d31	atomic_t: Cast to volatile when accessing atomic variables In preparation for removing volatile from the atomic_t definition, this patch adds a volatile cast to all the atomic read functions. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-17 07:57:27 -07:00
Gui Jianfeng	df2fb6e710	KVM: MMU: fix sp->unsync type error in trace event definition sp->unsync is bool now, so update trace event declaration. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:19:29 +03:00
Joerg Roedel	ff47a49b23	KVM: SVM: Handle MCE intercepts always on host level This patch prevents MCE intercepts from being propagated into the L1 guest if they happened in an L2 guest. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:19:27 +03:00
Joerg Roedel	ce7ddec4bb	KVM: x86: Allow marking an exception as reinjected This patch adds logic to kvm/x86 which allows to mark an injected exception as reinjected. This allows to remove an ugly hack from svm_complete_interrupts that prevented exceptions from being reinjected at all in the nested case. The hack was necessary because an reinjected exception into the nested guest could cause a nested vmexit emulation. But reinjected exceptions must not intercept. The downside of the hack is that a exception that in injected could get lost. This patch fixes the problem and puts the code for it into generic x86 files because. Nested-VMX will likely have the same problem and could reuse the code. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:19:26 +03:00
Joerg Roedel	c2c63a4939	KVM: SVM: Report emulated SVM features to userspace This patch implements the reporting of the emulated SVM features to userspace instead of the real hardware capabilities. Every real hardware capability needs emulation in nested svm so the old behavior was broken. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:19:24 +03:00
Joerg Roedel	d4330ef2fb	KVM: x86: Add callback to let modules decide over some supported cpuid bits This patch adds the get_supported_cpuid callback to kvm_x86_ops. It will be used in do_cpuid_ent to delegate the decission about some supported cpuid bits to the architecture modules. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:19:23 +03:00
Joerg Roedel	228070b1b3	KVM: SVM: Propagate nested entry failure into guest hypervisor This patch implements propagation of a failes guest vmrun back into the guest instead of killing the whole guest. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:19:21 +03:00
Joerg Roedel	2be4fc7a02	KVM: SVM: Sync cr0 and cr3 to kvm state before nested handling This patch syncs cr0 and cr3 from the vmcb to the kvm state before nested intercept handling is done. This allows to simplify the vmexit path. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:19:20 +03:00
Joerg Roedel	2041a06a50	KVM: SVM: Make sure rip is synced to vmcb before nested vmexit This patch fixes a bug where a nested guest always went over the same instruction because the rip was not advanced on a nested vmexit. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:19:18 +03:00
Joerg Roedel	924584ccb0	KVM: SVM: Fix nested nmi handling The patch introducing nested nmi handling had a bug. The check does not belong to enable_nmi_window but must be in nmi_allowed. This patch fixes this. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:19:17 +03:00
Avi Kivity	8d3b932309	Merge remote branch 'tip/perf/core' Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:19:15 +03:00
Lai Jiangshan	cdbecfc398	KVM: VMX: free vpid when fail to create vcpu Fix bug of the exception path, free allocated vpid when fail to create vcpu. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:19:10 +03:00
Wei Yongjun	77a1a71570	KVM: MMU: cleanup for function unaccount_shadowed() Since gfn is not changed in the for loop, we do not need to call gfn_to_memslot_unaliased() under the loop, and it is safe to move it out. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:18:12 +03:00
Gui Jianfeng	2a059bf444	KVM: Get rid of dead function gva_to_page() Nobody use gva_to_page() anymore, get rid of it. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:18:10 +03:00
Gui Jianfeng	b2fc15a5ef	KVM: MMU: Remove unused varialbe in rmap_next() Remove unused varialbe in rmap_next() Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:18:09 +03:00
Gui Jianfeng	814a59d207	KVM: MMU: Make use of is_large_pte() in walker Make use of is_large_pte() instead of checking PT_PAGE_SIZE_MASK bit directly. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:18:07 +03:00
Gui Jianfeng	51fb60d81b	KVM: MMU: Move sync_page() first pte address calculation out of loop Move first pte address calculation out of loop to save some cycles. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:18:06 +03:00
Avi Kivity	87bc3bf972	KVM: MMU: Drop cr4.pge from shadow page role Since commit `bf47a760f6`, we no longer handle ptes with the global bit set specially, so there is no reason to distinguish between shadow pages created with cr4.gpe set and clear. Such tracking is expensive when the guest toggles cr4.pge, so drop it. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:18:03 +03:00
Lai Jiangshan	90d83dc3d4	KVM: use the correct RCU API for PROVE_RCU=y The RCU/SRCU API have already changed for proving RCU usage. I got the following dmesg when PROVE_RCU=y because we used incorrect API. This patch coverts rcu_deference() to srcu_dereference() or family API. =================================================== [ INFO: suspicious rcu_dereference_check() usage. ] --------------------------------------------------- arch/x86/kvm/mmu.c:3020 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 2 locks held by qemu-system-x86/8550: #0: (&kvm->slots_lock){+.+.+.}, at: [<ffffffffa011a6ac>] kvm_set_memory_region+0x29/0x50 [kvm] #1: (&(&kvm->mmu_lock)->rlock){+.+...}, at: [<ffffffffa012262d>] kvm_arch_commit_memory_region+0xa6/0xe2 [kvm] stack backtrace: Pid: 8550, comm: qemu-system-x86 Not tainted 2.6.34-rc4-tip-01028-g939eab1 #27 Call Trace: [<ffffffff8106c59e>] lockdep_rcu_dereference+0xaa/0xb3 [<ffffffffa012f6c1>] kvm_mmu_calculate_mmu_pages+0x44/0x7d [kvm] [<ffffffffa012263e>] kvm_arch_commit_memory_region+0xb7/0xe2 [kvm] [<ffffffffa011a5d7>] __kvm_set_memory_region+0x636/0x6e2 [kvm] [<ffffffffa011a6ba>] kvm_set_memory_region+0x37/0x50 [kvm] [<ffffffffa015e956>] vmx_set_tss_addr+0x46/0x5a [kvm_intel] [<ffffffffa0126592>] kvm_arch_vm_ioctl+0x17a/0xcf8 [kvm] [<ffffffff810a8692>] ? unlock_page+0x27/0x2c [<ffffffff810bf879>] ? __do_fault+0x3a9/0x3e1 [<ffffffffa011b12f>] kvm_vm_ioctl+0x364/0x38d [kvm] [<ffffffff81060cfa>] ? up_read+0x23/0x3d [<ffffffff810f3587>] vfs_ioctl+0x32/0xa6 [<ffffffff810f3b19>] do_vfs_ioctl+0x495/0x4db [<ffffffff810e6b2f>] ? fget_light+0xc2/0x241 [<ffffffff810e416c>] ? do_sys_open+0x104/0x116 [<ffffffff81382d6d>] ? retint_swapgs+0xe/0x13 [<ffffffff810f3ba6>] sys_ioctl+0x47/0x6a [<ffffffff810021db>] system_call_fastpath+0x16/0x1b Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:18:01 +03:00
Avi Kivity	9beeaa2d68	Merge branch 'perf' Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:17:58 +03:00
Xiao Guangrong	3246af0ece	KVM: MMU: cleanup for hlist walk restart Quote from Avi: \|Just change the assignment to a 'goto restart;' please, \|I don't like playing with list_for_each internals. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:17:56 +03:00
Gleb Natapov	acb5451789	KVM: prevent spurious exit to userspace during task switch emulation. If kvm_task_switch() fails code exits to userspace without specifying exit reason, so the previous exit reason is reused by userspace. Fix this by specifying exit reason correctly. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:17:55 +03:00
Xiao Guangrong	6b18493d60	KVM: MMU: remove unused parameter in mmu_parent_walk() 'vcpu' is unused, remove it Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:17:53 +03:00
Xiao Guangrong	0571d366e0	KVM: MMU: reduce 'struct kvm_mmu_page' size Define 'multimapped' as 'bool'. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:17:52 +03:00
Xiao Guangrong	1b8c7934a4	KVM: MMU: remove unused struct kvm_unsync_walk Remove 'struct kvm_unsync_walk' since it's not used. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:17:50 +03:00
Gleb Natapov	19d0443726	KVM: fix emulator_task_switch() return value. emulator_task_switch() should return -1 for failure and 0 for success to the caller, just like x86_emulate_insn() does. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:17:49 +03:00
Avi Kivity	5b7e0102ae	KVM: MMU: Replace role.glevels with role.cr4_pae There is no real distinction between glevels=3 and glevels=4; both have exactly the same format and the code is treated exactly the same way. Drop role.glevels and replace is with role.cr4_pae (which is meaningful). This simplifies the code a bit. As a side effect, it allows sharing shadow page tables between pae and longmode guest page tables at the same guest page. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:17:47 +03:00
Jan Kiszka	e269fb2189	KVM: x86: Push potential exception error code on task switches When a fault triggers a task switch, the error code, if existent, has to be pushed on the new task's stack. Implement the missing bits. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:17:46 +03:00
Jan Kiszka	0760d44868	KVM: x86: Terminate early if task_switch_16/32 failed Stop the switch immediately if task_switch_16/32 returned an error. Only if that step succeeded, the switch should actually take place and update any register states. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:17:44 +03:00
Gleb Natapov	8f6abd06f5	KVM: x86: get rid of mmu_only parameter in emulator_write_emulated() We can call kvm_mmu_pte_write() directly from emulator_cmpxchg_emulated() instead of passing mmu_only down to emulator_write_emulated_onepage() and call it there. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:17:43 +03:00
Gleb Natapov	020df0794f	KVM: move DR register access handling into generic code Currently both SVM and VMX have their own DR handling code. Move it to x86.c. Acked-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:17:39 +03:00
Andre Przywara	6bc31bdc55	KVM: SVM: implement NEXTRIPsave SVM feature On SVM we set the instruction length of skipped instructions to hard-coded, well known values, which could be wrong when (bogus, but valid) prefixes (REX, segment override) are used. Newer AMD processors (Fam10h 45nm and better, aka. PhenomII or AthlonII) have an explicit NEXTRIP field in the VMCB containing the desired information. Since it is cheap to do so, we use this field to override the guessed value on newer processors. A fix for older CPUs would be rather expensive, as it would require to fetch and partially decode the instruction. As the problem is not a security issue and needs special, handcrafted code to trigger (no compiler will ever generate such code), I omit a fix for older CPUs. If someone is interested, I have both a patch for these CPUs as well as demo code triggering this issue: It segfaults under KVM, but runs perfectly on native Linux. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:17:38 +03:00
Avi Kivity	f7a711971e	KVM: Fix MAXPHYADDR calculation when cpuid does not support it MAXPHYADDR is derived from cpuid 0x80000008, but when that isn't present, we get some random value. Fix by checking first that cpuid 0x80000008 is supported. Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:17:36 +03:00
Avi Kivity	e46479f852	KVM: Trace emulated instructions Log emulated instructions in ftrace, especially if they failed. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:17:35 +03:00
Avi Kivity	2fb53ad811	KVM: x86 emulator: Don't overwrite decode cache Currently if we an instruction spans a page boundary, when we fetch the second half we overwrite the first half. This prevents us from tracing the full instruction opcodes. Fix by appending the second half to the first. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:17:33 +03:00
Xiao Guangrong	24222c2fec	KVM: MMU: remove unnecessary NX check in walk_addr After is_rsvd_bits_set() checks, EFER.NXE must be enabled if NX bit is seted Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:17:30 +03:00
Xiao Guangrong	f84cbb0561	KVM: MMU: remove unused field kvm_mmu_page.oos_link is not used, so remove it Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:17:29 +03:00
Xiao Guangrong	805d32dea4	KVM: MMU: cleanup/fix mmu audit code This patch does: - 'sp' parameter in inspect_spte_fn() is not used, so remove it - fix 'kvm' and 'slots' is not defined in count_rmaps() - fix a bug in inspect_spte_has_rmap() Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:17:27 +03:00
Avi Kivity	84b0c8c6a6	KVM: MMU: Disassociate direct maps from guest levels Direct maps are linear translations for a section of memory, used for real mode or with large pages. As such, they are independent of the guest levels. Teach the mmu about this by making page->role.glevels = 0 for direct maps. This allows direct maps to be shared among real mode and the various paging modes. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:16:44 +03:00
Xiao Guangrong	f815bce894	KVM: MMU: check reserved bits only if CR4.PSE=1 or CR4.PAE=1 - Check reserved bits only if CR4.PAE=1 or CR4.PSE=1 when guest #PF occurs - Fix a typo in reset_rsvds_bits_mask() Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:16:42 +03:00
Marcelo Tosatti	041b1359a2	KVM: x86: document KVM_REQ_PENDING_TIMER usage Document that KVM_REQ_PENDING_TIMER is implicitly used during guest entry. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:16:40 +03:00
Gleb Natapov	de3e6480f7	KVM: x86 emulator: fix unlocked CMPXCHG8B emulation When CMPXCHG8B is executed without LOCK prefix it is racy. Preserve this behaviour in emulator too. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:16:38 +03:00
Gleb Natapov	6550e1f165	KVM: x86 emulator: add decoding of CMPXCHG8B dst operand Decode CMPXCHG8B destination operand in decoding stage. Fixes regression introduced by "If LOCK prefix is used dest arg should be memory" commit. This commit relies on dst operand be decoded at the beginning of an instruction emulation. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:16:37 +03:00
Gleb Natapov	482ac18ae2	KVM: x86 emulator: commit rflags as part of registers commit Make sure that rflags is committed only after successful instruction emulation. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:16:35 +03:00
Jan Kiszka	9749a6c0f0	KVM: x86: Fix 32-bit build breakage due to typo Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:16:34 +03:00
Gleb Natapov	92bf9748b5	KVM: small kvm_arch_vcpu_ioctl_run() cleanup. Unify all conditions that get us back into emulator after returning from userspace. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:16:32 +03:00
Gleb Natapov	7b262e90fc	KVM: x86 emulator: introduce pio in string read ahead. To optimize "rep ins" instruction do IO in big chunks ahead of time instead of doing it only when required during instruction emulation. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:16:31 +03:00
Gleb Natapov	5cd21917da	KVM: x86 emulator: restart string instruction without going back to a guest. Currently when string instruction is only partially complete we go back to a guest mode, guest tries to reexecute instruction and exits again and at this point emulation continues. Avoid all of this by restarting instruction without going back to a guest mode, but return to a guest mode each 1024 iterations to allow interrupt injection. Pending exception causes immediate guest entry too. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:16:29 +03:00
Gleb Natapov	cb404fe089	KVM: x86 emulator: remove saved_eip c->eip is never written back in case of emulation failure, so no need to set it to old value. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:16:28 +03:00
Gleb Natapov	7972995b0c	KVM: x86 emulator: Move string pio emulation into emulator.c Currently emulation is done outside of emulator so things like doing ins/outs to/from mmio are broken it also makes it hard (if not impossible) to implement single stepping in the future. The implementation in this patch is not efficient since it exits to userspace for each IO while previous implementation did 'ins' in batches. Further patch that implements pio in string read ahead address this problem. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:16:26 +03:00
Gleb Natapov	cf8f70bfe3	KVM: x86 emulator: fix in/out emulation. in/out emulation is broken now. The breakage is different depending on where IO device resides. If it is in userspace emulator reports emulation failure since it incorrectly interprets kvm_emulate_pio() return value. If IO device is in the kernel emulation of 'in' will do nothing since kvm_emulate_pio() stores result directly into vcpu registers, so emulator will overwrite result of emulation during commit of shadowed register. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:16:25 +03:00
Gleb Natapov	d9271123a4	KVM: x86 emulator: during rep emulation decrement ECX only if emulation succeeded Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:16:23 +03:00
Gleb Natapov	a682e35449	KVM: x86 emulator: add decoding of X,Y parameters from Intel SDM Add decoding of X,Y parameters from Intel SDM which are used by string instruction to specify source and destination. Use this new decoding to implement movs, cmps, stos, lods in a generic way. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:16:21 +03:00
Gleb Natapov	69f55cb11e	KVM: x86 emulator: populate OP_MEM operand during decoding. All struct operand fields are initialized during decoding for all operand types except OP_MEM, but there is no reason for that. Move OP_MEM operand initialization into decoding stage for consistency. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:16:20 +03:00
Gleb Natapov	ceffb45972	KVM: Use task switch from emulator.c Remove old task switch code from x86.c Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:16:18 +03:00
Gleb Natapov	2e873022f5	KVM: x86 emulator: Use load_segment_descriptor() instead of kvm_load_segment_descriptor() Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:16:17 +03:00
Gleb Natapov	38ba30ba51	KVM: x86 emulator: Emulate task switch in emulator.c Implement emulation of 16/32 bit task switch in emulator.c Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:16:15 +03:00
Gleb Natapov	2dafc6c234	KVM: x86 emulator: Provide more callbacks for x86 emulator. Provide get_cached_descriptor(), set_cached_descriptor(), get_segment_selector(), set_segment_selector(), get_gdt(), write_std() callbacks. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:16:14 +03:00
Gleb Natapov	aca06a8307	KVM: x86 emulator: cleanup grp3 return value When x86_emulate_insn() does not know how to emulate instruction it exits via cannot_emulate label in all cases except when emulating grp3. Fix that. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:16:12 +03:00
Gleb Natapov	a41ffb7540	KVM: x86 emulator: If LOCK prefix is used dest arg should be memory. If LOCK prefix is used dest arg should be memory, otherwise instruction should generate #UD. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:16:11 +03:00
Gleb Natapov	fd5253658b	KVM: x86 emulator: do not call writeback if msr access fails. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:16:09 +03:00
Gleb Natapov	2e901c4cf4	KVM: x86 emulator: fix return values of syscall/sysenter/sysexit emulations Return X86EMUL_PROPAGATE_FAULT is fault was injected. Also inject #UD for those instruction when appropriate. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:16:08 +03:00
Gleb Natapov	1e470be5a1	KVM: x86 emulator: fix mov dr to inject #UD when needed. If CR4.DE=1 access to registers DR4/DR5 cause #UD. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:16:06 +03:00
Gleb Natapov	6aebfa6ea7	KVM: x86 emulator: inject #UD on access to non-existing CR Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:16:05 +03:00
Gleb Natapov	ab8557b2b3	KVM: x86 emulator: 0f (20\|21\|22\|23) ignore mod bits. Resent spec says that for 0f (20\|21\|22\|23) the 2 bits in the mod field are ignored. Interestingly enough older spec says that 11 is only valid encoding. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:16:03 +03:00
Gleb Natapov	6e1e5ffee8	KVM: x86 emulator: fix 0f 01 /5 emulation It is undefined and should generate #UD. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:16:02 +03:00
Gleb Natapov	5e3ae6c540	KVM: x86 emulator: fix mov r/m, sreg emulation. mov r/m, sreg generates #UD ins sreg is incorrect. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:16:00 +03:00
Gleb Natapov	063db061b9	KVM: Provide current eip as part of emulator context. Eliminate the need to call back into KVM to get it from emulator. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:15:59 +03:00
Gleb Natapov	9c5372445c	KVM: Provide x86_emulate_ctxt callback to get current cpl Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:15:57 +03:00
Gleb Natapov	93a152be5a	KVM: remove realmode_lmsw function. Use (get\|set)_cr callback to emulate lmsw inside emulator. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:15:56 +03:00
Gleb Natapov	52a4661737	KVM: Provide callback to get/set control registers in emulator ops. Use this callback instead of directly call kvm function. Also rename realmode_(set\|get)_cr to emulator_(set\|get)_cr since function has nothing to do with real mode. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:15:54 +03:00
Gui Jianfeng	3129994458	KVM: VMX: change to use bool return values Make use of bool as return values, and remove some useless bool value converting. Thanks Avi to point this out. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:15:51 +03:00
Gleb Natapov	49c6799a2c	KVM: Remove pointer to rflags from realmode_set_cr parameters. Mov reg, cr instruction doesn't change flags in any meaningful way, so no need to update rflags after instruction execution. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:15:50 +03:00
Gleb Natapov	af5b4f7ff7	KVM: x86 emulator: check return value against correct define Check return value against correct define instead of open code the value. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:15:48 +03:00
Gleb Natapov	c73e197bc5	KVM: x86 emulator: fix RCX access during rep emulation During rep emulation access length to RCX depends on current address mode. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:15:47 +03:00
Gleb Natapov	d6d367d678	KVM: x86 emulator: Fix DstAcc decoding. Set correct operation length. Add RAX (64bit) handling. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:15:45 +03:00
Avi Kivity	08e850c653	KVM: MMU: Reinstate pte prefetch on invlpg Commit `fb341f57` removed the pte prefetch on guest invlpg, citing guest races. However, the SDM is adamant that prefetch is allowed: "The processor may create entries in paging-structure caches for translations required for prefetches and for accesses that are a result of speculative execution that would never actually occur in the executed code path." And, in fact, there was a race in the prefetch code: we picked up the pte without the mmu lock held, so an older invlpg could install the pte over a newer invlpg. Reinstate the prefetch logic, but this time note whether another invlpg has executed using a counter. If a race occured, do not install the pte. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:15:43 +03:00
Avi Kivity	fbc5d139bb	KVM: MMU: Do not instantiate nontrapping spte on unsync page The update_pte() path currently uses a nontrapping spte when a nonpresent (or nonaccessed) gpte is written. This is fine since at present it is only used on sync pages. However, on an unsync page this will cause an endless fault loop as the guest is under no obligation to invlpg a gpte that transitions from nonpresent to present. Needed for the next patch which reinstates update_pte() on invlpg. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:15:42 +03:00
Avi Kivity	4a5f48f666	KVM: Don't follow an atomic operation by a non-atomic one Currently emulated atomic operations are immediately followed by a non-atomic operation, so that kvm_mmu_pte_write() can be invoked. This updates the mmu but undoes the whole point of doing things atomically. Fix by only performing the atomic operation and the mmu update, and avoiding the non-atomic write. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:15:40 +03:00
Avi Kivity	daea3e73cb	KVM: Make locked operations truly atomic Once upon a time, locked operations were emulated while holding the mmu mutex. Since mmu pages were write protected, it was safe to emulate the writes in a non-atomic manner, since there could be no other writer, either in the guest or in the kernel. These days emulation takes place without holding the mmu spinlock, so the write could be preempted by an unshadowing event, which exposes the page to writes by the guest. This may cause corruption of guest page tables. Fix by using an atomic cmpxchg for these operations. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:15:39 +03:00
Avi Kivity	72016f3a42	KVM: MMU: Consolidate two guest pte reads in kvm_mmu_pte_write() kvm_mmu_pte_write() reads guest ptes in two different occasions, both to allow a 32-bit pae guest to update a pte with 4-byte writes. Consolidate these into a single read, which also allows us to consolidate another read from an invlpg speculating a gpte into the shadow page table. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:15:37 +03:00
Wei Yongjun	160d2f6c0c	KVM: x86: fix the error of ioctl KVM_IRQ_LINE if no irq chip If no irq chip in kernel, ioctl KVM_IRQ_LINE will return -EFAULT. But I see in other place such as KVM_[GET\|SET]IRQCHIP, -ENXIO is return. So this patch used -ENXIO instead of -EFAULT. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:15:31 +03:00
Wei Yongjun	ec68798c8f	KVM: x86: Use native_store_idt() instead of kvm_get_idt() This patch use generic linux function native_store_idt() instead of kvm_get_idt(), and also removed the useless function kvm_get_idt(). Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:15:28 +03:00
Avi Kivity	5c1c85d08d	KVM: Trace exception injection Often an exception can help point out where things start to go wrong. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:15:27 +03:00
Avi Kivity	5bfd8b5455	KVM: Move kvm_exit tracepoint rip reading inside tracepoint Reading rip is expensive on vmx, so move it inside the tracepoint so we only incur the cost if tracing is enabled. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:15:25 +03:00
Minchan Kim	d4f64b6cad	KVM: remove redundant initialization of page->private The prep_new_page() in page allocator calls set_page_private(page, 0). So we don't need to reinitialize private of page. Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Cc: Avi Kivity<avi@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:15:24 +03:00
Xiao Guangrong	2ed152afc7	KVM: cleanup kvm trace This patch does: - no need call tracepoint_synchronize_unregister() when kvm module is unloaded since ftrace can handle it - cleanup ftrace's macro Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:15:22 +03:00
Gleb Natapov	835e6b8047	KVM: x86 emulator mark VMMCALL and LMSW as privileged LMSW is present in both group tables. It was marked privileged only in one of them. Intel analog of VMMCALL is already marked privileged. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:15:18 +03:00
Joerg Roedel	f71385383f	KVM: SVM: Ignore lower 12 bit of nested msrpm_pa These bits are ignored by the hardware too. Implement this for nested svm too. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:15:16 +03:00
Joerg Roedel	ce2ac085ff	KVM; SVM: Add correct handling of nested iopm This patch adds the correct handling of the nested io permission bitmap. Old behavior was to not lookup the port in the iopm but only reinject an io intercept to the guest. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:15:15 +03:00
Joerg Roedel	0d6b35378e	KVM: SVM: Use svm_msrpm_offset in nested_svm_exit_handled_msr There is a generic function now to calculate msrpm offsets. Use that function in nested_svm_exit_handled_msr() remove the duplicate logic (which had a bug anyway). Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:15:13 +03:00
Joerg Roedel	323c3d809b	KVM: SVM: Optimize nested svm msrpm merging This patch optimizes the way the msrpm of the host and the guest are merged. The old code merged the 2 msrpm pages completly. This code needed to touch 24kb of memory for that operation. The optimized variant this patch introduces merges only the parts where the host msrpm may contain zero bits. This reduces the amount of memory which is touched to 48 bytes. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:15:12 +03:00
Joerg Roedel	ac72a9b733	KVM: SVM: Introduce direct access msr list This patch introduces a list with all msrs a guest might have direct access to and changes the svm_vcpu_init_msrpm function to use this list. It also adds a check to set_msr_interception which triggers a warning if a developer changes a msr intercept that is not in the list. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:15:10 +03:00
Joerg Roedel	455716fa94	KVM: SVM: Move msrpm offset calculation to seperate function The algorithm to find the offset in the msrpm for a given msr is needed at other places too. Move that logic to its own function. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:15:08 +03:00
Joerg Roedel	d24778265a	KVM: SVM: Return correct values in nested_svm_exit_handled_msr The nested_svm_exit_handled_msr() returned an bool which is a bug. I worked by accident because the exected integer return values match with the true and false values. This patch changes the return value to int and let the function return the correct values. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:15:07 +03:00
Andrea Gelmini	0fc5c3a54d	KVM: arch/x86/kvm/kvm_timer.h checkpatch cleanup arch/x86/kvm/kvm_timer.h:13: ERROR: code indent should use tabs where possible Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:14:42 +03:00
Jacob Pan	fea24e28c6	x86, mrst: add nop functions to x86_init mpparse functions Moorestown does not have BIOS provided MP tables, we can save some time by avoiding scaning of these tables. e.g. [ 0.000000] Scan SMP from c0000000 for 1024 bytes. [ 0.000000] Scan SMP from c009fc00 for 1024 bytes. [ 0.000000] Scan SMP from c00f0000 for 65536 bytes. [ 0.000000] Scan SMP from c00bfff0 for 1024 bytes. Searching EBDA with the base at 0x40E will also result in random pointer deferencing within 1MB. This can be a problem in Lincroft if the pointer hits VGA area and VGA mode is not enabled. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> LKML-Reference: <1273873281-17489-8-git-send-email-jacob.jun.pan@linux.intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-05-16 22:47:41 -07:00
Jacob Pan	e4af4268a3	x86, mrst, pci: return 0 for non-present pci bars Moorestown PCI code has special handling of devices with fixed BARs. In case of BAR sizing writes, we need to update the fake PCI MMCFG space with real size decode value. When a BAR is not present, we need to return 0 instead of ~0. ~0 will be treated as device error per bugzilla 12006. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> LKML-Reference: <1273873281-17489-2-git-send-email-jacob.jun.pan@linux.intel.com> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-05-16 22:45:36 -07:00
Don Zickus	cafcd80d21	lockup_detector: Cross arch compile fixes Combining the softlockup and hardlockup code causes watchdog.c to build even without the hardlockup detection support. So if an arch, that has the previous and the new nmi watchdog implementations cohabiting, wants to know if the generic one is in use, CONFIG_LOCKUP_DETECTOR is not a reliable check. We need to use CONFIG_HARDLOCKUP_DETECTOR instead. Fixes: kernel/built-in.o: In function `touch_nmi_watchdog': (.text+0x449bc): multiple definition of `touch_nmi_watchdog' arch/sparc/kernel/built-in.o:(.text+0x11b28): first defined here Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Don Zickus <dzickus@redhat.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> LKML-Reference: <20100514151121.GR15159@redhat.com> [ use CONFIG_HARDLOCKUP_DETECTOR instead of CONFIG_PERF_EVENTS_NMI] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2010-05-16 04:25:14 +02:00
Frederic Weisbecker	c01d432330	lockup_detector: Adapt CONFIG_PERF_EVENT_NMI to other archs CONFIG_PERF_EVENT_NMI is something that need to be enabled from the arch. This is fine on x86 as PERF_EVENTS is builtin but if other archs select it, they will need to handle the PERF_EVENTS dependency. Instead, handle the dependency in the generic layer: - archs need to tell what they support through HAVE_PERF_EVENTS_NMI - Enable magically PERF_EVENTS_NMI if we have PERF_EVENTS and HAVE_PERF_EVENTS_NMI. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Don Zickus <dzickus@redhat.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com>	2010-05-16 01:57:36 +02:00
Cyrill Gorcunov	1ff3d7d792	x86, perf: P4 PMU - fix counters management logic Jaswinder reported this #GP: \| \| Message from syslogd@ht at May 14 09:39:32 ... \| kernel:[ 314.908612] EIP: [<c100ccca>] \| x86_perf_event_set_period+0x19d/0x1b2 SS:ESP 0068:edac3d70 \| Ming has narrowed it down to a comparision issue between arguments with different sizes and signs. As result event index reached a wrong value which in turn led to a GP fault. At the same time it was found that p4_next_cntr has broken logic and should return the counter index only if it was not yet borrowed for another event. Reported-by: Jaswinder Singh Rajput <jaswinderlinux@gmail.com> Reported-by: Lin Ming <ming.m.lin@intel.com> Bisected-by: Lin Ming <ming.m.lin@intel.com> Tested-by: Jaswinder Singh Rajput <jaswinderlinux@gmail.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> CC: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20100514190815.GG13509@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-05-15 08:38:55 +02:00
Linus Torvalds	ecbb458a48	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, mrst: Don't blindly access extended config space	2010-05-14 21:28:23 -07:00
H. Peter Anvin	e9b1d5d0ff	x86, mrst: Don't blindly access extended config space Do not blindly access extended configuration space unless we actively know we're on a Moorestown platform. The fixed-size BAR capability lives in the extended configuration space, and thus is not applicable if the configuration space isn't appropriately sized. This fixes booting certain VMware configurations with CONFIG_MRST=y. Moorestown will add a fake PCI-X 266 capability to advertise the presence of extended configuration space. Reported-and-tested-by: Petr Vandrovec <petr@vandrovec.name> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Acked-by: Jacob Pan <jacob.jun.pan@intel.com> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> LKML-Reference: <AANLkTiltKUa3TrKR1M51eGw8FLNoQJSLT0k0_K5X3-OJ@mail.gmail.com>	2010-05-14 13:55:57 -07:00
Linus Torvalds	ef0e9180d3	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, cacheinfo: Turn off L3 cache index disable feature in virtualized environments x86, k8: Fix build error when K8_NB is disabled x86, amd: Check X86_FEATURE_OSVW bit before accessing OSVW MSRs x86: Fix fake apicid to node mapping for numa emulation	2010-05-14 12:20:09 -07:00
Frank Arnold	7f284d3cc9	x86, cacheinfo: Turn off L3 cache index disable feature in virtualized environments When running a quest kernel on xen we get: BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 IP: [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x2ca/0x3df PGD 0 Oops: 0000 [#1] SMP last sysfs file: CPU 0 Modules linked in: Pid: 0, comm: swapper Tainted: G W 2.6.34-rc3 #1 /HVM domU RIP: 0010:[<ffffffff8142f2fb>] [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x 2ca/0x3df RSP: 0018:ffff880002203e08 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000060 RDX: 0000000000000000 RSI: 0000000000000040 RDI: 0000000000000000 RBP: ffff880002203ed8 R08: 00000000000017c0 R09: ffff880002203e38 R10: ffff8800023d5d40 R11: ffffffff81a01e28 R12: ffff880187e6f5c0 R13: ffff880002203e34 R14: ffff880002203e58 R15: ffff880002203e68 FS: 0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000038 CR3: 0000000001a3c000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process swapper (pid: 0, threadinfo ffffffff81a00000, task ffffffff81a44020) Stack: ffffffff810d7ecb ffff880002203e20 ffffffff81059140 ffff880002203e30 <0> ffffffff810d7ec9 0000000002203e40 000000000050d140 ffff880002203e70 <0> 0000000002008140 0000000000000086 ffff880040020140 ffffffff81068b8b Call Trace: <IRQ> [<ffffffff810d7ecb>] ? sync_supers_timer_fn+0x0/0x1c [<ffffffff81059140>] ? mod_timer+0x23/0x25 [<ffffffff810d7ec9>] ? arm_supers_timer+0x34/0x36 [<ffffffff81068b8b>] ? hrtimer_get_next_event+0xa7/0xc3 [<ffffffff81058e85>] ? get_next_timer_interrupt+0x19a/0x20d [<ffffffff8142fa23>] get_cpu_leaves+0x5c/0x232 [<ffffffff8106a7b1>] ? sched_clock_local+0x1c/0x82 [<ffffffff8106a9a0>] ? sched_clock_tick+0x75/0x7a [<ffffffff8107748c>] generic_smp_call_function_single_interrupt+0xae/0xd0 [<ffffffff8101f6ef>] smp_call_function_single_interrupt+0x18/0x27 [<ffffffff8100a773>] call_function_single_interrupt+0x13/0x20 <EOI> [<ffffffff8143c468>] ? notifier_call_chain+0x14/0x63 [<ffffffff810295c6>] ? native_safe_halt+0xc/0xd [<ffffffff810114eb>] ? default_idle+0x36/0x53 [<ffffffff81008c22>] cpu_idle+0xaa/0xe4 [<ffffffff81423a9a>] rest_init+0x7e/0x80 [<ffffffff81b10dd2>] start_kernel+0x40e/0x419 [<ffffffff81b102c8>] x86_64_start_reservations+0xb3/0xb7 [<ffffffff81b103c4>] x86_64_start_kernel+0xf8/0x107 Code: 14 d5 40 ff ae 81 8b 14 02 31 c0 3b 15 47 1c 8b 00 7d 0e 48 8b 05 36 1c 8b 00 48 63 d2 48 8b 04 d0 c7 85 5c ff ff ff 00 00 00 00 <8b> 70 38 48 8d 8d 5c ff ff ff 48 8b 78 10 ba c4 01 00 00 e8 eb RIP [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x2ca/0x3df RSP <ffff880002203e08> CR2: 0000000000000038 ---[ end trace a7919e7f17c0a726 ]--- The L3 cache index disable feature of AMD CPUs has to be disabled if the kernel is running as guest on top of a hypervisor because northbridge devices are not available to the guest. Currently, this fixes a boot crash on top of Xen. In the future this will become an issue on KVM as well. Check if northbridge devices are present and do not enable the feature if there are none. [ hpa: backported to 2.6.34 ] Signed-off-by: Frank Arnold <frank.arnold@amd.com> LKML-Reference: <1271945222-5283-3-git-send-email-bp@amd64.org> Acked-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: <stable@kernel.org>	2010-05-14 11:53:01 -07:00
Borislav Petkov	ade029e2aa	x86, k8: Fix build error when K8_NB is disabled K8_NB depends on PCI and when the last is disabled (allnoconfig) we fail at the final linking stage due to missing exported num_k8_northbridges. Add a header stub for that. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <20100503183036.GJ26107@aftab> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: <stable@kernel.org>	2010-05-14 11:53:01 -07:00
Andreas Dilger	0ddc9324b1	add descriptive comment for TIF_MEMDIE task flag declaration. Signed-off-by: Andreas Dilger <adilger@dilger.ca> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-05-14 11:13:27 +02:00
Roland McGrath	9e56529227	x86: Use .cfi_sections for assembly code The newer assemblers support the .cfi_sections directive so we can put the CFI from .S files into the .debug_frame section that is preserved in unstripped vmlinux and in separate debuginfo, rather than the .eh_frame section that is now discarded by vmlinux.lds.S. Signed-off-by: Roland McGrath <roland@redhat.com> LKML-Reference: <20100514044303.A6FE7400BE@magilla.sf.frob.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-05-13 22:15:18 -07:00
Andreas Herrmann	f01487119d	x86, amd: Check X86_FEATURE_OSVW bit before accessing OSVW MSRs If host CPU is exposed to a guest the OSVW MSRs are not guaranteed to be present and a GP fault occurs. Thus checking the feature flag is essential. Cc: <stable@kernel.org> # .32.x .33.x Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> LKML-Reference: <20100427101348.GC4489@alberich.amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-05-13 16:21:20 -07:00
Ingo Molnar	5e85391b3b	x86, watchdog: Fix build error in hw_nmi.c On some configs the following build error triggers: arch/x86/kernel/apic/hw_nmi.c:35: error: 'apic' undeclared (first use in this function) arch/x86/kernel/apic/hw_nmi.c:35: error: (Each undeclared identifier is reported only once arch/x86/kernel/apic/hw_nmi.c:35: error: for each function it appears in.) Because asm/apic.h was only included implicitly. Include it explicitly. Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> LKML-Reference: <1273713674-8434-1-git-send-regression-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-05-13 09:12:39 +02:00
Cyrill Gorcunov	720019908f	x86, perf: P4 PMU -- use hash for p4_get_escr_idx() Linear search over all p4 MSRs should be fine if only we would not use it in events scheduling routine which is pretty time critical. Lets use hashes. It should speed scheduling up significantly. v2: Steven proposed to use more gentle approach than issue BUG on error, so we use WARN_ONCE now Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> LKML-Reference: <20100512174242.GA5190@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-05-13 08:51:13 +02:00
Jan Kiszka	f8c5fae166	KVM: VMX: blocked-by-sti must not defer NMI injections As the processor may not consider GUEST_INTR_STATE_STI as a reason for blocking NMI, it could return immediately with EXIT_REASON_NMI_WINDOW when we asked for it. But as we consider this state as NMI-blocking, we can run into an endless loop. Resolve this by allowing NMI injection if just GUEST_INTR_STATE_STI is active (originally suggested by Gleb). Intel confirmed that this is safe, the processor will never complain about NMI injection in this state. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> KVM-Stable-Tag Acked-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-13 01:31:37 -03:00
Dongxiao Xu	fe19c5a46b	KVM: x86: Call vcpu_load and vcpu_put in cpuid_update cpuid_update may operate VMCS, so vcpu_load() and vcpu_put() should be called to ensure correctness. Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-13 01:31:02 -03:00
Joerg Roedel	061e2fd168	KVM: SVM: Fix wrong intercept masks on 32 bit This patch makes KVM on 32 bit SVM working again by correcting the masks used for iret interception. With the wrong masks the upper 32 bits of the intercepts are masked out which leaves vmrun unintercepted. This is not legal on svm and the vmrun fails. Bug was introduced by commits `95ba827313` and `3cfc3092`. Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-13 01:24:08 -03:00
Don Zickus	10f9014912	x86: Cleanup hw_nmi.c cruft The design of the hardlockup watchdog has changed and cruft was left behind in the hw_nmi.c file. Just remove the code that isn't used anymore. Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Eric Paris <eparis@redhat.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> LKML-Reference: <1273266711-18706-7-git-send-email-dzickus@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2010-05-12 23:55:48 +02:00
Don Zickus	7cbb7e7fa4	x86: Move trigger_all_cpu_backtrace to its own die_notifier As part of the transition of the nmi watchdog to something more generic, the trigger_all_cpu_backtrace code is getting left behind. Put it in its own die_notifier so it can still be used. V2: - use arch_spin_locks Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Eric Paris <eparis@redhat.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> LKML-Reference: <1273266711-18706-6-git-send-email-dzickus@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2010-05-12 23:55:47 +02:00
Don Zickus	58687acba5	lockup_detector: Combine nmi_watchdog and softlockup detector The new nmi_watchdog (which uses the perf event subsystem) is very similar in structure to the softlockup detector. Using Ingo's suggestion, I combined the two functionalities into one file: kernel/watchdog.c. Now both the nmi_watchdog (or hardlockup detector) and softlockup detector sit on top of the perf event subsystem, which is run every 60 seconds or so to see if there are any lockups. To detect hardlockups, cpus not responding to interrupts, I implemented an hrtimer that runs 5 times for every perf event overflow event. If that stops counting on a cpu, then the cpu is most likely in trouble. To detect softlockups, tasks not yielding to the scheduler, I used the previous kthread idea that now gets kicked every time the hrtimer fires. If the kthread isn't being scheduled neither is anyone else and the warning is printed to the console. I tested this on x86_64 and both the softlockup and hardlockup paths work. V2: - cleaned up the Kconfig and softlockup combination - surrounded hardlockup cases with #ifdef CONFIG_PERF_EVENTS_NMI - seperated out the softlockup case from perf event subsystem - re-arranged the enabling/disabling nmi watchdog from proc space - added cpumasks for hardlockup failure cases - removed fallback to soft events if no PMU exists for hard events V3: - comment cleanups - drop support for older softlockup code - per_cpu cleanups - completely remove software clock base hardlockup detector - use per_cpu masking on hard/soft lockup detection - #ifdef cleanups - rename config option NMI_WATCHDOG to LOCKUP_DETECTOR - documentation additions V4: - documentation fixes - convert per_cpu to __get_cpu_var - powerpc compile fixes V5: - split apart warn flags for hard and soft lockups TODO: - figure out how to make an arch-agnostic clock2cycles call (if possible) to feed into perf events as a sample period [fweisbec: merged conflict patch] Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Eric Paris <eparis@redhat.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> LKML-Reference: <1273266711-18706-2-git-send-email-dzickus@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2010-05-12 23:55:33 +02:00
Frederic Weisbecker	a9aa1d02de	Merge commit 'v2.6.34-rc7' into perf/nmi Merge reason: catch up with latest softlockup detector changes.	2010-05-12 23:20:33 +02:00
Matthew Garrett	b6dacf63e9	ACPI: Unconditionally set SCI_EN on resume The ACPI spec tells us that the firmware will reenable SCI_EN on resume. Reality disagrees in some cases. The ACPI spec tells us that the only way to set SCI_EN is via an SMM call. https://bugzilla.kernel.org/show_bug.cgi?id=13745 shows us that doing so may break machines. Tracing the ACPI calls made by Windows shows that it unconditionally sets SCI_EN on resume with a direct register write, and therefore the overwhelming probability is that everything is fine with this behaviour. Signed-off-by: Matthew Garrett <mjg@redhat.com> Tested-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Len Brown <len.brown@intel.com>	2010-05-12 01:12:18 -04:00
H. Peter Anvin	c9775b4cc5	x86, fpu: Use static_cpu_has() to implement use_xsave() use_xsave() is now just a special case of static_cpu_has(), so use static_cpu_has(). Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Avi Kivity <avi@redhat.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1273135546-29690-2-git-send-email-avi@redhat.com>	2010-05-11 17:49:54 -07:00
H. Peter Anvin	a3c8acd043	x86: Add new static_cpu_has() function using alternatives For CPU-feature-specific code that touches performance-critical paths, introduce a static patching version of [boot_]cpu_has(). This is run at alternatives time and is therefore not appropriate for most initialization code, but on the other hand initialization code is generally not performance critical. On gcc 4.5+ this uses the new "asm goto" feature. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Avi Kivity <avi@redhat.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1273135546-29690-2-git-send-email-avi@redhat.com>	2010-05-11 17:47:07 -07:00
Seth Heasley	33852cb03e	x86/PCI: irq and pci_ids patch for additional Intel Cougar Point DeviceIDs This patch adds additional LPC Controller DeviceIDs for the Intel Cougar Point PCH. The DeviceIDs are defined and referenced as a range of values, the same way Ibex Peak was implemented. Signed-off-by: Seth Heasley <seth.heasley@intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-05-11 12:01:40 -07:00
Thomas Gleixner	d19f61f098	x86/PCI: Convert pci_config_lock to raw_spinlock pci_config_lock must be a real spinlock in preempt-rt. Convert it to raw_spinlock. No change for !RT kernels. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-05-11 12:01:09 -07:00
Joerg Roedel	795e74f7a6	Merge branch 'iommu/largepages' into amd-iommu/2.6.35 Conflicts: arch/x86/kernel/amd_iommu.c	2010-05-11 17:40:57 +02:00
Joerg Roedel	a523572596	x86/amd-iommu: Add amd_iommu=off command line option This patch adds a command line option to tell the AMD IOMMU driver to not initialize any IOMMU it finds. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2010-05-11 17:12:33 +02:00
Masami Hiramatsu	829e924585	kprobes/x86: Fix removed int3 checking order Fix kprobe/x86 to check removed int3 when failing to get kprobe from hlist. Since we have a time window between checking int3 exists on probed address and getting kprobe on that address, we can have following scenario: ------- CPU1 CPU2 hit int3 check int3 exists remove int3 remove kprobe from hlist get kprobe from hlist no kprobe->OOPS! ------- This patch moves int3 checking if there is no kprobe on that address for fixing this problem as follows: ------ CPU1 CPU2 hit int3 remove int3 remove kprobe from hlist get kprobe from hlist no kprobe->check int3 exists ->rollback&retry ------ Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: systemtap <systemtap@sources.redhat.com> Cc: DLE <dle-develop@lists.sourceforge.net> Cc: Dave Anderson <anderson@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20100427223348.2322.9112.stgit@localhost6.localdomain6> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-05-11 09:14:25 +02:00
H. Peter Anvin	dce8bf4e11	x86, fpu: Use the proper asm constraint in use_xsave() The proper constraint for a receiving 8-bit variable is "=qm", not "=g" which equals "=rim"; even though the "i" will never match, bugs can and do happen due to the difference between "q" and "r". Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Avi Kivity <avi@redhat.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1273135546-29690-2-git-send-email-avi@redhat.com>	2010-05-10 13:41:41 -07:00
H. Peter Anvin	c3f8978ea3	x86, fpu: Unbreak FPU emulation Unbreak FPU emulation, broken by checkin `8660328332`: x86: Introduce 'struct fpu' and related API Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Avi Kivity <avi@redhat.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1273135546-29690-3-git-send-email-avi@redhat.com>	2010-05-10 13:37:16 -07:00
Eric Anholt	34dc4d4423	Merge remote branch 'origin/master' into drm-intel-next Conflicts: drivers/gpu/drm/i915/i915_dma.c drivers/gpu/drm/i915/i915_drv.h drivers/gpu/drm/radeon/r300.c The BSD ringbuffer support that is landing in this branch significantly conflicts with the Ironlake PIPE_CONTROL fix on master, and requires it to be tested successfully anyway.	2010-05-10 13:36:52 -07:00
Avi Kivity	8660328332	x86: Introduce 'struct fpu' and related API Currently all fpu state access is through tsk->thread.xstate. Since we wish to generalize fpu access to non-task contexts, wrap the state in a new 'struct fpu' and convert existing access to use an fpu API. Signal frame handlers are not converted to the API since they will remain task context only things. Signed-off-by: Avi Kivity <avi@redhat.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1273135546-29690-3-git-send-email-avi@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-05-10 10:48:55 -07:00
Avi Kivity	c9ad488289	x86: Eliminate TS_XSAVE The fpu code currently uses current->thread_info->status & TS_XSAVE as a way to distinguish between XSAVE capable processors and older processors. The decision is not really task specific; instead we use the task status to avoid a global memory reference - the value should be the same across all threads. Eliminate this tie-in into the task structure by using an alternative instruction keyed off the XSAVE cpu feature; this results in shorter and faster code, without introducing a global memory reference. [ hpa: in the future, this probably should use an asm jmp ] Signed-off-by: Avi Kivity <avi@redhat.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1273135546-29690-2-git-send-email-avi@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-05-10 10:39:33 -07:00
Stephen Rothwell	4f7b9e7cbe	i7core_edac: do not export static functions Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:49:32 -03:00
Mauro Carvalho Chehab	d1fd4fb69e	i7core_edac: Add a code to probe Xeon 55xx bus This code changes the detection procedure of i7core_edac. Instead of directly probing for MC registers, it probes for another register found on Nehalem. If found, it tries to pick the first MC PCI BUS. This should work fine with Xeon 35xx, but, on Xeon 55xx, this is at bus 254 and 255 that are not properly detected by the non-legacy PCI methods. The new detection code scans specifically at buses 254 and 255 for the Xeon 55xx devices. This code has not tested yet. After working, a change at the code will be needed, since the i7core is not yet ready for working with 2 sets of MC. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:51 -03:00
Aristeu Rozanski	5707b24a50	pci: Add a probing code that seeks for an specific bus This patch adds a probing code that seeks for an specific pci bus. It still needs testing, but it is hoped that this will help to identify the memory controller with Xeon 55xx series. Signed-off-by: Aristeu Sergio <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:51 -03:00
Mauro Carvalho Chehab	696e409dbd	edac_mce: Add an interface driver to report mce errors via edac edac_mce module is an interface module that gets mcelog data and forwards to any registered edac module that expects to receive data via mce. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:49 -03:00
John Villalovos	45c34e05c4	Oprofile: Change CPUIDS from decimal to hex, and add some comments Back when the patch was submitted for "Add Xeon 7500 series support to oprofile", Robert Richter had asked for a followon patch that converted all the CPU ID values to hex. I have done that here for the "i386/core_i7" and "i386/atom" class processors in the ppro_init() function and also added some comments on where to find documentation on the Intel processors. Signed-off-by: John L. Villalovos <john.l.villalovos@intel.com> Signed-off-by: Robert Richter <robert.richter@amd.com>	2010-05-10 15:08:33 +02:00
Ingo Molnar	cc49b092d3	Merge branch 'core' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into oprofile	2010-05-10 13:13:40 +02:00
Ingo Molnar	7c224a03a7	Merge commit 'v2.6.34-rc7' into oprofile Merge reason: Update to Linus's latest -rc. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-05-10 13:12:29 +02:00
H. Peter Anvin	3998d09535	x86, hypervisor: add missing <linux/module.h> EXPORT_SYMBOL() needs <linux/module.h> to be included; fixes modular builds of the VMware balloon driver, and any future modular drivers which depends on the hypervisor. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Greg KH <greg@kroah.com> Cc: Hank Janssen <hjanssen@microsoft.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Ky Srinivasan <ksrinivasan@novell.com> Cc: Dmitry Torokhov <dtor@vmware.com> LKML-Reference: <4BE49778.6060800@zytor.com>	2010-05-09 22:46:54 -07:00
H. Peter Anvin	96f6e775b5	x86, hypervisor: Export the x86_hyper* symbols Export x86_hyper and the related specific structures, allowing for hypervisor identification by modules. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Greg KH <greg@kroah.com> Cc: Hank Janssen <hjanssen@microsoft.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Ky Srinivasan <ksrinivasan@novell.com> Cc: Dmitry Torokhov <dtor@vmware.com> LKML-Reference: <4BE49778.6060800@zytor.com>	2010-05-09 01:10:34 -07:00
H. Peter Anvin	d7be0ce6af	Merge commit 'v2.6.34-rc6' into x86/cpu	2010-05-08 14:59:58 -07:00
Cyrill Gorcunov	c7993165ef	x86, perf: P4 PMU -- check for proper event index in RAW events RAW events are special and we should be ready for user passing in insane event index values. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> LKML-Reference: <20100508112717.315897547@openvz.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-05-08 14:17:53 +02:00
Cyrill Gorcunov	3f51b7119d	x86, perf: P4 PMU -- Get rid of redundant check for array index The caller already has done such a check. And it was wrong anyway, it had to be '>=' rather than '>' Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> LKML-Reference: <20100508112717.130386882@openvz.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-05-08 14:17:53 +02:00
Cyrill Gorcunov	137351e0fe	x86, perf: P4 PMU -- protect sensible procedures from preemption Steven reported: \| \| I'm getting: \| \| Pid: 3477, comm: perf Not tainted 2.6.34-rc6 #2727 \| Call Trace: \| [<ffffffff811c7565>] debug_smp_processor_id+0xd5/0xf0 \| [<ffffffff81019874>] p4_hw_config+0x2b/0x15c \| [<ffffffff8107acbc>] ? trace_hardirqs_on_caller+0x12b/0x14f \| [<ffffffff81019143>] hw_perf_event_init+0x468/0x7be \| [<ffffffff810782fd>] ? debug_mutex_init+0x31/0x3c \| [<ffffffff810c68b2>] T.850+0x273/0x42e \| [<ffffffff810c6cab>] sys_perf_event_open+0x23e/0x3f1 \| [<ffffffff81009e6a>] ? sysret_check+0x2e/0x69 \| [<ffffffff81009e32>] system_call_fastpath+0x16/0x1b \| \| When running perf record in latest tip/perf/core \| Due to the fact that p4 counters are shared between HT threads we synthetically divide the whole set of counters into two non-intersected subsets. And while we're "borrowing" counters from these subsets we should not be preempted (well, strictly speaking in p4_hw_config we just pre-set reference to the subset which allow to save some cycles in schedule routine if it happens on the same cpu). So use get_cpu/put_cpu pair. Also p4_pmu_schedule_events should use smp_processor_id rather than raw_ version. This allow us to catch up preemption issue (if there will ever be). Reported-by: Steven Rostedt <rostedt@goodmis.org> Tested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> LKML-Reference: <20100508112716.963478928@openvz.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-05-08 14:17:53 +02:00
Cyrill Gorcunov	de902d967f	x86, perf: P4 PMU -- configure predefined events If an event is not RAW we should not exit p4_hw_config early but call x86_setup_perfctr as well. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Robert Richter <robert.richter@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-05-08 14:17:52 +02:00
H. Peter Anvin	e08cae4181	x86: Clean up the hypervisor layer Clean up the hypervisor layer and the hypervisor drivers, using an ops structure instead of an enumeration with if statements. The identity of the hypervisor, if needed, can be tested by testing the pointer value in x86_hyper. The MS-HyperV private state is moved into a normal global variable (it's per-system state, not per-CPU state). Being a normal bss variable, it will be left at all zero on non-HyperV platforms, and so can generally be tested for HyperV-specific features without additional qualification. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Acked-by: Greg KH <greg@kroah.com> Cc: Hank Janssen <hjanssen@microsoft.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Ky Srinivasan <ksrinivasan@novell.com> LKML-Reference: <4BE49778.6060800@zytor.com>	2010-05-07 17:13:04 -07:00
Greg Kroah-Hartman	9fa0231742	x86, HyperV: fix up the license to mshyperv.c This should have been GPLv2 only, we cut and pasted from the wrong file originally, sorry. Also removed some unneeded boilerplate license code, we all know where to find the GPLv2, and that there's no warranty as that is implicit from the license. Cc: Ky Srinivasan <ksrinivasan@novell.com> Cc: Hank Janssen <hjanssen@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> LKML-Reference: <20100507235541.GA15448@kroah.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-05-07 17:00:25 -07:00
Jacob Pan	2b107d9363	x86: Avoid check hlt for newer cpus Check hlt instruction was targeted for some older CPUs. It is an expensive operation in that it takes 4 ticks to break out the check. We can avoid such check completely for newer x86 cpus (family >= 5). [ hpa: corrected family > 5 to family >= 5 ] Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> LKML-Reference: <1273269585-14346-1-git-send-email-jacob.jun.pan@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-05-07 15:31:17 -07:00
Lin Ming	4d1c52b02d	perf, x86: implement group scheduling transactional APIs Convert to the transactional PMU API and remove the duplication of group_sched_in(). Reviewed-by: Stephane Eranian <eranian@google.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: David Miller <davem@davemloft.net> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1272002172.5707.61.camel@minggr.sh.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-05-07 11:31:03 +02:00
Peter Zijlstra	ab608344bc	perf, x86: Improve the PEBS ABI Rename perf_event_attr::precise to perf_event_attr::precise_ip and widen it to 2 bits. This new field describes the required precision of the PERF_SAMPLE_IP field: 0 - SAMPLE_IP can have arbitrary skid 1 - SAMPLE_IP must have constant skid 2 - SAMPLE_IP requested to have 0 skid 3 - SAMPLE_IP must have 0 skid And modify the Intel PEBS code accordingly. The PEBS implementation now supports up to precise_ip == 2, where we perform the IP fixup. Also s/PERF_RECORD_MISC_EXACT/&_IP/ to clarify its meaning, this bit should be set for each PERF_SAMPLE_IP field known to match the actual instruction triggering the event. This new scheme allows for a PEBS mode that uses the buffer for more than a single event. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Stephane Eranian <eranian@google.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-05-07 11:31:02 +02:00
Peter Zijlstra	2b0b5c6fe9	perf, x86: Consolidate some code repetition Remove some duplicated logic. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-05-07 11:31:02 +02:00
Peter Zijlstra	1e9a6d8d44	perf, x86: Remove PEBS SAMPLE_RAW support Its broken, we really should get PERF_SAMPLE_REGS sorted. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-05-07 11:31:01 +02:00
Robert Richter	a1f2b70a94	perf, x86: Use weight instead of cmask in for_each_event_constraint() There may exist constraints with a cmask set to zero. In this case for_each_event_constraint() will not work properly. Now weight is used instead of the cmask for loop exit detection. Weight is always a value other than zero since the default contains the HWEIGHT from the counter mask and in other cases a value of zero does not fit too. This is in preparation of ibs event constraints that wont have a cmask. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1271190201-25705-7-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-05-07 11:31:01 +02:00
Robert Richter	31fa58af57	perf, x86: Pass enable bit mask to __x86_pmu_enable_event() To reuse this function for events with different enable bit masks, this mask is part of the function's argument list now. The function will be used later to control ibs events too. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1271190201-25705-6-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-05-07 11:31:00 +02:00
Robert Richter	9d0fcba67e	perf, x86: Call x86_setup_perfctr() from .hw_config() The perfctr setup calls are in the corresponding .hw_config() functions now. This makes it possible to introduce config functions for other pmu events that are not perfctr specific. Also, all of a sudden the code looks much nicer. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1271190201-25705-4-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-05-07 11:31:00 +02:00

... 6 7 8 9 10 ...

11184 Commits