linux

Author	SHA1	Message	Date
Matthew Garrett	660e34cebf	x86: Reorder reboot method preferences We have a never ending stream of 'reboot quirks' for new boxes that will not reboot properly under Linux (they will hang on reboot). The reason is widespread 'Windows compatible' assumption of modern x86 hardware, which expects the following reboot sequence: - hitting the ACPI reboot vector (if available) - trying the keyboard controller - hitting the ACPI reboot vector again - then giving the keyboard controller one last go This sequence expectation gets more and more embedded in modern hardware, which often lacks a keyboard controller and may even lock up if the legacy io ports are hit - and which hardware is often not tested with Linux during development. The end result is that reboot works under Windows-alike OSs but not under Linux. Rework our reboot process to meet this hardware externality a little better and match this assumption of newer x86 hardware. In addition to the ACPI,kbd,ACPI,kbd sequence we'll still fall through to attempting a legacy triple fault if nothing else works - and keep trying that and the kbd reset. Signed-off-by: Matthew Garrett <mjg@redhat.com> [ this commit will also save special casing Oaktrail boards ] Acked-by: Alan Cox <alan@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Leann Ogasawara <leann.ogasawara@canonical.com> Cc: Dave Jones <davej@redhat.com> Cc: Len Brown <len.brown@intel.com> LKML-Reference: <1301939705-2404-1-git-send-email-mjg@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-06 10:36:50 +02:00
Konrad Rzeszutek Wilk	d88885d092	xen/debug: Don't be so verbose with WARN on 1-1 mapping errors. There are valid situations in which this error is not a warning. Mainly when QEMU maps a guest memory and uses the VM_IO flag to set the MFNs. For right now make the WARN be WARN_ONCE. In the future we will: 1). Remove the VM_IO code handling.. 2). .. which will also remove this debug facility. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-04-04 14:48:20 -04:00
Jason Baron	ef64789413	jump label: Add _ASM_ALIGN for x86 and x86_64 The linker should not be adding holes to word size aligned pointers, but out of paranoia we are explicitly specifying that alignment. I have not seen any holes in the jump label section in practice. Signed-off-by: Jason Baron <jbaron@redhat.com> LKML-Reference: <e119fbd060c9452c56063ea6148ba1070e7434cc.1300299760.git.jbaron@redhat.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2011-04-04 13:42:51 -04:00
Jason Baron	d430d3d7e6	jump label: Introduce static_branch() interface Introduce: static __always_inline bool static_branch(struct jump_label_key key); instead of the old JUMP_LABEL(key, label) macro. In this way, jump labels become really easy to use: Define: struct jump_label_key jump_key; Can be used as: if (static_branch(&jump_key)) do unlikely code enable/disale via: jump_label_inc(&jump_key); jump_label_dec(&jump_key); that's it! For the jump labels disabled case, the static_branch() becomes an atomic_read(), and jump_label_inc()/dec() are simply atomic_inc(), atomic_dec() operations. We show testing results for this change below. Thanks to H. Peter Anvin for suggesting the 'static_branch()' construct. Since we now require a 'struct jump_label_key key', we can store a pointer into the jump table addresses. In this way, we can enable/disable jump labels, in basically constant time. This change allows us to completely remove the previous hashtable scheme. Thanks to Peter Zijlstra for this re-write. Testing: I ran a series of 'tbench 20' runs 5 times (with reboots) for 3 configurations, where tracepoints were disabled. jump label configured in avg: 815.6 jump label not configured in (using atomic reads) avg: 800.1 jump label not configured in (regular reads) avg: 803.4 Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20110316212947.GA8792@redhat.com> Signed-off-by: Jason Baron <jbaron@redhat.com> Suggested-by: H. Peter Anvin <hpa@linux.intel.com> Tested-by: David Daney <ddaney@caviumnetworks.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2011-04-04 12:48:08 -04:00
Linus Torvalds	d7c764c4c7	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, UV: Fix kdump reboot x86, amd-nb: Rename CPU PCI id define for F4 sound: Add delay.h to sound/soc/codecs/sn95031.c x86, mtrr, pat: Fix one cpu getting out of sync during resume x86, microcode: Unregister syscore_ops after microcode unloaded x86: Stop including <linux/delay.h> in two asm header files	2011-04-04 08:37:45 -07:00
Linus Torvalds	4da7e90e65	Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf: Fix task_struct reference leak perf: Fix task context scheduling perf: mmap 512 kiB by default perf: Rebase max unprivileged mlock threshold on top of page size perf tools: Fix NO_NEWT=1 python build error perf symbols: Properly align symbol_conf.priv_size perf tools: Emit clearer message for sys_perf_event_open ENOENT return perf tools: Fixup exit path when not able to open events perf symbols: Fix vsyscall symbol lookup oprofile, x86: Allow setting EDGE/INV/CMASK for counter events	2011-04-04 08:36:40 -07:00
Linus Torvalds	fb9a7d76da	Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: rcu: create new rcu_access_index() and use in mce WARN_ON_SMP(): Add comment to explain ({0;})	2011-04-04 08:36:15 -07:00
Rakib Mullick	64d21fc194	x86, mpparse: Put check_slot() into .init section check_slot() is only called from replace_intsrc_all() - which is in the .init section. So, put check_slot into the .init section as well, so it can be freed after system boot. Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> LKML-Reference: <AANLkTing52ntzRcHkODCWDKOfRF=0uhXw5-cCUhx6M54@mail.gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-04 17:17:19 +02:00
Tejun Heo	765af22da8	x86-32, NUMA: Fix ACPI NUMA init broken by recent x86-64 change Commit `d8fc3afc49` (x86, NUMA: Move *_numa_init() invocations into initmem_init()) moved acpi_numa_init() call into NUMA initmem_init() but forgot to update 32bit NUMA init breaking ACPI NUMA configuration for 32bit. acpi_numa_init() call was later moved again to srat_64.c. Match it by adding the call to get_memcfg_from_srat() in srat_32.c. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: H. Peter Anvin <hpa@linux.intel.com> LKML-Reference: <20110404100645.GE1420@mtj.dyndns.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-04 16:56:33 +02:00
Thomas Gleixner	43a6246f9c	x86: visws: Fixup irq overhaul fallout Reported-by: Ian Campbell <Ian.Campbell@eu.citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-04-04 16:51:15 +02:00
Ingo Molnar	9402efaa96	Merge branch 'x86-mm' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into x86/mm	2011-04-04 16:39:20 +02:00
Florian Mickler	711b8c87a5	x86-64, NUMA: Remove unused variable In case !CONFIG_ACPI_NUMA and !CONFIG_AMD_NUMA gcc emits a warning about the unused variable ret. As that variable is in fact not needed I choose to remove it. Signed-off-by: Florian Mickler <florian@mickler.org> LKML-Reference: <1301843624-22364-1-git-send-email-florian@mickler.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2011-04-04 01:21:00 +02:00
Paul E. McKenney	a4dd99250d	rcu: create new rcu_access_index() and use in mce The MCE subsystem needs to sample an RCU-protected index outside of any protection for that index. If this was a pointer, we would use rcu_access_pointer(), but there is no corresponding rcu_access_index(). This commit therefore creates an rcu_access_index() and applies it to MCE. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Zdenek Kabelac <zkabelac@redhat.com>	2011-04-01 07:27:31 -07:00
Tejun Heo	3b16651f80	x86: Clean up memory model related configs in arch/x86/Kconfig * Remove bogus dependency on ARCH_SELECT_MEMORY_MODEL from ARCH_FLATMEM_ENABLE. ENABLE configs don't interfere with SELECT_MEMORY_MODEL. They just need to indicate whether the specific memory model is supported. * Relocate HAVE_ARCH_ALLOC_REMAP, ARCH_PROC_KCORE_TEXT and ARCH_SPARSEMEM_DEFAULT so that memory model related configs are together in consistent order. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Lameter <cl@linux.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de>	2011-04-01 11:15:12 +02:00
Tejun Heo	052936080c	x86-64, NUMA: Remove custom phys_to_nid() implementation phys_to_nid() maps physical address to NUMA node id. This is implemented by building perfect hash in compute_hash_shift() during initialization. However, with SPARSE memory model, the nid is encoded in page flags. The perfect hash implementation was for DISCONTIG memory model which got removed years ago by `b263295dbf` (x86: 64-bit, make sparsemem vmemmap the only memory model). So, the perfect hash ends up being used only during initialization when the core SPARSE code already provides perfectly acceptable generic early_pfn_to_nid() implementation. Drop phys_to_nid() and use the generic ealry_pfn_to_nid() instead. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de>	2011-04-01 11:15:12 +02:00
Cliff Wickman	818987e9a1	x86, UV: Fix kdump reboot After a crash dump on an SGI Altix UV system the crash kernel fails to cause a reboot. EFI mode is disabled in the kdump kernel, so only the reboot_type of BOOT_ACPI works. Signed-off-by: Cliff Wickman <cpw@sgi.com> Cc: rja@sgi.com LKML-Reference: <E1Q5Iuo-00013b-UK@eag09.americas.sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-31 18:44:03 +02:00
Borislav Petkov	cb6c8520f6	x86, amd-nb: Rename CPU PCI id define for F4 With increasing number of PCI function ids, add the PCI function id in the define name instead of its symbolic name in the BKDG for more clarity. This renames function 4 define. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> LKML-Reference: <20110330183447.GA3668@aftab> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-31 08:51:38 +02:00
Ingo Molnar	9f644c4ba8	Merge commit 'v2.6.39-rc1' into perf/urgent Merge reason: use the post-merge-window tree. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-30 09:07:43 +02:00
Ingo Molnar	ae18cbfe9f	Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 into perf/core	2011-03-30 09:06:51 +02:00
Suresh Siddha	84ac7cdbdd	x86, mtrr, pat: Fix one cpu getting out of sync during resume On laptops with core i5/i7, there were reports that after resume graphics workloads were performing poorly on a specific AP, while the other cpu's were ok. This was observed on a 32bit kernel specifically. Debug showed that the PAT init was not happening on that AP during resume and hence it contributing to the poor workload performance on that cpu. On this system, resume flow looked like this: 1. BP starts the resume sequence and we reinit BP's MTRR's/PAT early on using mtrr_bp_restore() 2. Resume sequence brings all AP's online 3. Resume sequence now kicks off the MTRR reinit on all the AP's. 4. For some reason, between point 2 and 3, we moved from BP to one of the AP's. My guess is that printk() during resume sequence is contributing to this. We don't see similar behavior with the 64bit kernel but there is no guarantee that at this point the remaining resume sequence (after AP's bringup) has to happen on BP. 5. set_mtrr() was assuming that we are still on BP and skipped the MTRR/PAT init on that cpu (because of 1 above) 6. But we were on an AP and this led to not reprogramming PAT on this cpu leading to bad performance. Fix this by doing unconditional mtrr_if->set_all() in set_mtrr() during MTRR/PAT init. This might be unnecessary if we are still running on BP. But it is of no harm and will guarantee that after resume, all the cpu's will be in sync with respect to the MTRR/PAT registers. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1301438292-28370-1-git-send-email-eric@anholt.net> Signed-off-by: Eric Anholt <eric@anholt.net> Tested-by: Keith Packard <keithp@keithp.com> Cc: stable@kernel.org [v2.6.32+] Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-03-29 16:17:42 -07:00
Thomas Gleixner	86cc8dfc21	x86: apb_timer: Fixup genirq fallout The lonely user of the internal interface was not in the coccinelle script. Reported-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-03-30 00:13:30 +02:00
Linus Torvalds	90f1e7481e	Merge branch 'stable/bug-fixes-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/bug-fixes-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: Use new irq_move functions xen: Convert genirq namespace xen: fix p2m section mismatches xen/p2m: Allocate p2m tracking pages on override xen-gntdev: unlock on error path in gntdev_mmap() xen-gntdev: return -EFAULT on copy_to_user failure	2011-03-29 11:36:52 -07:00
Randy Dunlap	b83c6e55ac	xen: fix p2m section mismatches Fix section mismatch warnings: set_phys_range_identity() is called by __init xen_set_identity(), so also mark set_phys_range_identity() as __init. then: __early_alloc_p2m() is called set_phys_range_identity(), so also mark __early_alloc_p2m() as __init. WARNING: arch/x86/built-in.o(.text+0x7856): Section mismatch in reference from the function __early_alloc_p2m() to the function .init.text:extend_brk() The function __early_alloc_p2m() references the function __init extend_brk(). This is often because __early_alloc_p2m lacks a __init annotation or the annotation of extend_brk is wrong. WARNING: arch/x86/built-in.o(.text+0x7967): Section mismatch in reference from the function set_phys_range_identity() to the function .init.text:extend_brk() The function set_phys_range_identity() references the function __init extend_brk(). This is often because set_phys_range_identity lacks a __init annotation or the annotation of extend_brk is wrong. [v2: Per Stephen Hemming recommonedation made __early_alloc_p2m static] Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-03-29 10:01:03 -04:00
Xiaotian Feng	4ac5fc6a3e	x86, microcode: Unregister syscore_ops after microcode unloaded Currently, microcode doesn't unregister syscore_ops after it's unloaded. So if we modprobe then rmmod microcode, the stale microcode syscore_ops info will stay on syscore_ops_list. Later when we're trying to reboot/halt/shutdown the machine, kernel will panic on syscore_shutdown(). With the patch applied, I can reboot/halt/shutdown my machine successfully. Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk> Cc: Rafael J. Wysocki <rjw@sisk.pl> LKML-Reference: <1301387672-23661-1-git-send-email-dfeng@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-29 11:12:04 +02:00
Christoph Lameter	fe5042138b	x86: Use this_cpu_has for thermal_interrupt current cpu It is more effective to use a segment prefix instead of calculating the address of the current cpu area amd then testing flags. Signed-off-by: Christoph Lameter <cl@linux.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2011-03-29 10:18:30 +02:00
Christoph Lameter	349c004e3d	x86: A fast way to check capabilities of the current cpu Add this_cpu_has() which determines if the current cpu has a certain ability using a segment prefix and a bit test operation. For that we need to add bit operations to x86s percpu.h. Many uses of cpu_has use a pointer passed to a function to determine the current flags. That is no longer necessary after this patch. However, this patch only converts the straightforward cases where cpu_has is used with this_cpu_ptr. The rest is work for later. -tj: Rolled up patch to add x86_ prefix and use percpu_read() instead of percpu_read_stable(). Signed-off-by: Christoph Lameter <cl@linux.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2011-03-29 10:18:30 +02:00
Jean Delvare	ca444564a9	x86: Stop including <linux/delay.h> in two asm header files Stop including <linux/delay.h> in x86 header files which don't need it. This will let the compiler complain when this header is not included by source files when it should, so that contributors can fix the problem before building on other architectures starts to fail. Credits go to Geert for the idea. Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: James E.J. Bottomley <James.Bottomley@suse.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> LKML-Reference: <20110325152014.297890ec@endymion.delvare> [ this also fixes an upstream build bug in drivers/media/rc/ite-cir.c ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-29 09:37:42 +02:00
Cyrill Gorcunov	1d321881af	perf, x86: P4 PMU - clean up the code a bit No change on the functional level, just align the table properly. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Lin Ming <ming.m.lin@intel.com> LKML-Reference: <4D8FA213.5050108@openvz.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-29 09:36:33 +02:00
Ingo Molnar	1dfd7b494b	Merge branch 'core' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into perf/urgent	2011-03-29 09:32:28 +02:00
Linus Torvalds	036a98263a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: aesni-intel - fixed problem with packets that are not multiple of 64bytes	2011-03-28 13:07:49 -07:00
Linus Torvalds	551b0bda46	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6 * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6: mfd: Clean up max8997 IRQ namespace mfd: Fold irq_set_chip/irq_set_handler mfd: Cleanup irq namespace mfd: twl6030: Cleanup interrupt handling mfd: twl4030: Cleanup interrupt handling mfd: mx8925: Remove irq_desc leftovers mfd: htc-i2cpld: Cleanup interrupt handling mfd: htc-egpio: Cleanup interrupt handling mfd: ezx-pcap: Remvove open coded irq handling mfd: 88pm860x: Remove unused irq_desc leftovers mfd: asic3: Cleanup irq handling mfd: Select MFD_CORE if TPS6105X driver is configured mfd: Add MODULE_DEVICE_TABLE to rdc321x-southbridge mfd: Add MAX8997/8966 IRQ control mfd: Constify i2c_device_id tables mfd: OLPC: Clean up names to match what OLPC actually uses mfd: Add mfd_clone_cell(), convert cs5535-mfd/olpc-xo1 to it	2011-03-27 20:07:01 -07:00
Christoph Lameter	d7c3f8cee8	percpu: Omit segment prefix in the UP case for cmpxchg_double Omit the segment prefix in the UP case. GS is not used then and we will generate segfaults if cmpxchg16b is used otherwise. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-27 19:25:36 -07:00
Tadeusz Struk	60af520cf2	crypto: aesni-intel - fixed problem with packets that are not multiple of 64bytes This patch fixes problem with packets that are not multiple of 64bytes. Signed-off-by: Adrian Hoban <adrian.hoban@intel.com> Signed-off-by: Aidan O'Mahony <aidan.o.mahony@intel.com> Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2011-03-27 10:29:39 +08:00
Daniel Drake	adfa4bd4a8	mfd: OLPC: Clean up names to match what OLPC actually uses The cs5535-pms cell doesn't actually need to be cloned, so we can drop that and simply have the olpc-xo1.c driver use "cs5535-pms" directly. Also, rename the cs5535-acpi clones to what we actually use for the (currently out-of-tree) SCI driver. In the process, that fixes a subtle bug in olpc-xo1.c which broke powerdown on XO-1s.. olpc-xo1-ac-acpi was a typo, not something that actually existed. Signed-off-by: Daniel Drake <dsd@laptop.org> Signed-off-by: Andres Salomon <dilinger@queued.net> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2011-03-27 00:09:31 +01:00
Andres Salomon	fa1df69168	mfd: Add mfd_clone_cell(), convert cs5535-mfd/olpc-xo1 to it Replace mfd_shared_platform_driver_register with mfd_clone_cell. The former was called by an mfd client, and registered both a platform driver and device. The latter is called by an mfd driver, and registers only a platform device. The downside of this is that mfd drivers need to be modified whenever new clients are added that share a cell; the upside is that it fits Linux's driver model better. It's also simpler. This also converts cs5535-mfd/olpc-xo1 from the old API. cs5535-mfd now creates the olpc-xo1-{acpi,pms} devices, while olpc-xo1 binds to them via platform drivers. Signed-off-by: Andres Salomon <dilinger@queued.net> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2011-03-27 00:09:30 +01:00
Linus Torvalds	16c29dafcc	Merge branch 'syscore' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'syscore' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: Introduce ARCH_NO_SYSDEV_OPS config option (v2) cpufreq: Use syscore_ops for boot CPU suspend/resume (v2) KVM: Use syscore_ops instead of sysdev class and sysdev PCI / Intel IOMMU: Use syscore_ops instead of sysdev class and sysdev timekeeping: Use syscore_ops instead of sysdev class and sysdev x86: Use syscore_ops instead of sysdev classes and sysdevs	2011-03-25 21:07:59 -07:00
Linus Torvalds	95e14ed7fc	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb: kdb: add usage string of 'per_cpu' command kgdb,x86_64: fix compile warning found with sparse kdb: code cleanup to use macro instead of value kgdboc,kgdbts: strlen() doesn't count the terminator	2011-03-25 21:04:56 -07:00
Linus Torvalds	2a20b02c05	Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf, x86: Complain louder about BIOSen corrupting CPU/PMU state and continue perf, x86: P4 PMU - Read proper MSR register to catch unflagged overflows perf symbols: Look at .dynsym again if .symtab not found perf build-id: Add quirk to deal with perf.data file format breakage perf session: Pass evsel in event_ops->sample() perf: Better fit max unprivileged mlock pages for tools needs perf_events: Fix stale ->cgrp pointer in update_cgrp_time_from_cpuctx() perf top: Fix uninitialized 'counter' variable tracing: Fix set_ftrace_filter probe function display perf, x86: Fix Intel fixed counters base initialization	2011-03-25 17:53:09 -07:00
Linus Torvalds	94df491c4a	Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: futex: Fix WARN_ON() test for UP WARN_ON_SMP(): Allow use in if() statements on UP x86, dumpstack: Use %pB format specifier for stack trace vsprintf: Introduce %pB format specifier lockdep: Remove unused 'factor' variable from lockdep_stats_show()	2011-03-25 17:52:22 -07:00
Linus Torvalds	26ff6801f7	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: DT: Cleanup namespace and call irq_set_irq_type() unconditional x86: DT: Fix return condition in irq_create_of_mapping() x86, mpparse: Move check_slot into CONFIG_X86_IO_APIC context	2011-03-25 17:51:51 -07:00
Jason Wessel	21431c2900	kgdb,x86_64: fix compile warning found with sparse Fix sparse warning: arch/x86/kernel/kgdb.c:123:9: warning: switch with no cases Reported-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>	2011-03-25 16:37:31 -05:00
Ingo Molnar	45daae575e	perf, x86: Complain louder about BIOSen corrupting CPU/PMU state and continue Eric Dumazet reported that hardware PMU events do not work on his system, due to the BIOS corrupting PMU state: Performance Events: PEBS fmt0+, Core2 events, Broken BIOS detected, using software events only. [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 186 is 43003c) Linus suggested that we continue in the face of such BIOS-induced CPU state corruption: http://lkml.org/lkml/2011/3/24/608 Such BIOSes will have to be fixed - Linux developers rely on a working and fully capable PMU and the BIOS interfering with the CPU's PMU state is simply not acceptable. So this patch changes perf to continue when it detects such BIOS interaction, some hardware events may be unreliable due to the BIOS writing and re-writing them - there's not much the kernel can do about that but to detect the corruption and report it. Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-25 11:23:41 +01:00
Thomas Gleixner	07611dbda5	x86: DT: Cleanup namespace and call irq_set_irq_type() unconditional That call escaped the name space cleanup. Fix it up. We really want to call there. The chip might have changed since the irq was setup initially. So let the core code and the chip decide what to do. The status is just an unreliable snapshot. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2011-03-24 23:17:56 +01:00
Thomas Gleixner	00a30b254b	x86: DT: Fix return condition in irq_create_of_mapping() The xlate() function returns 0 or a negative error code. Returning the error code blindly will be seen as an huge irq number by the calling function because irq_create_of_mapping() returns an unsigned value. Return 0 (NO_IRQ) as required. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2011-03-24 23:17:56 +01:00
Don Zickus	242214f9c1	perf, x86: P4 PMU - Read proper MSR register to catch unflagged overflows The read of a proper MSR register was missed and instead of counter the configration register was tested (it has ARCH_P4_UNFLAGGED_BIT always cleared) leading to unknown NMI hitting the system. As result the user may obtain "Dazed and confused, but trying to continue" message. Fix it by reading a proper MSR register. When an NMI happens on a P4, the perf nmi handler checks the configuration register to see if the overflow bit is set or not before taking appropriate action. Unfortunately, various P4 machines had a broken overflow bit, so a backup mechanism was implemented. This mechanism checked to see if the counter rolled over or not. A previous commit that implemented this backup mechanism was broken. Instead of reading the counter register, it used the configuration register to determine if the counter rolled over or not. Reading that bit would give incorrect results. This would lead to 'Dazed and confused' messages for the end user when using the perf tool (or if the nmi watchdog is running). The fix is to read the counter register before determining if the counter rolled over or not. Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Lin Ming <ming.m.lin@intel.com> LKML-Reference: <4D8BAB49.3080701@openvz.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-24 21:40:01 +01:00
Andi Kleen	914a76ca5e	oprofile, x86: Allow setting EDGE/INV/CMASK for counter events For some performance events it's useful to set the EDGE and INV bits and the CMASK mask in the counter control register. The list of predefined events Intel releases for each CPU has some events which require these settings to get more "natural" to use higher level events. oprofile currently doesn't allow this. This patch adds new extra configuration fields for them, so that they can be specified in oprofilefs. An updated oprofile daemon can then make use of this to set them. v2: Write back masked extra value to variable. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Robert Richter <robert.richter@amd.com>	2011-03-24 18:45:44 +01:00
Linus Torvalds	047f61c5d1	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (42 commits) ACPI: minor printk format change in acpi_pad ACPI: make acpi_pad /sys output more readable ACPICA: Update version to 20110316 ACPICA: Header support for SLIC table ACPI: Make sure the FADT is at least rev 2 before using the reset register ACPI: Bug compatibility for Windows on the ACPI reboot vector ACPICA: Fix access width for reset vector ACPI battery: fribble sysfs files from a resume notifier ACPI button: remove unused procfs I/F ACPI, APEI, Add PCIe AER error information printing support PCIe, AER, use pre-generated prefix in error information printing ACPI, APEI, Add ERST record ID cache ACPI: Use syscore_ops instead of sysdev class and sysdev ACPI: Remove the unused EC sysdev class ACPI: use __cpuinit for the acpi_processor_set_pdc() call tree ACPI: use __init where possible in processor driver Thermal_Framework-Fix_crash_during_hwmon_unregister ACPICA: Update version to 20110211. ACPICA: Add mechanism to defer _REG methods for some installed handlers ACPICA: Add support for FunctionalFixedHW in acpi_ut_get_region_name ...	2011-03-24 08:25:15 -07:00
Linus Torvalds	a6a1d6485e	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6 * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6: (90 commits) mfd: Push byte swaps out of wm8994 bulk read path mfd: Rename ab8500 gpadc header mfd: Constify WM8994 write path mfd: Push byte swap out of WM8994 bulk I/O mfd: Avoid copying data in WM8994 I2C write mfd: Remove copy from WM831x I2C write function mfd: Staticise WM8994 PM ops regulator: Add a subdriver for TI TPS6105x regulator portions v2 mfd: Add a core driver for TI TPS61050/TPS61052 chips v2 gpio: Add Tunnel Creek support to sch_gpio mfd: Add Tunnel Creek support to lpc_sch pci_ids: Add Intel Tunnel Creek LPC Bridge device ID. regulator: MAX8997/8966 support mfd: Add WM8994 bulk register write operation mfd: Append additional read write on 88pm860x mfd: Adopt mfd_data in 88pm860x input driver mfd: Adopt mfd_data in 88pm860x regulator mfd: Adopt mfd_data in 88pm860x led mfd: Adopt mfd_data in 88pm860x backlight mfd: Fix MAX8997 Kconfig entry typos ...	2011-03-24 07:59:01 -07:00
Daniel De Graaf	b254244d26	xen/p2m: Allocate p2m tracking pages on override It is possible to add a p2m override on pages that are currently mapped to INVALID_P2M_ENTRY; in particular, this will happen when using ballooned pages in gntdev. This means that set_phys_to_machine must be used instead of __set_phys_to_machine. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-03-24 10:36:01 -04:00
Rakib Mullick	9f1f1bfd8d	x86, mpparse: Remove unnecessary variable 'ret' isn't used by check_slot(), gets initialized but has no real use, so remove it. Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> LKML-Reference: <AANLkTikh3y+is3xixKBdyHhr_cHxzPFJF729Fcvt8+Ns@mail.gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-24 09:05:40 +01:00
Namhyung Kim	71f9e59800	x86, dumpstack: Use %pB format specifier for stack trace Improve noreturn function entries in call traces: Before: Call Trace: [<ffffffff812a8502>] panic+0x8c/0x18d [<ffffffffa000012a>] deep01+0x0/0x38 [test_panic] <--- bad [<ffffffff81104666>] proc_file_write+0x73/0x8d [<ffffffff811000b3>] proc_reg_write+0x8d/0xac [<ffffffff810c7d32>] vfs_write+0xa1/0xc5 [<ffffffff810c7e0f>] sys_write+0x45/0x6c [<ffffffff8f02943b>] system_call_fastpath+0x16/0x1b After: Call Trace: [<ffffffff812bce69>] panic+0x8c/0x18d [<ffffffffa000012a>] panic_write+0x20/0x20 [test_panic] <--- good [<ffffffff81115fab>] proc_file_write+0x73/0x8d [<ffffffff81111a5f>] proc_reg_write+0x8d/0xac [<ffffffff810d90ee>] vfs_write+0xa1/0xc5 [<ffffffff810d91cb>] sys_write+0x45/0x6c [<ffffffff812c07fb>] system_call_fastpath+0x16/0x1b Signed-off-by: Namhyung Kim <namhyung@gmail.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <1300934550-21394-2-git-send-email-namhyung@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-24 08:36:10 +01:00
Linus Torvalds	b81a618dcd	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: deal with races in /proc//{syscall,stack,personality} proc: enable writing to /proc/pid/mem proc: make check_mem_permission() return an mm_struct on success proc: hold cred_guard_mutex in check_mem_permission() proc: disable mem_write after exec mm: implement access_remote_vm mm: factor out main logic of access_process_vm mm: use mm_struct to resolve gate vma's in __get_user_pages mm: arch: rename in_gate_area_no_task to in_gate_area_no_mm mm: arch: make in_gate_area take an mm_struct instead of a task_struct mm: arch: make get_gate_vma take an mm_struct instead of a task_struct x86: mark associated mm when running a task in 32 bit compatibility mode x86: add context tag to mark mm when running a task in 32-bit compatibility mode auxv: require the target to be tracable (or yourself) close race in /proc//environ report errors in /proc//map* sanely pagemap: close races with suid execve make sessionid permissions in /proc//task/ match those in /proc/* fix leaks in path_lookupat() Fix up trivial conflicts in fs/proc/base.c	2011-03-23 20:51:42 -07:00
Olaf Hering	93a72052be	crash_dump: export is_kdump_kernel to modules, consolidate elfcorehdr_addr, setup_elfcorehdr and saved_max_pfn The Xen PV drivers in a crashed HVM guest can not connect to the dom0 backend drivers because both frontend and backend drivers are still in connected state. To run the connection reset function only in case of a crashdump, the is_kdump_kernel() function needs to be available for the PV driver modules. Consolidate elfcorehdr_addr, setup_elfcorehdr and saved_max_pfn into kernel/crash_dump.c Also export elfcorehdr_addr to make is_kdump_kernel() usable for modules. Leave 'elfcorehdr' as early_param(). This changes powerpc from __setup() to early_param(). It adds an address range check from x86 also on ia64 and powerpc. [akpm@linux-foundation.org: additional #includes] [akpm@linux-foundation.org: remove elfcorehdr_addr export] [akpm@linux-foundation.org: fix for Tejun's mm/nobootmem.c changes] Signed-off-by: Olaf Hering <olaf@aepfle.de> Cc: Russell King <rmk@arm.linux.org.uk> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-23 19:47:19 -07:00
FUJITA Tomonori	8547727756	remove dma64_addr_t There is no user now. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: David Miller <davem@davemloft.net> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-23 19:47:18 -07:00
Alexandre Bounine	388b78adc9	rapidio: modify configuration to support PCI-SRIO controller 1. Add an option to include RapidIO support if the PCI is available. 2. Add FSL_RIO configuration option to enable controller selection. 3. Add RapidIO support option into x86 and MIPS architectures. Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Acked-by: Kumar Gala <galak@kernel.crashing.org> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Thomas Moll <thomas.moll@sysgo.com> Cc: Micha Nelissen <micha@neli.hopto.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-23 19:46:42 -07:00
Akinobu Mita	61f2e7b0f4	bitops: remove minix bitops from asm/bitops.h minix bit operations are only used by minix filesystem and useless by other modules. Because byte order of inode and block bitmaps is different on each architecture like below: m68k: big-endian 16bit indexed bitmaps h8300, microblaze, s390, sparc, m68knommu: big-endian 32 or 64bit indexed bitmaps m32r, mips, sh, xtensa: big-endian 32 or 64bit indexed bitmaps for big-endian mode little-endian bitmaps for little-endian mode Others: little-endian bitmaps In order to move minix bit operations from asm/bitops.h to architecture independent code in minix filesystem, this provides two config options. CONFIG_MINIX_FS_BIG_ENDIAN_16BIT_INDEXED is only selected by m68k. CONFIG_MINIX_FS_NATIVE_ENDIAN is selected by the architectures which use native byte order bitmaps (h8300, microblaze, s390, sparc, m68knommu, m32r, mips, sh, xtensa). The architectures which always use little-endian bitmaps do not select these options. Finally, we can remove minix bit operations from asm/bitops.h for all architectures. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Greg Ungerer <gerg@uclinux.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Andreas Schwab <schwab@linux-m68k.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Michal Simek <monstr@monstr.eu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Hirokazu Takata <takata@linux-m32r.org> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Paul Mundt <lethal@linux-sh.org> Cc: Chris Zankel <chris@zankel.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-23 19:46:22 -07:00
Akinobu Mita	f312eff816	bitops: remove ext2 non-atomic bitops from asm/bitops.h As the result of conversions, there are no users of ext2 non-atomic bit operations except for ext2 filesystem itself. Now we can put them into architecture independent code in ext2 filesystem, and remove from asm/bitops.h for all architectures. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-23 19:46:21 -07:00
Akinobu Mita	861b5ae7cd	bitops: introduce little-endian bitops for most architectures Introduce little-endian bit operations to the big-endian architectures which do not have native little-endian bit operations and the little-endian architectures. (alpha, avr32, blackfin, cris, frv, h8300, ia64, m32r, mips, mn10300, parisc, sh, sparc, tile, x86, xtensa) These architectures can just include generic implementation (asm-generic/bitops/le.h). Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Mikael Starvik <starvik@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Matthew Wilcox <willy@debian.org> Cc: Grant Grundler <grundler@parisc-linux.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Zankel <chris@zankel.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-23 19:46:15 -07:00
Rafael J. Wysocki	d47d81c0e9	Introduce ARCH_NO_SYSDEV_OPS config option (v2) Introduce Kconfig option allowing architectures where sysdev operations used during system suspend, resume and shutdown have been completely replaced with struct sycore_ops operations to avoid building sysdev code that will never be used. Make callbacks in struct sys_device and struct sysdev_driver depend on ARCH_NO_SYSDEV_OPS to allows us to verify if all of the references have been actually removed from the code the given architecture depends on. Make x86 select ARCH_NO_SYSDEV_OPS. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2011-03-23 22:16:41 +01:00
Rafael J. Wysocki	f3c6ea1b06	x86: Use syscore_ops instead of sysdev classes and sysdevs Some subsystems in the x86 tree need to carry out suspend/resume and shutdown operations with one CPU on-line and interrupts disabled and they define sysdev classes and sysdevs or sysdev drivers for this purpose. This leads to unnecessarily complicated code and excessive memory usage, so switch them to using struct syscore_ops objects for this purpose instead. Generally, there are three categories of subsystems that use sysdevs for implementing PM operations: (1) subsystems whose suspend/resume callbacks ignore their arguments entirely (the majority), (2) subsystems whose suspend/resume callbacks use their struct sys_device argument, but don't really need to do that, because they can be implemented differently in an arguably simpler way (io_apic.c), and (3) subsystems whose suspend/resume callbacks use their struct sys_device argument, but the value of that argument is always the same and could be ignored (microcode_core.c). In all of these cases the subsystems in question may be readily converted to using struct syscore_ops objects for power management and shutdown. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@elte.hu>	2011-03-23 22:15:54 +01:00
Stephen Wilson	cae5d39032	mm: arch: rename in_gate_area_no_task to in_gate_area_no_mm Now that gate vma's are referenced with respect to a particular mm and not a particular task it only makes sense to propagate the change to this predicate as well. Signed-off-by: Stephen Wilson <wilsons@start.ca> Reviewed-by: Michel Lespinasse <walken@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-23 16:36:55 -04:00
Stephen Wilson	83b964bbf8	mm: arch: make in_gate_area take an mm_struct instead of a task_struct Morally, the question of whether an address lies in a gate vma should be asked with respect to an mm, not a particular task. Moreover, dropping the dependency on task_struct will help make existing and future operations on mm's more flexible and convenient. Signed-off-by: Stephen Wilson <wilsons@start.ca> Reviewed-by: Michel Lespinasse <walken@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-23 16:36:54 -04:00
Stephen Wilson	31db58b3ab	mm: arch: make get_gate_vma take an mm_struct instead of a task_struct Morally, the presence of a gate vma is more an attribute of a particular mm than a particular task. Moreover, dropping the dependency on task_struct will help make both existing and future operations on mm's more flexible and convenient. Signed-off-by: Stephen Wilson <wilsons@start.ca> Reviewed-by: Michel Lespinasse <walken@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-23 16:36:54 -04:00
Stephen Wilson	375906f876	x86: mark associated mm when running a task in 32 bit compatibility mode This patch simply follows the same practice as for setting the TIF_IA32 flag. In particular, an mm is marked as holding 32-bit tasks when a 32-bit binary is exec'ed. Both ELF and a.out formats are updated. Signed-off-by: Stephen Wilson <wilsons@start.ca> Reviewed-by: Michel Lespinasse <walken@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-23 16:36:53 -04:00
Stephen Wilson	c2ef45df3b	x86: add context tag to mark mm when running a task in 32-bit compatibility mode This tag is intended to mirror the thread info TIF_IA32 flag. Will be used to identify mm's which support 32 bit tasks running in compatibility mode without requiring a reference to the task itself. Signed-off-by: Stephen Wilson <wilsons@start.ca> Reviewed-by: Michel Lespinasse <walken@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-23 16:36:52 -04:00
Andres Salomon	f77289ac25	mfd: Rename mfd_shared_cell_{en,dis}able to drop the "shared" part As requested by Samuel, there's not really any reason to have "shared" in the name. This also modifies the only user of the function, as well. Signed-off-by: Andres Salomon <dilinger@queued.net> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2011-03-23 10:42:03 +01:00
Andres Salomon	1310e6d638	mfd: Add sharing for cs5535 acpi/pms cells This enables sharing of cs5535-mfd cells via the new mfd_shared_* API. Hooks for enable/disble of resources are added, with refcounting of resources being automatically handled so that cs5535_mfd_res_enable/disable are only called when necessary. Clients of cs5535-mfd (in this case, olpc-xo1.c) are also modified to use the mfd_shared API. The platform drivers are also renamed to olpc-xo1-{pms,acpi}, and resource enabling/disabling is replaced with mfd_shared API calls. Signed-off-by: Andres Salomon <dilinger@queued.net> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2011-03-23 10:41:58 +01:00
Len Brown	02e2407858	Merge branch 'linus' into release Conflicts: arch/x86/kernel/acpi/sleep.c Signed-off-by: Len Brown <len.brown@intel.com>	2011-03-23 02:34:54 -04:00
Len Brown	5c129a8600	Merge commit 'v2.6.38' into release	2011-03-23 02:33:54 -04:00
David Rientjes	1c00f0161f	x86: allow CONFIG_ISA_DMA_API to be disabled Not all 64-bit systems require ISA-style DMA, so allow it to be configurable. x86 utilizes the generic ISA DMA allocator from kernel/dma.c, so require it only when CONFIG_ISA_DMA_API is enabled. Disabling CONFIG_ISA_DMA_API is dependent on x86_64 since those machines do not have ISA slots and benefit the most from disabling the option (and on CONFIG_EXPERT as required by H. Peter Anvin). When disabled, this also avoids declaring claim_dma_lock(), release_dma_lock(), request_dma(), and free_dma() since those interfaces will no longer be provided. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-22 17:44:16 -07:00
David Rientjes	8df3bd9e18	x86: only compile floppy driver if CONFIG_ISA_DMA_API is enabled The generic floppy disk driver utilizies the interface provided by CONFIG_ISA_DMA_API, specifically claim_dma_lock(), release_dma_lock(), request_dma(), and free_dma(). Thus, there's a strict dependency on the config option and the driver should only be loaded if the kernel supports ISA-style DMA. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-22 17:44:16 -07:00
David Rientjes	4061d68e1a	x86: only compile 8237A if CONFIG_ISA_DMA_API is enabled 8237A utilizes the interface provided by CONFIG_ISA_DMA_API, specifically claim_dma_lock() and release_dma_lock(). Thus, there's a strict dependency on the config option and the module should only be loaded if the kernel supports ISA-style DMA. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-22 17:44:16 -07:00
Olaf Hering	d404ab0a11	move x86 specific oops=panic to generic code The oops=panic cmdline option is not x86 specific, move it to generic code. Update documentation. Signed-off-by: Olaf Hering <olaf@aepfle.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-22 17:44:11 -07:00
FUJITA Tomonori	3e50594e8e	add the common dma_addr_t typedef to include/linux/types.h All architectures can use the common dma_addr_t typedef now. We can remove the arch specific dma_addr_t. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Richard Henderson <rth@twiddle.net> Cc: Matt Turner <mattst88@gmail.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-22 17:44:09 -07:00
Eric Dumazet	b6a84016bd	mm: NUMA aware alloc_thread_info_node() Add a node parameter to alloc_thread_info(), and change its name to alloc_thread_info_node() This change is needed to allow NUMA aware kthread_create_on_cpu() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Reviewed-by: Andi Kleen <ak@linux.intel.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Tejun Heo <tj@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: David Howells <dhowells@redhat.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-22 17:44:01 -07:00
Linus Torvalds	73d5a8675f	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: xen: update mask_rw_pte after kernel page tables init changes xen: set max_pfn_mapped to the last pfn mapped x86: Cleanup highmap after brk is concluded Fix up trivial onflict (added header file includes) in arch/x86/mm/init_64.c	2011-03-22 10:41:36 -07:00
Rakib Mullick	cbb84c4cc1	x86, mpparse: Move check_slot into CONFIG_X86_IO_APIC context When CONFIG_X86_MPPARSE=y and CONFIG_X86_IO_APIC=n, then we get the following warning: arch/x86/kernel/mpparse.c:723: warning: 'check_slot' defined but not used So, put check_slot into CONFIG_X86_IO_APIC context. Its only called from CONFIG_X86_IO_APIC=y context. Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> LKML-Reference: <AANLkTinsUfGc=NG_GeH_B+zFVu+DXJzZbJKdQLscqfuH@mail.gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-22 13:00:22 +01:00
Len Brown	25076246e8	Merge branch 'apei-release' into release	2011-03-22 01:41:47 -04:00
Huang Ying	885b976fad	ACPI, APEI, Add ERST record ID cache APEI ERST firmware interface and implementation has no multiple users in mind. For example, if there is four records in storage with ID: 1, 2, 3 and 4, if two ERST readers enumerate the records via GET_NEXT_RECORD_ID as follow, reader 1 reader 2 1 2 3 4 -1 -1 where -1 signals there is no more record ID. Reader 1 has no chance to check record 2 and 4, while reader 2 has no chance to check record 1 and 3. And any other GET_NEXT_RECORD_ID will return -1, that is, other readers will has no chance to check any record even they are not cleared by anyone. This makes raw GET_NEXT_RECORD_ID not suitable for used by multiple users. To solve the issue, an in-memory ERST record ID cache is designed and implemented. When enumerating record ID, the ID returned by GET_NEXT_RECORD_ID is added into cache in addition to be returned to caller. So other readers can check the cache to get all record ID available. Signed-off-by: Huang Ying <ying.huang@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2011-03-21 22:59:06 -04:00
Sage Weil	b7ed78f565	introduce sys_syncfs to sync a single file system It is frequently useful to sync a single file system, instead of all mounted file systems via sync(2): - On machines with many mounts, it is not at all uncommon for some of them to hang (e.g. unresponsive NFS server). sync(2) will get stuck on those and may never get to the one you do care about (e.g., /). - Some applications write lots of data to the file system and then want to make sure it is flushed to disk. Calling fsync(2) on each file introduces unnecessary ordering constraints that result in a large amount of sub-optimal writeback/flush/commit behavior by the file system. There are currently two ways (that I know of) to sync a single super_block: - BLKFLSBUF ioctl on the block device: That also invalidates the bdev mapping, which isn't usually desirable, and doesn't work for non-block file systems. - 'mount -o remount,rw' will call sync_filesystem as an artifact of the current implemention. Relying on this little-known side effect for something like data safety sounds foolish. Both of these approaches require root privileges, which some applications do not have (nor should they need?) given that sync(2) is an unprivileged operation. This patch introduces a new system call syncfs(2) that takes an fd and syncs only the file system it references. Maybe someday we can $ sync /some/path and not get sync: ignoring all arguments The syscall is motivated by comments by Al and Christoph at the last LSF. syncfs(2) seems like an appropriate name given statfs(2). A similar ioctl was also proposed a while back, see http://marc.info/?l=linux-fsdevel&m=127970513829285&w=2 Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-21 00:40:29 -04:00
Stefano Stabellini	d8aa5ec338	xen: update mask_rw_pte after kernel page tables init changes After "x86-64, mm: Put early page table high" already existing kernel page table pages can be mapped using early_ioremap too so we need to update mask_rw_pte to make sure these pages are still mapped RO. The reason why we have to do that is explain by the commit message of `fef5ba7979`: "Xen requires that all pages containing pagetable entries to be mapped read-only. If pages used for the initial pagetable are already mapped then we can change the mapping to RO. However, if they are initially unmapped, we need to make sure that when they are later mapped, they are also mapped RO. ..SNIP.. the pagetable setup code early_ioremaps the pages to write their entries, so we must make sure that mappings created in the early_ioremap fixmap area are mapped RW. (Those mappings are removed before the pages are presented to Xen as pagetable pages.)" We accomplish all this in mask_rw_pte by mapping RO all the pages mapped using early_ioremap apart from the last one that has been allocated because it is not a page table page yet (it has not been hooked into the page tables yet). Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> LKML-Reference: <alpine.DEB.2.00.1103171739050.3382@kaball-desktop> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-03-19 11:58:28 -07:00
Stefano Stabellini	14988a4d35	xen: set max_pfn_mapped to the last pfn mapped Do not set max_pfn_mapped to the end of the initial memory mappings, that also contain pages that don't belong in pfn space (like the mfn list). Set max_pfn_mapped to the last real pfn mapped in the initial memory mappings that is the pfn backing _end. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> LKML-Reference: <alpine.DEB.2.00.1103171739050.3382@kaball-desktop> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-03-19 11:58:25 -07:00
Yinghai Lu	e5f15b45dd	x86: Cleanup highmap after brk is concluded Now cleanup_highmap actually is in two steps: one is early in head64.c and only clears above _end; a second one is in init_memory_mapping() and tries to clean from _brk_end to _end. It should check if those boundaries are PMD_SIZE aligned but currently does not. Also init_memory_mapping() is called several times for numa or memory hotplug, so we really should not handle initial kernel mappings there. This patch moves cleanup_highmap() down after _brk_end is settled so we can do everything in one step. Also we honor max_pfn_mapped in the implementation of cleanup_highmap. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> LKML-Reference: <alpine.DEB.2.00.1103171739050.3382@kaball-desktop> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-03-19 11:58:19 -07:00
Stephane Eranian	fc66c5210e	perf, x86: Fix Intel fixed counters base initialization The following patch solves the problems introduced by Robert's commit `41bf498` and reported by Arun Sharma. This commit gets rid of the base + index notation for reading and writing PMU msrs. The problem is that for fixed counters, the new calculation for the base did not take into account the fixed counter indexes, thus all fixed counters were read/written from fixed counter 0. Although all fixed counters share the same config MSR, they each have their own counter register. Without: $ task -e unhalted_core_cycles -e instructions_retired -e baclears noploop 1 noploop for 1 seconds 242202299 unhalted_core_cycles (0.00% scaling, ena=1000790892, run=1000790892) 2389685946 instructions_retired (0.00% scaling, ena=1000790892, run=1000790892) 49473 baclears (0.00% scaling, ena=1000790892, run=1000790892) With: $ task -e unhalted_core_cycles -e instructions_retired -e baclears noploop 1 noploop for 1 seconds 2392703238 unhalted_core_cycles (0.00% scaling, ena=1000840809, run=1000840809) 2389793744 instructions_retired (0.00% scaling, ena=1000840809, run=1000840809) 47863 baclears (0.00% scaling, ena=1000840809, run=1000840809) Signed-off-by: Stephane Eranian <eranian@google.com> Cc: peterz@infradead.org Cc: ming.m.lin@intel.com Cc: robert.richter@amd.com Cc: asharma@fb.com Cc: perfmon2-devel@lists.sf.net LKML-Reference: <20110319172005.GB4978@quad> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-19 19:00:49 +01:00
Len Brown	05534c9ffc	Merge branch 'acpica' into release	2011-03-18 18:06:08 -04:00
Linus Torvalds	99759619b2	Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: PCI: label: remove #include of ACPI header to avoid warnings PCI: label: Fix compilation error when CONFIG_ACPI is unset PCI: pre-allocate additional resources to devices only after successful allocation of essential resources. PCI: introduce reset_resource() PCI: data structure agnostic free list function PCI: refactor io size calculation code PCI: do not create quirk I/O regions below PCIBIOS_MIN_IO for ICH PCI hotplug: acpiphp: set current_state to D0 in register_slot PCI: Export ACPI _DSM provided firmware instance number and string name to sysfs PCI: add more checking to ICH region quirks PCI: aer-inject: Override PCIe AER Mask Registers PCI: fix tlan build when CONFIG_PCI is not enabled PCI: remove quirk for pre-production systems PCI: Avoid potential NULL pointer dereference in pci_scan_bridge PCI/lpc: irq and pci_ids patch for Intel DH89xxCC DeviceIDs PCI: sysfs: Fix failure path for addition of "vpd" attribute	2011-03-18 10:56:44 -07:00
Linus Torvalds	f2e1fbb5f2	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Flush TLB if PGD entry is changed in i386 PAE mode x86, dumpstack: Correct stack dump info when frame pointer is available x86: Clean up csum-copy_64.S a bit x86: Fix common misspellings x86: Fix misspelling and align params x86: Use PentiumPro-optimized partial_csum() on VIA C7	2011-03-18 10:45:21 -07:00
Linus Torvalds	619297855a	Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (30 commits) trace, filters: Initialize the match variable in process_ops() properly trace, documentation: Fix branch profiling location in debugfs oprofile, s390: Cleanups oprofile, s390: Remove hwsampler_files.c and merge it into init.c perf: Fix tear-down of inherited group events perf: Reorder & optimize perf_event_context to remove alignment padding on 64 bit builds perf: Handle stopped state with tracepoints perf: Fix the software events state check perf, powerpc: Handle events that raise an exception without overflowing perf, x86: Use INTEL_*_CONSTRAINT() for all PEBS event constraints perf, x86: Clean up SandyBridge PEBS events perf lock: Fix sorting by wait_min perf tools: Version incorrect with some versions of grep perf evlist: New command to list the names of events present in a perf.data file perf script: Add support for H/W and S/W events perf script: Add support for dumping symbols perf script: Support custom field selection for output perf script: Move printing of 'common' data from print_event and rename perf tracing: Remove print_graph_cpu and print_graph_proc from trace-event-parse perf script: Change process_event prototype ...	2011-03-18 10:38:34 -07:00
Linus Torvalds	e16b396ce3	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (47 commits) doc: CONFIG_UNEVICTABLE_LRU doesn't exist anymore Update cpuset info & webiste for cgroups dcdbas: force SMI to happen when expected arch/arm/Kconfig: remove one to many l's in the word. asm-generic/user.h: Fix spelling in comment drm: fix printk typo 'sracth' Remove one to many n's in a word Documentation/filesystems/romfs.txt: fixing link to genromfs drivers:scsi Change printk typo initate -> initiate serial, pch uart: Remove duplicate inclusion of linux/pci.h header fs/eventpoll.c: fix spelling mm: Fix out-of-date comments which refers non-existent functions drm: Fix printk typo 'failled' coh901318.c: Change initate to initiate. mbox-db5500.c Change initate to initiate. edac: correct i82975x error-info reported edac: correct i82975x mci initialisation edac: correct commented info fs: update comments to point correct document target: remove duplicate include of target/target_core_device.h from drivers/target/target_core_hba.c ... Trivial conflict in fs/eventpoll.c (spelling vs addition)	2011-03-18 10:37:40 -07:00
Shaohua Li	4981d01ead	x86: Flush TLB if PGD entry is changed in i386 PAE mode According to intel CPU manual, every time PGD entry is changed in i386 PAE mode, we need do a full TLB flush. Current code follows this and there is comment for this too in the code. But current code misses the multi-threaded case. A changed page table might be used by several CPUs, every such CPU should flush TLB. Usually this isn't a problem, because we prepopulate all PGD entries at process fork. But when the process does munmap and follows new mmap, this issue will be triggered. When it happens, some CPUs keep doing page faults: http://marc.info/?l=linux-kernel&m=129915020508238&w=2 Reported-by: Yasunori Goto<y-goto@jp.fujitsu.com> Tested-by: Yasunori Goto<y-goto@jp.fujitsu.com> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Shaohua Li<shaohua.li@intel.com> Cc: Mallick Asit K <asit.k.mallick@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-mm <linux-mm@kvack.org> Cc: stable <stable@kernel.org> LKML-Reference: <1300246649.2337.95.camel@sli10-conroe> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-18 11:44:01 +01:00
Namhyung Kim	e8e999cf3c	x86, dumpstack: Correct stack dump info when frame pointer is available Current stack dump code scans entire stack and check each entry contains a pointer to kernel code. If CONFIG_FRAME_POINTER=y it could mark whether the pointer is valid or not based on value of the frame pointer. Invalid entries could be preceded by '?' sign. However this was not going to happen because scan start point was always higher than the frame pointer so that they could not meet. Commit `9c0729dc80` ("x86: Eliminate bp argument from the stack tracing routines") delayed bp acquisition point, so the bp was read in lower frame, thus all of the entries were marked invalid. This patch fixes this by reverting above commit while retaining stack_frame() helper as suggested by Frederic Weisbecker. End result looks like below: before: [ 3.508329] Call Trace: [ 3.508551] [<ffffffff814f35c9>] ? panic+0x91/0x199 [ 3.508662] [<ffffffff814f3739>] ? printk+0x68/0x6a [ 3.508770] [<ffffffff81a981b2>] ? mount_block_root+0x257/0x26e [ 3.508876] [<ffffffff81a9821f>] ? mount_root+0x56/0x5a [ 3.508975] [<ffffffff81a98393>] ? prepare_namespace+0x170/0x1a9 [ 3.509216] [<ffffffff81a9772b>] ? kernel_init+0x1d2/0x1e2 [ 3.509335] [<ffffffff81003894>] ? kernel_thread_helper+0x4/0x10 [ 3.509442] [<ffffffff814f6880>] ? restore_args+0x0/0x30 [ 3.509542] [<ffffffff81a97559>] ? kernel_init+0x0/0x1e2 [ 3.509641] [<ffffffff81003890>] ? kernel_thread_helper+0x0/0x10 after: [ 3.522991] Call Trace: [ 3.523351] [<ffffffff814f35b9>] panic+0x91/0x199 [ 3.523468] [<ffffffff814f3729>] ? printk+0x68/0x6a [ 3.523576] [<ffffffff81a981b2>] mount_block_root+0x257/0x26e [ 3.523681] [<ffffffff81a9821f>] mount_root+0x56/0x5a [ 3.523780] [<ffffffff81a98393>] prepare_namespace+0x170/0x1a9 [ 3.523885] [<ffffffff81a9772b>] kernel_init+0x1d2/0x1e2 [ 3.523987] [<ffffffff81003894>] kernel_thread_helper+0x4/0x10 [ 3.524228] [<ffffffff814f6880>] ? restore_args+0x0/0x30 [ 3.524345] [<ffffffff81a97559>] ? kernel_init+0x0/0x1e2 [ 3.524445] [<ffffffff81003890>] ? kernel_thread_helper+0x0/0x10 -v5: * fix build breakage with oprofile -v4: * use 0 instead of regs->bp * separate out printk changes -v3: * apply comment from Frederic * add a couple of printk fixes Signed-off-by: Namhyung Kim <namhyung@gmail.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Soren Sandmann <ssp@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <1300416006-3163-1-git-send-email-namhyung@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-18 10:51:42 +01:00
Ingo Molnar	2c76397bdd	x86: Clean up csum-copy_64.S a bit The many stray whitespaces and other uncleanlinesses made this code almost unreadable to me - so fix those. No changes to the code. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-18 10:44:26 +01:00
Lucas De Marchi	0d2eb44f63	x86: Fix common misspellings They were generated by 'codespell' and then manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi> Cc: trivial@kernel.org LKML-Reference: <1300389856-1099-3-git-send-email-lucas.demarchi@profusion.mobi> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-18 10:39:30 +01:00
Ingo Molnar	8dd8997d2c	Merge branch 'linus' into x86/urgent Merge reason: Merge upstream commits to avoid conflicts in upcoming patches. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-18 10:39:00 +01:00
Linus Torvalds	ec0afc9311	Merge branch 'kvm-updates/2.6.39' of git://git.kernel.org/pub/scm/virt/kvm/kvm * 'kvm-updates/2.6.39' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (55 commits) KVM: unbreak userspace that does not sets tss address KVM: MMU: cleanup pte write path KVM: MMU: introduce a common function to get no-dirty-logged slot KVM: fix rcu usage in init_rmode_* functions KVM: fix kvmclock regression due to missing clock update KVM: emulator: Fix permission checking in io permission bitmap KVM: emulator: Fix io permission checking for 64bit guest KVM: SVM: Load %gs earlier if CONFIG_X86_32_LAZY_GS=n KVM: x86: Remove useless regs_page pointer from kvm_lapic KVM: improve comment on rcu use in irqfd_deassign KVM: MMU: remove unused macros KVM: MMU: cleanup page alloc and free KVM: MMU: do not record gfn in kvm_mmu_pte_write KVM: MMU: move mmu pages calculated out of mmu lock KVM: MMU: set spte accessed bit properly KVM: MMU: fix kvm_mmu_slot_remove_write_access dropping intermediate W bits KVM: Start lock documentation KVM: better readability of efer_reserved_bits KVM: Clear async page fault hash after switching to real mode KVM: VMX: Initialize vm86 TSS only once. ...	2011-03-17 18:40:35 -07:00
Linus Torvalds	5a39837f76	Merge branches 'stable/irq.fairness' and 'stable/irq.ween_of_nr_irqs' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/irq.fairness' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: events: Remove redundant clear of l2i at end of round-robin loop xen: events: Make round-robin scan fairer by snapshotting each l2 word once only xen: events: Clean up round-robin evtchn scan. xen: events: Make last processed event channel a per-cpu variable. xen: events: Process event channels notifications in round-robin order. * 'stable/irq.ween_of_nr_irqs' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: events: Fix compile error if CONFIG_SMP is not defined. xen: events: correct locking in xen_irq_from_pirq xen: events: propagate irq allocation failure instead of panicking xen: events: do not workaround too-small nr_irqs xen: events: remove use of nr_irqs as upper bound on number of pirqs xen: events: dynamically allocate irq info structures xen: events: maintain a list of Xen interrupts xen: events: push setup of irq<->{evtchn,ipi,virq,pirq} maps into irq_info init functions xen: events: turn irq_info constructors into initialiser functions xen: events: use per-cpu variable for cpu_evtchn_mask xen: events: refactor GSI pirq bindings functions xen: events: rename restore_cpu_pirqs -> restore_pirqs xen: events: remove unused public functions xen: events: fix xen_map_pirq_gsi error return xen: events: simplify comment xen: events: separate two unrelated halves of if condition Fix up trivial conflicts in drivers/xen/events.c	2011-03-17 18:27:49 -07:00
Linus Torvalds	514af9f790	Merge branches 'stable/hvc-console', 'stable/gntalloc.v6' and 'stable/balloon' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/hvc-console' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/hvc: Disable probe_irq_on/off from poking the hvc-console IRQ line. * 'stable/gntalloc.v6' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: gntdev: fix build warning xen/p2m/m2p/gnttab: do not add failed grant maps to m2p override xen-gntdev: Add cast to pointer xen-gntdev: Fix incorrect use of zero handle xen: change xen/[gntdev/gntalloc] to default m xen-gntdev: prevent using UNMAP_NOTIFY_CLEAR_BYTE on read-only mappings xen-gntdev: Avoid double-mapping memory xen-gntdev: Avoid unmapping ranges twice xen-gntdev: Use map->vma for checking map validity xen-gntdev: Fix unmap notify on PV domains xen-gntdev: Fix memory leak when mmap fails xen/gntalloc,gntdev: Add unmap notify ioctl xen-gntalloc: Userspace grant allocation driver xen-gntdev: Support mapping in HVM domains xen-gntdev: Add reference counting to maps xen-gntdev: Use find_vma rather than iterating our vma list manually xen-gntdev: Change page limit to be global instead of per-open * 'stable/balloon' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: (24 commits) xen-gntdev: Use ballooned pages for grant mappings xen-balloon: Add interface to retrieve ballooned pages xen-balloon: Move core balloon functionality out of module xen/balloon: Remove pr_info's and don't alter retry_count xen/balloon: Protect against CPU exhaust by event/x process xen/balloon: Migration from mod_timer() to schedule_delayed_work() xen/balloon: Removal of driver_pages	2011-03-17 18:16:36 -07:00
Linus Torvalds	61ef46fd45	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq: [CPUFREQ] pcc-cpufreq: remove duplicate statements [CPUFREQ] Remove the pm_message_t argument from driver suspend [CPUFREQ] Remove unneeded locks [CPUFREQ] Remove old, deprecated per cpu ondemand/conservative sysfs files [CPUFREQ] Remove deprecated sysfs file sampling_rate_max [CPUFREQ] powernow-k8: The table index is not worth displaying [CPUFREQ] calculate delay after dbs_check_cpu [CPUFREQ] Add documentation for sampling_down_factor [CPUFREQ] drivers/cpufreq: Remove unnecessary semicolons	2011-03-17 17:42:14 -07:00
Linus Torvalds	978ca164bd	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (38 commits) amd64_edac: Fix decode_syndrome types amd64_edac: Fix DCT argument type amd64_edac: Fix ranges signedness amd64_edac: Drop local variable amd64_edac: Fix PCI config addressing types amd64_edac: Fix DRAM base macros amd64_edac: Fix node id signedness amd64_edac: Drop redundant declarations amd64_edac: Enable driver on F15h amd64_edac: Adjust ECC symbol size to F15h amd64_edac: Simplify scrubrate setting PCI: Rename CPU PCI id define amd64_edac: Improve DRAM address mapping amd64_edac: Sanitize ->read_dram_ctl_register amd64_edac: Adjust sys_addr to chip select conversion routine to F15h amd64_edac: Beef up early exit reporting amd64_edac: Revamp online spare handling amd64_edac: Fix channel interleave removal amd64_edac: Correct node interleaving removal amd64_edac: Add support for interleaved region swapping ... Fix up trivial conflict in include/linux/pci_ids.h due to AMD_15H_NB_MISC being renamed as AMD_15H_NB_F3 next to the new AMD_15H_NB_LINK entry.	2011-03-17 17:21:32 -07:00
Gleb Natapov	776e58ea3d	KVM: unbreak userspace that does not sets tss address Commit 6440e5967bc broke old userspaces that do not set tss address before entering vcpu. Unbreak it by setting tss address to a safe value on the first vcpu entry. New userspaces should set tss address, so print warning in case it doesn't. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:35 -03:00
Xiao Guangrong	0f53b5b1c0	KVM: MMU: cleanup pte write path This patch does: - call vcpu->arch.mmu.update_pte directly - use gfn_to_pfn_atomic in update_pte path The suggestion is from Avi. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:35 -03:00
Xiao Guangrong	5d163b1c9d	KVM: MMU: introduce a common function to get no-dirty-logged slot Cleanup the code of pte_prefetch_gfn_to_memslot and mapping_level_dirty_bitmap Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:34 -03:00
Xiao Guangrong	40dcaa9f69	KVM: fix rcu usage in init_rmode_* functions fix: [ 3494.671786] stack backtrace: [ 3494.671789] Pid: 10527, comm: qemu-system-x86 Not tainted 2.6.38-rc6+ #23 [ 3494.671790] Call Trace: [ 3494.671796] [] ? lockdep_rcu_dereference+0x9d/0xa5 [ 3494.671826] [] ? kvm_memslots+0x6b/0x73 [kvm] [ 3494.671834] [] ? gfn_to_memslot+0x16/0x4f [kvm] [ 3494.671843] [] ? gfn_to_hva+0x16/0x27 [kvm] [ 3494.671851] [] ? kvm_write_guest_page+0x31/0x83 [kvm] [ 3494.671861] [] ? kvm_clear_guest_page+0x1a/0x1c [kvm] [ 3494.671867] [] ? vmx_set_tss_addr+0x83/0x122 [kvm_intel] and: [ 8328.789599] stack backtrace: [ 8328.789601] Pid: 18736, comm: qemu-system-x86 Not tainted 2.6.38-rc6+ #23 [ 8328.789603] Call Trace: [ 8328.789609] [] ? lockdep_rcu_dereference+0x9d/0xa5 [ 8328.789621] [] ? kvm_memslots+0x6b/0x73 [kvm] [ 8328.789628] [] ? gfn_to_memslot+0x16/0x4f [kvm] [ 8328.789635] [] ? gfn_to_hva+0x16/0x27 [kvm] [ 8328.789643] [] ? kvm_write_guest_page+0x31/0x83 [kvm] [ 8328.789699] [] ? kvm_clear_guest_page+0x1a/0x1c [kvm] [ 8328.789713] [] ? vmx_create_vcpu+0x316/0x3c8 [kvm_intel] Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:34 -03:00
Nikola Ciprich	1aa8ceef03	KVM: fix kvmclock regression due to missing clock update commit 387b9f97750444728962b236987fbe8ee8cc4f8c moved kvm_request_guest_time_update(vcpu), breaking 32bit SMP guests using kvm-clock. Fix this by moving (new) clock update function to proper place. Signed-off-by: Nikola Ciprich <nikola.ciprich@linuxbox.cz> Acked-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:33 -03:00
Gleb Natapov	399a40c92d	KVM: emulator: Fix permission checking in io permission bitmap Currently if io port + len crosses 8bit boundary in io permission bitmap the check may allow IO that otherwise should not be allowed. The patch fixes that. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:33 -03:00
Gleb Natapov	5601d05b8c	KVM: emulator: Fix io permission checking for 64bit guest Current implementation truncates upper 32bit of TR base address during IO permission bitmap check. The patch fixes this. Reported-and-tested-by: Francis Moreau <francis.moro@gmail.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:33 -03:00
Avi Kivity	831ca6093c	KVM: SVM: Load %gs earlier if CONFIG_X86_32_LAZY_GS=n With CONFIG_CC_STACKPROTECTOR, we need a valid %gs at all times, so disable lazy reload and do an eager reload immediately after the vmexit. Reported-by: IVAN ANGELOV <ivangotoy@gmail.com> Acked-By: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:33 -03:00
Takuya Yoshikawa	afc20184b7	KVM: x86: Remove useless regs_page pointer from kvm_lapic Access to this page is mostly done through the regs member which holds the address to this page. The exceptions are in vmx_vcpu_reset() and kvm_free_lapic() and these both can easily be converted to using regs. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:33 -03:00
Xiao Guangrong	676646ee4b	KVM: MMU: remove unused macros These macros are not used, so removed Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:32 -03:00
Xiao Guangrong	842f22ed9b	KVM: MMU: cleanup page alloc and free Using __get_free_page instead of alloc_page and page_address, using free_page instead of __free_page and virt_to_page Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:32 -03:00
Xiao Guangrong	49b26e26e4	KVM: MMU: do not record gfn in kvm_mmu_pte_write No need to record the gfn to verifier the pte has the same mode as current vcpu, it's because we only speculatively update the pte only if the pte and vcpu have the same mode Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:32 -03:00
Xiao Guangrong	48c0e4e906	KVM: MMU: move mmu pages calculated out of mmu lock kvm_mmu_calculate_mmu_pages need to walk all memslots and it's protected by kvm->slots_lock, so move it out of mmu spinlock Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:32 -03:00
Xiao Guangrong	1b7fd45c32	KVM: MMU: set spte accessed bit properly Set spte accessed bit only if guest_initiated == 1 that means the really accessed Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:32 -03:00
Xiao Guangrong	da8dc75f0c	KVM: MMU: fix kvm_mmu_slot_remove_write_access dropping intermediate W bits Only remove write access in the last sptes. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:32 -03:00
Lai Jiangshan	1260edbe7d	KVM: better readability of efer_reserved_bits use EFER_SCE, EFER_LME and EFER_LMA instead of magic numbers. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:31 -03:00
Lai Jiangshan	d170c41906	KVM: Clear async page fault hash after switching to real mode The hash array of async gfns may still contain some left gfns after kvm_clear_async_pf_completion_queue() called, need to clear them. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:31 -03:00
Gleb Natapov	93ea5388ea	KVM: VMX: Initialize vm86 TSS only once. Currently vm86 task is initialized on each real mode entry and vcpu reset. Initialization is done by zeroing TSS and updating relevant fields. But since all vcpus are using the same TSS there is a race where one vcpu may use TSS while other vcpu is initializing it, so the vcpu that uses TSS will see wrong TSS content and will behave incorrectly. Fix that by initializing TSS only once. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:31 -03:00
Gleb Natapov	a8ba6c2622	KVM: VMX: update live TR selector if it changes in real mode When rmode.vm86 is active TR descriptor is updated with vm86 task values, but selector is left intact. vmx_set_segment() makes sure that if TR register is written into while vm86 is active the new values are saved for use after vm86 is deactivated, but since selector is not updated on vm86 activation/deactivation new value is lost. Fix this by writing new selector into vmcs immediately. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:31 -03:00
Lai Jiangshan	a3b5ba49a8	KVM: VMX: add the __noclone attribute to vmx_vcpu_run The changelog of `104f226` said "adds the __noclone attribute", but it was missing in its patch. I think it is still needed. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:31 -03:00
Jan Kiszka	038f8c110e	KVM: x86: Convert tsc_write_lock to raw_spinlock Code under this lock requires non-preemptibility. Ensure this also over -rt by converting it to raw spinlock. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:30 -03:00
Gleb Natapov	7049467b53	KVM: remove isr_ack logic from PIC isr_ack logic was added by `e48258009d` to avoid unnecessary IPIs. Back then it made sense, but now the code checks that vcpu is ready to accept interrupt before sending IPI, so this logic is no longer needed. The patch removes it. Fixes a regression with Debian/Hurd. Signed-off-by: Gleb Natapov <gleb@redhat.com> Reported-and-tested-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:30 -03:00
Joseph Cihula	23f3e99132	KVM: VMX: fix detection of BIOS disabling VMX This patch fixes the logic used to detect whether BIOS has disabled VMX, for the case where VMX is enabled only under SMX, but tboot is not active. Signed-off-by: Joseph Cihula <joseph.cihula@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:30 -03:00
Jan Kiszka	e935b8372c	KVM: Convert kvm_lock to raw_spinlock Code under this lock requires non-preemptibility. Ensure this also over -rt by converting it to raw spinlock. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:30 -03:00
Avi Kivity	bd3d1ec3d2	KVM: SVM: check for progress after IRET interception When we enable an NMI window, we ask for an IRET intercept, since the IRET re-enables NMIs. However, the IRET intercept happens before the instruction executes, while the NMI window architecturally opens afterwards. To compensate for this mismatch, we only open the NMI window in the following exit, assuming that the IRET has by then executed; however, this assumption is not always correct; we may exit due to a host interrupt or page fault, without having executed the instruction. Fix by checking for forward progress by recording and comparing the IRET's rip. This is somewhat of a hack, since an unchaging rip does not mean that no forward progress has been made, but is the simplest fix for now. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:30 -03:00
Avi Kivity	f86368493e	KVM: Fix race between nmi injection and enabling nmi window The interrupt injection logic looks something like if an nmi is pending, and nmi injection allowed inject nmi if an nmi is pending request exit on nmi window the problem is that "nmi is pending" can be set asynchronously by the PIT; if it happens to fire between the two if statements, we will request an nmi window even though nmi injection is allowed. On SVM, this has disasterous results, since it causes eflags.TF to be set in random guest code. The fix is simple; make nmi_pending synchronous using the standard vcpu->requests mechanism; this ensures the code above is completely synchronous wrt nmi_pending. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:30 -03:00
Avi Kivity	4005996e42	KVM: Drop ad-hoc vendor specific instruction restriction Use the new support in the emulator, and drop the ad-hoc code in x86.c. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:28 -03:00
Avi Kivity	d867162c6d	KVM: x86 emulator: vendor specific instructions Mark some instructions as vendor specific, and allow the caller to request emulation only of vendor specific instructions. This is useful in some circumstances (responding to a #UD fault). Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:28 -03:00
Avi Kivity	3e90943907	KVM: Drop bogus x86_decode_insn() error check x86_decode_insn() doesn't return X86EMUL_* values, so the check for X86EMUL_PROPOGATE_FAULT will always fail. There is a proper check later on, so there is no need for a replacement for this code. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:28 -03:00
Jan Kiszka	0bb8865979	KVM: x86: Drop obsolete warning about INIT on runnable VCPU This warning was once used for debugging QEMU user space. Though uncommon, it is actually possible to send an INIT request to a running VCPU. So better drop this warning before someone misuses it to flood kernel logs this way. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:28 -03:00
Glauber Costa	12f9a48f7b	KVM: x86: release kvmclock page on reset When a vcpu is reset, kvmclock page keeps being written to this days. This is wrong and inconsistent: a cpu reset should take it to its initial state. Signed-off-by: Glauber Costa <glommer@redhat.com> CC: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:28 -03:00
john cooper	91c9c3eda4	KVM: x86: handle guest access to BBL_CR_CTL3 MSR A correction to Intel cpu model CPUID data (patch queued) caused winxp to BSOD when booted with a Penryn model. This was traced to the CPUID "model" field correction from 6 -> 23 (as is proper for a Penryn class of cpu). Only in this case does the problem surface. The cause for this failure is winxp accessing the BBL_CR_CTL3 MSR which is unsupported by current kvm, appears to be a legacy MSR not fully characterized yet existing in current silicon, and is apparently carried forward in MSR space to accommodate vintage code as here. It is not yet conclusive whether this MSR implements any of its legacy functionality or is just an ornamental dud for compatibility. While I found no silicon version specific documentation link to this MSR, a general description exists in Intel's developer's reference which agrees with the functional behavior of other bootloader/kernel code I've examined accessing BBL_CR_CTL3. Regrettably winxp appears to be setting bit #19 called out as "reserved" in the above document. So to minimally accommodate this MSR, kvm msr get will provide the equivalent mock data and kvm msr write will simply toss the guest passed data without interpretation. While this treatment of BBL_CR_CTL3 addresses the immediate problem, the approach may be modified pending clarification from Intel. Signed-off-by: john cooper <john.cooper@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:27 -03:00
Xiao Guangrong	6b7e2d0991	KVM: Add "exiting guest mode" state Currently we keep track of only two states: guest mode and host mode. This patch adds an "exiting guest mode" state that tells us that an IPI will happen soon, so unless we need to wait for the IPI, we can avoid it completely. Also 1: No need atomically to read/write ->mode in vcpu's thread 2: reorganize struct kvm_vcpu to make ->mode and ->requests in the same cache line explicitly Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:26 -03:00
Jan Kiszka	9ca5231831	KVM: x86: Remove user space triggerable MCE error message This case is a pure user space error we do not need to record. Moreover, it can be misused to flood the kernel log. Remove it. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:26 -03:00
Xiao Guangrong	63f42e023e	KVM: fix rcu usage warning in kvm_arch_vcpu_ioctl_set_sregs() Fix: [ 1001.499596] =================================================== [ 1001.499599] [ INFO: suspicious rcu_dereference_check() usage. ] [ 1001.499601] --------------------------------------------------- [ 1001.499604] include/linux/kvm_host.h:301 invoked rcu_dereference_check() without protection! ...... [ 1001.499636] Pid: 6035, comm: qemu-system-x86 Not tainted 2.6.37-rc6+ #62 [ 1001.499638] Call Trace: [ 1001.499644] [] lockdep_rcu_dereference+0x9d/0xa5 [ 1001.499653] [] gfn_to_memslot+0x8d/0xc8 [kvm] [ 1001.499661] [] gfn_to_hva+0x16/0x3f [kvm] [ 1001.499669] [] kvm_read_guest_page+0x1e/0x5e [kvm] [ 1001.499681] [] kvm_read_guest_page_mmu+0x53/0x5e [kvm] [ 1001.499699] [] load_pdptrs+0x3f/0x9c [kvm] [ 1001.499705] [] ? vmx_set_cr0+0x507/0x517 [kvm_intel] [ 1001.499717] [] kvm_arch_vcpu_ioctl_set_sregs+0x1f3/0x3c0 [kvm] [ 1001.499727] [] kvm_vcpu_ioctl+0x6a5/0xbc5 [kvm] Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:26 -03:00
Avi Kivity	40712faeb8	KVM: VMX: Avoid atomic operation in vmx_vcpu_run Instead of exchanging the guest and host rcx, have separate storage for each. This allows us to avoid using the xchg instruction, which is is a little slower than normal operations. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:26 -03:00
Avi Kivity	1c696d0e1b	KVM: VMX: Simplify saving guest rcx in vmx_vcpu_run Change push top-of-stack pop guest-rcx pop dummy to pop guest-rcx which is the same thing, only simpler. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:25 -03:00
Rik van Riel	00c25bce02	KVM: VMX: increase ple_gap default to 128 On some CPUs, a ple_gap of 41 is simply insufficient to ever trigger PLE exits, even with the minimalistic PLE test from kvm-unit-tests. http://git.kernel.org/?p=virt/kvm/kvm-unit-tests.git;a=commitdiff;h=eda71b28fa122203e316483b35f37aaacd42f545 For example, the Xeon X5670 CPU needs a ple_gap of at least 48 in order to get pause loop exits: # modprobe kvm_intel ple_gap=47 # taskset 1 /usr/local/bin/qemu-system-x86_64 \ -device testdev,chardev=log -chardev stdio,id=log \ -kernel x86/vmexit.flat -append ple-round-robin -smp 2 VNC server running on `::1:5900' enabling apic enabling apic ple-round-robin 58298446 # rmmod kvm_intel # modprobe kvm_intel ple_gap=48 # taskset 1 /usr/local/bin/qemu-system-x86_64 \ -device testdev,chardev=log -chardev stdio,id=log \ -kernel x86/vmexit.flat -append ple-round-robin -smp 2 VNC server running on `::1:5900' enabling apic enabling apic ple-round-robin 36616 Increase the ple_gap to 128 to be on the safe side. Signed-off-by: Rik van Riel <riel@redhat.com> Acked-by: Zhai, Edwin <edwin.zhai@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:25 -03:00
Joerg Roedel	3781c01c15	KVM: SVM: Add support for perf-kvm This patch adds the necessary code to run perf-kvm on AMD machines. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:25 -03:00
Avi Kivity	a917949935	KVM: VMX: Avoid leaking fake realmode state to userspace When emulating real mode, we fake some state: - tr.base points to a fake vm86 tss - segment registers are made to conform to vm86 restrictions change vmx_get_segment() not to expose this fake state to userspace; instead, return the original state. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:25 -03:00
Avi Kivity	d0ba64f9b4	KVM: VMX: Save and restore tr selector across mode switches When emulating real mode we play with tr hidden state, but leave tr.selector alone. That works well, except for save/restore, since loading TR writes it to the hidden state in vmx->rmode. Fix by also saving and restoring the tr selector; this makes things more consistent and allows migration to work during the early boot stages of Windows XP. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:25 -03:00
Sedat Dilek	775077a063	KVM guest: Fix section mismatch derived from kvm_guest_cpu_online() WARNING: arch/x86/built-in.o(.text+0x1bb74): Section mismatch in reference from the function kvm_guest_cpu_online() to the function .cpuinit.text:kvm_guest_cpu_init() The function kvm_guest_cpu_online() references the function __cpuinit kvm_guest_cpu_init(). This is often because kvm_guest_cpu_online lacks a __cpuinit annotation or the annotation of kvm_guest_cpu_init is wrong. This patch fixes the warning. Tested with linux-next (next-20101231) Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:25 -03:00
Avi Kivity	8234b22e1c	KVM: MMU: Don't flush shadow when enabling dirty tracking Instead, drop large mappings, which were the reason we dropped shadow. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:24 -03:00
Borislav Petkov	cb293250c7	PCI: Rename CPU PCI id define With increasing number of PCI function ids, add the PCI function id in the define name instead of its symbolic name in the BKDG for more clarity. Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2011-03-17 14:46:25 +01:00
Jon Nettleton	1eda75c131	x86: Use PentiumPro-optimized partial_csum() on VIA C7 Testing on the OLPC XO-1.5 (VIA C7-M 1000MHz CPU) shows a partial_csum() speed increase by a factor of 1.5 when we switch to the Pentium-optimized version. Signed-off-by: Daniel Drake <dsd@laptop.org> Cc: dilinger@queued.net Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-17 09:32:14 +01:00
Linus Torvalds	a5e6b135bd	Merge branch 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6 * 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (50 commits) printk: do not mangle valid userspace syslog prefixes efivars: Add Documentation efivars: Expose efivars functionality to external drivers. efivars: Parameterize operations. efivars: Split out variable registration efivars: parameterize efivars efivars: Make efivars bin_attributes dynamic efivars: move efivars globals into struct efivars drivers:misc: ti-st: fix debugging code kref: Fix typo in kref documentation UIO: add PRUSS UIO driver support Fix spelling mistakes in Documentation/zh_CN/SubmittingPatches firmware: Fix unaligned memory accesses in dmi-sysfs firmware: Add documentation for /sys/firmware/dmi firmware: Expose DMI type 15 System Event Log firmware: Break out system_event_log in dmi-sysfs firmware: Basic dmi-sysfs support firmware: Add DMI entry types to the headers Driver core: convert platform_{get,set}_drvdata to static inline functions Translate linux-2.6/Documentation/magic-number.txt into Chinese ...	2011-03-16 15:05:40 -07:00
Chumbalkar, Nagananda	bdce2595a2	[CPUFREQ] pcc-cpufreq: remove duplicate statements Remove a couple of assigment statements that appear twice. Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com> Signed-off-by: Dave Jones <davej@redhat.com>	2011-03-16 17:54:33 -04:00
Thomas Renninger	9e91869544	[CPUFREQ] powernow-k8: The table index is not worth displaying and it also is misleading due to another message above which makes the index look like it is the CPU. https://bugzilla.kernel.org/show_bug.cgi?id=24562 Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Dave Jones <davej@redhat.com> CC: cpufreq@vger.kernel.org	2011-03-16 17:54:31 -04:00
Linus Torvalds	41e0e0738c	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, AMD: Set ARAT feature on AMD processors x86, quirk: Fix SB600 revision check x86: stop_machine_text_poke() should issue sync_core() x86, amd-nb: Misc cleanliness fixes	2011-03-16 10:14:56 -07:00
Linus Torvalds	e7fd3b4669	Merge branch 'x86-trampoline-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-trampoline-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Fix binutils-2.21 symbol related build failures x86-64, trampoline: Remove unused variable x86, reboot: Fix the use of passed arguments in 32-bit BIOS reboot x86, reboot: Move the real-mode reboot code to an assembly file x86: Make the GDT_ENTRY() macro in <asm/segment.h> safe for assembly x86, trampoline: Use the unified trampoline setup for ACPI wakeup x86, trampoline: Common infrastructure for low memory trampolines Fix up trivial conflicts in arch/x86/kernel/Makefile	2011-03-16 10:10:02 -07:00
Linus Torvalds	fc82e1d59a	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: (21 commits) PM / Hibernate: Reduce autotuned default image size PM / Core: Introduce struct syscore_ops for core subsystems PM PM QoS: Make pm_qos settings readable PM / OPP: opp_find_freq_exact() documentation fix PM: Documentation/power/states.txt: fix repetition PM: Make system-wide PM and runtime PM treat subsystems consistently PM: Simplify kernel/power/Kconfig PM: Add support for device power domains PM: Drop pm_flags that is not necessary PM: Allow pm_runtime_suspend() to succeed during system suspend PM: Clean up PM_TRACE dependencies and drop unnecessary Kconfig option PM: Remove CONFIG_PM_OPS PM: Reorder power management Kconfig options PM: Make CONFIG_PM depend on (CONFIG_PM_SLEEP \|\| CONFIG_PM_RUNTIME) PM / ACPI: Remove references to pm_flags from bus.c PM: Do not create wakeup sysfs files for devices that cannot wake up USB / Hub: Do not call device_set_wakeup_capable() under spinlock PM: Use appropriate printk() priority level in trace.c PM / Wakeup: Don't update events_check_enabled in pm_get_wakeup_count() PM / Wakeup: Make pm_save_wakeup_count() work as documented ...	2011-03-16 09:24:44 -07:00
Linus Torvalds	0f6e0e8448	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (33 commits) AppArmor: kill unused macros in lsm.c AppArmor: cleanup generated files correctly KEYS: Add an iovec version of KEYCTL_INSTANTIATE KEYS: Add a new keyctl op to reject a key with a specified error code KEYS: Add a key type op to permit the key description to be vetted KEYS: Add an RCU payload dereference macro AppArmor: Cleanup make file to remove cruft and make it easier to read SELinux: implement the new sb_remount LSM hook LSM: Pass -o remount options to the LSM SELinux: Compute SID for the newly created socket SELinux: Socket retains creator role and MLS attribute SELinux: Auto-generate security_is_socket_class TOMOYO: Fix memory leak upon file open. Revert "selinux: simplify ioctl checking" selinux: drop unused packet flow permissions selinux: Fix packet forwarding checks on postrouting selinux: Fix wrong checks for selinux_policycap_netpeer selinux: Fix check for xfrm selinux context algorithm ima: remove unnecessary call to ima_must_measure IMA: remove IMA imbalance checking ...	2011-03-16 09:15:43 -07:00
Linus Torvalds	0d2ecee2bd	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: tcrypt - do not attempt to write to readonly variable random: update interface comments to reflect reality crypto: picoxcell - add support for the picoxcell crypto engines crypto: sha1 - Add test vector to test partial block processing hwrng: omap - Convert release_resource to release_region/release_mem_region crypto: aesni-intel - Fix remaining leak in rfc4106_set_hash_key crypto: omap-sham - don't treat NULL clk as an error crypto: omap-aes - don't treat NULL clk as an error crypto: testmgr - mark ghash as fips_allowed crypto: testmgr - mark xts(aes) as fips_allowed crypto: skcipher - remove redundant NULL check hwrng: pixocell - add support for picoxcell TRNG crypto: aesni-intel - Don't leak memory in rfc4106_set_hash_subkey	2011-03-16 09:15:21 -07:00
Ingo Molnar	344c21c322	Merge branch 'x86/amd-nb' into x86/urgent Merge reason: This is one followup commit that was not in x86/mm - merge it via the urgent path Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-16 16:34:01 +01:00
Linus Torvalds	79d8a8f736	Merge branch 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu, x86: Add arch-specific this_cpu_cmpxchg_double() support percpu: Generic support for this_cpu_cmpxchg_double() alpha: use L1_CACHE_BYTES for cacheline size in the linker script percpu: align percpu readmostly subsection to cacheline Fix up trivial conflict in arch/x86/kernel/vmlinux.lds.S due to the percpu alignment having changed ("x86: Reduce back the alignment of the per-CPU data section")	2011-03-16 08:22:41 -07:00
Lin Ming	7d5d02dadd	perf, x86: Use INTEL_*_CONSTRAINT() for all PEBS event constraints PEBS_EVENT_CONSTRAINT() is just a duplicate of INTEL_UEVENT_CONSTRAINT(). Remove it and use INTEL_UEVENT_CONSTRAINT() instead. Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1299684089-22835-3-git-send-email-ming.m.lin@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-16 14:04:12 +01:00
Lin Ming	eefaaac464	perf, x86: Clean up SandyBridge PEBS events Use INTEL_EVENT_CONSTRAINT() for the events where all umasks support PEBS. Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1299684089-22835-2-git-send-email-ming.m.lin@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-16 14:04:12 +01:00
Boris Ostrovsky	b87cf80af3	x86, AMD: Set ARAT feature on AMD processors Support for Always Running APIC timer (ARAT) was introduced in commit `db954b5898`. This feature allows us to avoid switching timers from LAPIC to something else (e.g. HPET) and go into timer broadcasts when entering deep C-states. AMD processors don't provide a CPUID bit for that feature but they also keep APIC timers running in deep C-states (except for cases when the processor is affected by erratum 400). Therefore we should set ARAT feature bit on AMD CPUs. Tested-by: Borislav Petkov <borislav.petkov@amd.com> Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Mark Langsdorf <mark.langsdorf@amd.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com> LKML-Reference: <1300205624-4813-1-git-send-email-ostr@amd64.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-16 14:03:33 +01:00
Andreas Herrmann	1d3e09a304	x86, quirk: Fix SB600 revision check Commit `7f74f8f28a` (x86 quirk: Fix polarity for IRQ0 pin2 override on SB800 systems) introduced a regression. It removed some SB600 specific code to determine the revision ID without adapting a corresponding revision ID check for SB600. See this mail thread: http://marc.info/?l=linux-kernel&m=129980296006380&w=2 This patch adapts the corresponding check to cover all SB600 revisions. Tested-by: Wang Lei <f3d27b@gmail.com> Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@kernel.org # 38.x, 37.x, 32.x LKML-Reference: <20110315143137.GD29499@alberich.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-16 14:03:32 +01:00
Linus Torvalds	d10902812c	Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (27 commits) x86: Clean up apic.c and apic.h x86: Remove superflous goal definition of tsc_sync x86: dt: Correct local apic documentation in device tree bindings x86: dt: Cleanup local apic setup x86: dt: Fix OLPC=y/INTEL_CE=n build rtc: cmos: Add OF bindings x86: ce4100: Use OF to setup devices x86: ioapic: Add OF bindings for IO_APIC x86: dtb: Add generic bus probe x86: dtb: Add support for PCI devices backed by dtb nodes x86: dtb: Add device tree support for HPET x86: dtb: Add early parsing of IO_APIC x86: dtb: Add irq domain abstraction x86: dtb: Add a device tree for CE4100 x86: Add device tree support x86: e820: Remove conditional early mapping in parse_e820_ext x86: OLPC: Make OLPC=n build again x86: OLPC: Remove extra OLPC_OPENFIRMWARE_DT indirection x86: OLPC: Cleanup config maze completely x86: OLPC: Hide OLPC_OPENFIRMWARE config switch ... Fix up conflicts in arch/x86/platform/ce4100/ce4100.c	2011-03-15 20:01:36 -07:00
Linus Torvalds	181f977d13	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (93 commits) x86, tlb, UV: Do small micro-optimization for native_flush_tlb_others() x86-64, NUMA: Don't call numa_set_distanc() for all possible node combinations during emulation x86-64, NUMA: Don't assume phys node 0 is always online in numa_emulation() x86-64, NUMA: Clean up initmem_init() x86-64, NUMA: Fix numa_emulation code with node0 without RAM x86-64, NUMA: Revert NUMA affine page table allocation x86: Work around old gas bug x86-64, NUMA: Better explain numa_distance handling x86-64, NUMA: Fix distance table handling mm: Move early_node_map[] reverse scan helpers under HAVE_MEMBLOCK x86-64, NUMA: Fix size of numa_distance array x86: Rename e820_table_* to pgt_buf_* bootmem: Move __alloc_memory_core_early() to nobootmem.c bootmem: Move contig_page_data definition to bootmem.c/nobootmem.c bootmem: Separate out CONFIG_NO_BOOTMEM code into nobootmem.c x86-64, NUMA: Seperate out numa_alloc_distance() from numa_set_distance() x86-64, NUMA: Add proper function comments to global functions x86-64, NUMA: Move NUMA emulation into numa_emulation.c x86-64, NUMA: Prepare numa_emulation() for moving NUMA emulation into a separate file x86-64, NUMA: Do not scan two times for setup_node_bootmem() ... Fix up conflicts in arch/x86/kernel/smpboot.c	2011-03-15 19:49:10 -07:00
Linus Torvalds	d5d42399bd	Merge branch 'x86-mem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-64, mem: Convert memmove() to assembly file and fix return value bug	2011-03-15 19:41:42 -07:00
Linus Torvalds	209b6c8fa7	Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, microcode, AMD: Fix signedness bug in generic_load_microcode() x86, microcode, AMD: Extend ucode size verification x86, microcode, AMD: Cleanup dmesg output x86, microcode, AMD: Remove unneeded memset call x86, microcode, AMD: Simplify get_next_ucode x86, microcode, AMD: Simplify install_equiv_cpu_table x86, microcode, AMD: Release firmware on error x86, microcode: Correct sysdev_add error path	2011-03-15 19:40:53 -07:00
Linus Torvalds	5f6fb45466	Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (116 commits) x86: Enable forced interrupt threading support x86: Mark low level interrupts IRQF_NO_THREAD x86: Use generic show_interrupts x86: ioapic: Avoid redundant lookup of irq_cfg x86: ioapic: Use new move_irq functions x86: Use the proper accessors in fixup_irqs() x86: ioapic: Use irq_data->state x86: ioapic: Simplify irq chip and handler setup x86: Cleanup the genirq name space genirq: Add chip flag to force mask on suspend genirq: Add desc->irq_data accessor genirq: Add comments to Kconfig switches genirq: Fixup fasteoi handler for oneshot mode genirq: Provide forced interrupt threading sched: Switch wait_task_inactive to schedule_hrtimeout() genirq: Add IRQF_NO_THREAD genirq: Allow shared oneshot interrupts genirq: Prepare the handling of shared oneshot interrupts genirq: Make warning in handle_percpu_event useful x86: ioapic: Move trigger defines to io_apic.h ... Fix up trivial(?) conflicts in arch/x86/pci/xen.c due to genirq name space changes clashing with the Xen cleanups. The set_irq_msi() had moved to xen_bind_pirq_msi_to_irq().	2011-03-15 19:23:40 -07:00
Linus Torvalds	3904afb41d	Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Combine printk()s in show_regs_common() x86: Don't call dump_stack() from arch_trigger_all_cpu_backtrace_handler()	2011-03-15 19:16:00 -07:00
Linus Torvalds	502f4d4f74	Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Fix and clean up generic_processor_info() x86: Don't copy per_cpu cpuinfo for BSP two times x86: Move llc_shared_map out of cpu_info	2011-03-15 19:00:53 -07:00
Linus Torvalds	da849abeb8	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, binutils, xen: Fix another wrong size directive x86: Remove dead config option X86_CPU x86: Really print supported CPUs if PROCESSOR_SELECT=y x86: Fix a bogus unwind annotation in lib/semaphore_32.S um, x86-64: Fix UML build after adding CFI annotations to lib/rwsem_64.S x86: Remove unused bits from lib/thunk_*.S x86: Use {push,pop}_cfi in more places x86-64: Add CFI annotations to lib/rwsem_64.S x86, asm: Cleanup unnecssary macros in asm-offsets.c x86, system.h: Drop unused __SAVE/__RESTORE macros x86: Use bitmap library functions x86: Partly unify asm-offsets_{32,64}.c x86: Reduce back the alignment of the per-CPU data section	2011-03-15 18:59:56 -07:00
Linus Torvalds	420c1c572d	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (62 commits) posix-clocks: Check write permissions in posix syscalls hrtimer: Remove empty hrtimer_init_hres_timer() hrtimer: Update hrtimer->state documentation hrtimer: Update base[CLOCK_BOOTTIME].offset correctly timers: Export CLOCK_BOOTTIME via the posix timers interface timers: Add CLOCK_BOOTTIME hrtimer base time: Extend get_xtime_and_monotonic_offset() to also return sleep time: Introduce get_monotonic_boottime and ktime_get_boottime hrtimers: extend hrtimer base code to handle more then 2 clockids ntp: Remove redundant and incorrect parameter check mn10300: Switch do_timer() to xtimer_update() posix clocks: Introduce dynamic clocks posix-timers: Cleanup namespace posix-timers: Add support for fd based clocks x86: Add clock_adjtime for x86 posix-timers: Introduce a syscall for clock tuning. time: Splitout compat timex accessors ntp: Add ADJ_SETOFFSET mode bit time: Introduce timekeeping_inject_offset posix-timer: Update comment ... Fix up new system-call-related conflicts in arch/x86/ia32/ia32entry.S arch/x86/include/asm/unistd_32.h arch/x86/include/asm/unistd_64.h arch/x86/kernel/syscall_table_32.S (name_to_handle_at()/open_by_handle_at() vs clock_adjtime()), and some due to movement of get_jiffies_64() in: kernel/time.c	2011-03-15 18:53:35 -07:00
Linus Torvalds	a926021cb1	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (184 commits) perf probe: Clean up probe_point_lazy_walker() return value tracing: Fix irqoff selftest expanding max buffer tracing: Align 4 byte ints together in struct tracer tracing: Export trace_set_clr_event() tracing: Explain about unstable clock on resume with ring buffer warning ftrace/graph: Trace function entry before updating index ftrace: Add .ref.text as one of the safe areas to trace tracing: Adjust conditional expression latency formatting. tracing: Fix event alignment: skb:kfree_skb tracing: Fix event alignment: mce:mce_record tracing: Fix event alignment: kvm:kvm_hv_hypercall tracing: Fix event alignment: module:module_request tracing: Fix event alignment: ftrace:context_switch and ftrace:wakeup tracing: Remove lock_depth from event entry perf header: Stop using 'self' perf session: Use evlist/evsel for managing perf.data attributes perf top: Don't let events to eat up whole header line perf top: Fix events overflow in top command ring-buffer: Remove unused #include <linux/trace_irq.h> tracing: Add an 'overwrite' trace_option. ...	2011-03-15 18:31:30 -07:00
Linus Torvalds	0586bed3e8	Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: rtmutex: tester: Remove the remaining BKL leftovers lockdep/timers: Explain in detail the locking problems del_timer_sync() may cause rtmutex: Simplify PI algorithm and make highest prio task get lock rwsem: Remove redundant asmregparm annotation rwsem: Move duplicate function prototypes to linux/rwsem.h rwsem: Unify the duplicate rwsem_is_locked() inlines rwsem: Move duplicate init macros and functions to linux/rwsem.h rwsem: Move duplicate struct rwsem declaration to linux/rwsem.h x86: Cleanup rwsem_count_t typedef rwsem: Cleanup includes locking: Remove deprecated lock initializers cred: Replace deprecated spinlock initialization kthread: Replace deprecated spinlock initialization xtensa: Replace deprecated spinlock initialization um: Replace deprecated spinlock initialization sparc: Replace deprecated spinlock initialization mips: Replace deprecated spinlock initialization cris: Replace deprecated spinlock initialization alpha: Replace deprecated spinlock initialization rtmutex-tester: Remove BKL tests	2011-03-15 18:28:30 -07:00
Linus Torvalds	b80cd62b7d	Merge branch 'core-futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: arm: Remove bogus comment in futex_atomic_cmpxchg_inatomic() futex: Deobfuscate handle_futex_death() plist: Add priority list test plist: Shrink struct plist_head futex,plist: Remove debug lock assignment from plist_node futex,plist: Pass the real head of the priority list to plist_del() futex: Sanitize futex ops argument types futex: Sanitize cmpxchg_futex_value_locked API futex: Remove redundant pagefault_disable in futex_atomic_cmpxchg_inatomic() futex: Avoid redudant evaluation of task_pid_vnr() futex: Update futex_wait_setup comments about locking	2011-03-15 18:23:52 -07:00
Linus Torvalds	422e6c4bc4	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (57 commits) tidy the trailing symlinks traversal up Turn resolution of trailing symlinks iterative everywhere simplify link_path_walk() tail Make trailing symlink resolution in path_lookupat() iterative update nd->inode in __do_follow_link() instead of after do_follow_link() pull handling of one pathname component into a helper fs: allow AT_EMPTY_PATH in linkat(), limit that to CAP_DAC_READ_SEARCH Allow passing O_PATH descriptors via SCM_RIGHTS datagrams readlinkat(), fchownat() and fstatat() with empty relative pathnames Allow O_PATH for symlinks New kind of open files - "location only". ext4: Copy fs UUID to superblock ext3: Copy fs UUID to superblock. vfs: Export file system uuid via /proc/<pid>/mountinfo unistd.h: Add new syscalls numbers to asm-generic x86: Add new syscalls for x86_64 x86: Add new syscalls for x86_32 fs: Remove i_nlink check from file system link callback fs: Don't allow to create hardlink for deleted file vfs: Add open by file handle support ...	2011-03-15 15:48:13 -07:00
James Morris	a002951c97	Merge branch 'next' into for-linus	2011-03-16 09:41:17 +11:00
Dan Williams	5d94e81f69	x86: Introduce pci_map_biosrom() The isci driver needs to retrieve its preboot OROM image which contains necessary runtime parameters like platform specific sas addresses and phy configuration. There is no ROM BAR associated with this area, instead we will need to scan legacy expansion ROM space. 1/ Promote the probe_roms_32 implementation to x86-64 2/ Add a facility to find and map an adapter rom by pci device (according to PCI Firmware Specification Revision 3.0) Signed-off-by: Dave Jiang <dave.jiang@intel.com> LKML-Reference: <20110308183226.6246.90354.stgit@localhost6.localdomain6> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-03-15 15:34:15 -07:00
Daniel Drake	45bb1674b9	x86, olpc: Use device tree for platform identification Make OLPC fully depend on device tree, and use it to identify the OLPC platform details. Some nodes are exposed as platform devices where we plan to use device tree for device probing. Signed-off-by: Daniel Drake <dsd@laptop.org> Acked-by: Grant Likely <grant.likely@secretlab.ca> LKML-Reference: <20110313151017.C255F9D401E@zog.reactivated.net> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-03-15 14:17:23 -07:00
Linus Torvalds	76ca078328	Merge branch 'for-linus' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm * 'for-linus' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm: xen: suspend: remove xen_hvm_suspend xen: suspend: pull pre/post suspend hooks out into suspend_info xen: suspend: move arch specific pre/post suspend hooks into generic hooks xen: suspend: refactor non-arch specific pre/post suspend hooks xen: suspend: add "arch" to pre/post suspend hooks xen: suspend: pass extra hypercall argument via suspend_info struct xen: suspend: refactor cancellation flag into a structure xen: suspend: use HYPERVISOR_suspend for PVHVM case instead of open coding xen: switch to new schedop hypercall by default. xen: use new schedop interface for suspend xen: do not respond to unknown xenstore control requests xen: fix compile issue if XEN is enabled but XEN_PVHVM is disabled xen: PV on HVM: support PV spinlocks and IPIs xen: make the ballon driver work for hvm domains xen-blkfront: handle Xen major numbers other than XENVBD xen: do not use xen_info on HVM, set pv_info name to "Xen HVM" xen: no need to delay xen_setup_shutdown_event for hvm guests anymore	2011-03-15 10:59:09 -07:00
Linus Torvalds	27d2a8b97e	Merge branches 'stable/ia64', 'stable/blkfront-cleanup' and 'stable/cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/ia64' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: ia64 build broken due to "xen: switch to new schedop hypercall by default." * 'stable/blkfront-cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: Union the blkif_request request specific fields * 'stable/cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: annotate functions which only call into __init at start of day xen p2m: annotate variable which appears unused xen: events: mark cpu_evtchn_mask_p as __refdata	2011-03-15 10:49:16 -07:00
Linus Torvalds	010b8f4e26	Merge branch 'stable/irq.cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/irq.cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: events: remove dom0 specific xen_create_msi_irq xen: events: use xen_bind_pirq_msi_to_irq from xen_create_msi_irq xen: events: push set_irq_msi down into xen_create_msi_irq xen: events: update pirq_to_irq in xen_create_msi_irq xen: events: refactor xen_create_msi_irq slightly xen: events: separate MSI PIRQ allocation from PIRQ binding to IRQ xen: events: assume PHYSDEVOP_get_free_pirq exists xen: pci: collapse apic_register_gsi_xen_hvm and xen_hvm_register_pirq xen: events: return irq from xen_allocate_pirq_msi xen: events: drop XEN_ALLOC_IRQ flag to xen_allocate_pirq_msi xen: events: do not leak IRQ from xen_allocate_pirq_msi when no pirq available. xen: pci: only define xen_initdom_setup_msi_irqs if CONFIG_XEN_DOM0	2011-03-15 10:47:56 -07:00
Linus Torvalds	397fae0818	Merge branches 'stable/irq.rework' and 'stable/pcifront-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/irq.rework' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/irq: Cleanup up the pirq_to_irq for DomU PV PCI passthrough guests as well. xen: Use IRQF_FORCE_RESUME xen/timer: Missing IRQF_NO_SUSPEND in timer code broke suspend. xen: Fix compile error introduced by "switch to new irq_chip functions" xen: Switch to new irq_chip functions xen: Remove stale irq_chip.end xen: events: do not free legacy IRQs xen: events: allocate GSIs and dynamic IRQs from separate IRQ ranges. xen: events: add xen_allocate_irq_{dynamic, gsi} and xen_free_irq xen:events: move find_unbound_irq inside CONFIG_PCI_MSI xen: handled remapped IRQs when enabling a pcifront PCI device. genirq: Add IRQF_FORCE_RESUME * 'stable/pcifront-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: pci/xen: When free-ing MSI-X/MSI irq->desc also use generic code. pci/xen: Cleanup: convert int** to int[] pci/xen: Use xen_allocate_pirq_msi instead of xen_allocate_pirq xen-pcifront: Sanity check the MSI/MSI-X values xen-pcifront: don't use flush_scheduled_work()	2011-03-15 10:47:16 -07:00
Linus Torvalds	c7146dd009	Merge branches 'stable/p2m-identity.v4.9.1' and 'stable/e820' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/p2m-identity.v4.9.1' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/m2p: Check whether the MFN has IDENTITY_FRAME bit set.. xen/m2p: No need to catch exceptions when we know that there is no RAM xen/debug: WARN_ON when identity PFN has no _PAGE_IOMAP flag set. xen/debugfs: Add 'p2m' file for printing out the P2M layout. xen/setup: Set identity mapping for non-RAM E820 and E820 gaps. xen/mmu: WARN_ON when racing to swap middle leaf. xen/mmu: Set _PAGE_IOMAP if PFN is an identity PFN. xen/mmu: Add the notion of identity (1-1) mapping. xen: Mark all initial reserved pages for the balloon as INVALID_P2M_ENTRY. * 'stable/e820' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/e820: Don't mark balloon memory as E820_UNUSABLE when running as guest and fix overflow. xen/setup: Inhibit resource API from using System RAM E820 gaps as PCI mem gaps.	2011-03-15 10:32:15 -07:00
Mathieu Desnoyers	0e00f7aed6	x86: stop_machine_text_poke() should issue sync_core() Intel Archiecture Software Developer's Manual section 7.1.3 specifies that a core serializing instruction such as "cpuid" should be executed on _each_ core before the new instruction is made visible. Failure to do so can lead to unspecified behavior (Intel XMC erratas include General Protection Fault in the list), so we should avoid this at all cost. This problem can affect modified code executed by interrupt handlers after interrupt are re-enabled at the end of stop_machine, because no core serializing instruction is executed between the code modification and the moment interrupts are reenabled. Because stop_machine_text_poke performs the text modification from the first CPU decrementing stop_machine_first, modified code executed in thread context is also affected by this problem. To explain why, we have to split the CPUs in two categories: the CPU that initiates the text modification (calls text_poke_smp) and all the others. The scheduler, executed on all other CPUs after stop_machine, issues an "iret" core serializing instruction, and therefore handles core serialization for all these CPUs. However, the text modification initiator can continue its execution on the same thread and access the modified text without any scheduler call. Given that the CPU that initiates the code modification is not guaranteed to be the one actually performing the code modification, it falls into the XMC errata. Q: Isn't this executed from an IPI handler, which will return with IRET (a serializing instruction) anyway? A: No, now stop_machine uses per-cpu workqueue, so that handler will be executed from worker threads. There is no iret anymore. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> LKML-Reference: <20110303160137.GB1590@Krystal> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: <stable@kernel.org> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-03-15 08:36:37 -07:00
Xiao Guangrong	25542c646a	x86, tlb, UV: Do small micro-optimization for native_flush_tlb_others() native_flush_tlb_others() is called from: flush_tlb_current_task() flush_tlb_mm() flush_tlb_page() All these functions disable preemption explicitly, so we can use smp_processor_id() instead of get_cpu() and put_cpu(). Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Cc: Cliff Wickman <cpw@sgi.com> LKML-Reference: <4D7EC791.4040003@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-15 08:30:34 +01:00
Ingo Molnar	8460b3e5bc	Merge commit 'v2.6.38' into x86/mm Conflicts: arch/x86/mm/numa_64.c Merge reason: Resolve the conflict, update the branch to .38. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-15 08:29:44 +01:00
Aneesh Kumar K.V	6aae5f2b20	x86: Add new syscalls for x86_64 This patch add new syscalls to x86_64 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:44 -04:00
Aneesh Kumar K.V	7dadb755b0	x86: Add new syscalls for x86_32 This patch adds new syscalls to x86_32 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:44 -04:00
Rafael J. Wysocki	6831c6edc7	PM: Drop pm_flags that is not necessary The variable pm_flags is used to prevent APM from being enabled along with ACPI, which would lead to problems. However, acpi_init() is always called before apm_init() and after acpi_init() has returned, it is known whether or not ACPI will be used. Namely, if acpi_disabled is not set after acpi_init() has returned, this means that ACPI is enabled. Thus, it is sufficient to check acpi_disabled in apm_init() to prevent APM from being enabled in parallel with ACPI. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Len Brown <len.brown@intel.com>	2011-03-15 00:43:16 +01:00
Rafael J. Wysocki	1eb208aea3	PM: Make CONFIG_PM depend on (CONFIG_PM_SLEEP \|\| CONFIG_PM_RUNTIME) From the users' point of view CONFIG_PM is really only used for making it possible to set CONFIG_SUSPEND, CONFIG_HIBERNATION, CONFIG_PM_RUNTIME and (surprisingly enough) CONFIG_XEN_SAVE_RESTORE (CONFIG_PM_OPP also depends on CONFIG_PM, but quite artificially). However, both CONFIG_SUSPEND and CONFIG_HIBERNATION require platform support (independent of CONFIG_PM) and it is not quite obvious that CONFIG_PM has to be set for CONFIG_XEN_SAVE_RESTORE to be available. Thus, from the users' point of view, it would be more logical to automatically select CONFIG_PM if any of the above options depending on it are set. Make CONFIG_PM depend on (CONFIG_PM_SLEEP \|\| CONFIG_PM_RUNTIME), which will cause it to be selected when any of CONFIG_SUSPEND, CONFIG_HIBERNATION, CONFIG_PM_RUNTIME, CONFIG_XEN_SAVE_RESTORE is set and will clarify its meaning. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2011-03-15 00:43:15 +01:00
Linus Torvalds	869c34f520	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: ce4100: Set pci ops via callback instead of module init x86/mm: Fix pgd_lock deadlock x86/mm: Handle mm_fault_error() in kernel space x86: Don't check for BIOS corruption in first 64K when there's no need to	2011-03-14 15:19:09 -07:00
Daniel Kiper	06f521d5d6	xen/balloon: Removal of driver_pages Removal of driver_pages (I do not have seen any references to it). Signed-off-by: Daniel Kiper <dkiper@net-space.pl> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-03-14 11:34:19 -04:00
Stefano Stabellini	706cc9d2a4	xen/m2p: Check whether the MFN has IDENTITY_FRAME bit set.. If there is no proper PFN value in the M2P for the MFN (so we get 0xFFFFF.. or 0x55555, or 0x0), we should consult the M2P override to see if there is an entry for this. [Note: we also consult the M2P override if the MFN is past our machine_to_phys size]. We consult the P2M with the PFN. In case the returned MFN is one of the special values: 0xFFF.., 0x5555 (which signify that the MFN can be either "missing" or it belongs to DOMID_IO) or the p2m(m2p(mfn)) != mfn, we check the M2P override. If we fail the M2P override check, we reset the PFN value to INVALID_P2M_ENTRY. Next we try to find the MFN in the P2M using the MFN value (not the PFN value) and if found, we know that this MFN is an identity value and return it as so. Otherwise we have exhausted all the posibilities and we return the PFN, which at this stage can either be a real PFN value found in the machine_to_phys.. array, or INVALID_P2M_ENTRY value. [v1: Added Review-by tag] Reviewed-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-03-14 11:17:14 -04:00
Konrad Rzeszutek Wilk	146c4e5117	xen/m2p: No need to catch exceptions when we know that there is no RAM .. beyound what we think is the end of memory. However there might be more System RAM - but assigned to a guest. Hence jump to the M2P override check and consult. [v1: Added Review-by tag] Reviewed-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-03-14 11:17:13 -04:00
Konrad Rzeszutek Wilk	fc25151d9a	xen/debug: WARN_ON when identity PFN has no _PAGE_IOMAP flag set. Only enabled if XEN_DEBUG is enabled. We print a warning when: pfn_to_mfn(pfn) == pfn, but no VM_IO (_PAGE_IOMAP) flag set (and pfn is an identity mapped pfn) pfn_to_mfn(pfn) != pfn, and VM_IO flag is set. (ditto, pfn is an identity mapped pfn) [v2: Make it dependent on CONFIG_XEN_DEBUG instead of ..DEBUG_FS] [v3: Fix compiler warning] Reviewed-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-03-14 11:17:12 -04:00
Konrad Rzeszutek Wilk	2222e71bd6	xen/debugfs: Add 'p2m' file for printing out the P2M layout. We walk over the whole P2M tree and construct a simplified view of which PFN regions belong to what level and what type they are. Only enabled if CONFIG_XEN_DEBUG_FS is set. [v2: UNKN->UNKNOWN, use uninitialized_var] [v3: Rebased on top of mmu->p2m code split] [v4: Fixed the else if] Reviewed-by: Ian Campbell <Ian.Campbell@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-03-14 11:17:11 -04:00
Konrad Rzeszutek Wilk	68df0da7f4	xen/setup: Set identity mapping for non-RAM E820 and E820 gaps. We walk the E820 region and start at 0 (for PV guests we start at ISA_END_ADDRESS) and skip any E820 RAM regions. For all other regions and as well the gaps we set them to be identity mappings. The reasons we do not want to set the identity mapping from 0-> ISA_END_ADDRESS when running as PV is b/c that the kernel would try to read DMI information and fail (no permissions to read that). There is a lot of gnarly code to deal with that weird region so we won't try to do a cleanup in this patch. This code ends up calling 'set_phys_to_identity' with the start and end PFN of the the E820 that are non-RAM or have gaps. On 99% of machines that means one big region right underneath the 4GB mark. Usually starts at 0xc0000 (or 0x80000) and goes to 0x100000. [v2: Fix for E820 crossing 1MB region and clamp the start] [v3: Squshed in code that does this over ranges] [v4: Moved the comment to the correct spot] [v5: Use the "raw" E820 from the hypervisor] [v6: Added Review-by tag] Reviewed-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-03-14 11:17:10 -04:00
Konrad Rzeszutek Wilk	c761779877	xen/mmu: WARN_ON when racing to swap middle leaf. The initial bootup code uses set_phys_to_machine quite a lot, and after bootup it would be used by the balloon driver. The balloon driver does have mutex lock so this should not be necessary - but just in case, add a WARN_ON if we do hit this scenario. If we do fail this, it is OK to continue as there is a backup mechanism (VM_IO) that can bypass the P2M and still set the _PAGE_IOMAP flags. [v2: Change from WARN to BUG_ON] [v3: Rebased on top of xen->p2m code split] [v4: Change from BUG_ON to WARN] Reviewed-by: Ian Campbell <Ian.Campbell@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-03-14 11:17:09 -04:00
Konrad Rzeszutek Wilk	fb38923ead	xen/mmu: Set _PAGE_IOMAP if PFN is an identity PFN. If we find that the PFN is within the P2M as an identity PFN make sure to tack on the _PAGE_IOMAP flag. Reviewed-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-03-14 11:17:08 -04:00
Konrad Rzeszutek Wilk	f4cec35b0d	xen/mmu: Add the notion of identity (1-1) mapping. Our P2M tree structure is a three-level. On the leaf nodes we set the Machine Frame Number (MFN) of the PFN. What this means is that when one does: pfn_to_mfn(pfn), which is used when creating PTE entries, you get the real MFN of the hardware. When Xen sets up a guest it initially populates a array which has descending (or ascending) MFN values, as so: idx: 0, 1, 2 [0x290F, 0x290E, 0x290D, ..] so pfn_to_mfn(2)==0x290D. If you start, restart many guests that list starts looking quite random. We graft this structure on our P2M tree structure and stick in those MFN in the leafs. But for all other leaf entries, or for the top root, or middle one, for which there is a void entry, we assume it is "missing". So pfn_to_mfn(0xc0000)=INVALID_P2M_ENTRY. We add the possibility of setting 1-1 mappings on certain regions, so that: pfn_to_mfn(0xc0000)=0xc0000 The benefit of this is, that we can assume for non-RAM regions (think PCI BARs, or ACPI spaces), we can create mappings easily b/c we get the PFN value to match the MFN. For this to work efficiently we introduce one new page p2m_identity and allocate (via reserved_brk) any other pages we need to cover the sides (1GB or 4MB boundary violations). All entries in p2m_identity are set to INVALID_P2M_ENTRY type (Xen toolstack only recognizes that and MFNs, no other fancy value). On lookup we spot that the entry points to p2m_identity and return the identity value instead of dereferencing and returning INVALID_P2M_ENTRY. If the entry points to an allocated page, we just proceed as before and return the PFN. If the PFN has IDENTITY_FRAME_BIT set we unmask that in appropriate functions (pfn_to_mfn). The reason for having the IDENTITY_FRAME_BIT instead of just returning the PFN is that we could find ourselves where pfn_to_mfn(pfn)==pfn for a non-identity pfn. To protect ourselves against we elect to set (and get) the IDENTITY_FRAME_BIT on all identity mapped PFNs. This simplistic diagram is used to explain the more subtle piece of code. There is also a digram of the P2M at the end that can help. Imagine your E820 looking as so: 1GB 2GB /-------------------+---------\/----\ /----------\ /---+-----\ \| System RAM \| Sys RAM \|\|ACPI\| \| reserved \| \| Sys RAM \| \-------------------+---------/\----/ \----------/ \---+-----/ ^- 1029MB ^- 2001MB [1029MB = 263424 (0x40500), 2001MB = 512256 (0x7D100), 2048MB = 524288 (0x80000)] And dom0_mem=max:3GB,1GB is passed in to the guest, meaning memory past 1GB is actually not present (would have to kick the balloon driver to put it in). When we are told to set the PFNs for identity mapping (see patch: "xen/setup: Set identity mapping for non-RAM E820 and E820 gaps.") we pass in the start of the PFN and the end PFN (263424 and 512256 respectively). The first step is to reserve_brk a top leaf page if the p2m[1] is missing. The top leaf page covers 512^2 of page estate (1GB) and in case the start or end PFN is not aligned on 512^2*PAGE_SIZE (1GB) we loop on aligned 1GB PFNs from start pfn to end pfn. We reserve_brk top leaf pages if they are missing (means they point to p2m_mid_missing). With the E820 example above, 263424 is not 1GB aligned so we allocate a reserve_brk page which will cover the PFNs estate from 0x40000 to 0x80000. Each entry in the allocate page is "missing" (points to p2m_missing). Next stage is to determine if we need to do a more granular boundary check on the 4MB (or 2MB depending on architecture) off the start and end pfn's. We check if the start pfn and end pfn violate that boundary check, and if so reserve_brk a middle (p2m[x][y]) leaf page. This way we have a much finer granularity of setting which PFNs are missing and which ones are identity. In our example 263424 and 512256 both fail the check so we reserve_brk two pages. Populate them with INVALID_P2M_ENTRY (so they both have "missing" values) and assign them to p2m[1][2] and p2m[1][488] respectively. At this point we would at minimum reserve_brk one page, but could be up to three. Each call to set_phys_range_identity has at maximum a three page cost. If we were to query the P2M at this stage, all those entries from start PFN through end PFN (so 1029MB -> 2001MB) would return INVALID_P2M_ENTRY ("missing"). The next step is to walk from the start pfn to the end pfn setting the IDENTITY_FRAME_BIT on each PFN. This is done in 'set_phys_range_identity'. If we find that the middle leaf is pointing to p2m_missing we can swap it over to p2m_identity - this way covering 4MB (or 2MB) PFN space. At this point we do not need to worry about boundary aligment (so no need to reserve_brk a middle page, figure out which PFNs are "missing" and which ones are identity), as that has been done earlier. If we find that the middle leaf is not occupied by p2m_identity or p2m_missing, we dereference that page (which covers 512 PFNs) and set the appropriate PFN with IDENTITY_FRAME_BIT. In our example 263424 and 512256 end up there, and we set from p2m[1][2][256->511] and p2m[1][488][0->256] with IDENTITY_FRAME_BIT set. All other regions that are void (or not filled) either point to p2m_missing (considered missing) or have the default value of INVALID_P2M_ENTRY (also considered missing). In our case, p2m[1][2][0->255] and p2m[1][488][257->511] contain the INVALID_P2M_ENTRY value and are considered "missing." This is what the p2m ends up looking (for the E820 above) with this fabulous drawing: p2m /--------------\ /-----\ \| &mfn_list[0],\| /-----------------\ \| 0 \|------>\| &mfn_list[1],\| /---------------\ \| ~0, ~0, .. \| \|-----\| \| ..., ~0, ~0 \| \| ~0, ~0, [x]---+----->\| IDENTITY [@256] \| \| 1 \|---\ \--------------/ \| [p2m_identity]+\ \| IDENTITY [@257] \| \|-----\| \ \| [p2m_identity]+\\ \| .... \| \| 2 \|--\ \-------------------->\| ... \| \\ \----------------/ \|-----\| \ \---------------/ \\ \| 3 \|\ \ \\ p2m_identity \|-----\| \ \-------------------->/---------------\ /-----------------\ \| .. +->+ \| [p2m_identity]+-->\| ~0, ~0, ~0, ... \| \-----/ / \| [p2m_identity]+-->\| ..., ~0 \| / /---------------\ \| .... \| \-----------------/ / \| IDENTITY[@0] \| /-+-[x], ~0, ~0.. \| / \| IDENTITY[@256]\|<----/ \---------------/ / \| ~0, ~0, .... \| \| \---------------/ \| p2m_missing p2m_missing /------------------\ /------------\ \| [p2m_mid_missing]+---->\| ~0, ~0, ~0 \| \| [p2m_mid_missing]+---->\| ..., ~0 \| \------------------/ \------------/ where ~0 is INVALID_P2M_ENTRY. IDENTITY is (PFN \| IDENTITY_BIT) Reviewed-by: Ian Campbell <ian.campbell@citrix.com> [v5: Changed code to use ranges, added ASCII art] [v6: Rebased on top of xen->p2m code split] [v4: Squished patches in just this one] [v7: Added RESERVE_BRK for potentially allocated pages] [v8: Fixed alignment problem] [v9: Changed 1<<3X to 1<<BITS_PER_LONG-X] [v10: Copied git commit description in the p2m code + Add Review tag] [v11: Title had '2-1' - should be '1-1' mapping] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-03-14 11:16:41 -04:00
Sebastian Andrzej Siewior	03150171dc	x86: ce4100: Set pci ops via callback instead of module init Setting the pci ops on subsys initcall unconditionally will break multi platform kernels on anything except ce4100. Use x86_init.pci.init ops to call this only on real ce4100 platforms. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: sodaville@linutronix.de LKML-Reference: <20110314093340.GA21026@www.tglx.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-03-14 15:13:23 +01:00
Thomas Gleixner	c0185808eb	x86: Enable forced interrupt threading support All non threadeable interrupts are marked. Enable forced irq threading support. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-03-12 14:12:02 +01:00
Thomas Gleixner	9bbbff25b3	x86: Mark low level interrupts IRQF_NO_THREAD These cannot be threaded. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-03-12 14:12:01 +01:00
Thomas Gleixner	517e498156	x86: Use generic show_interrupts Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-03-12 14:12:01 +01:00
Thomas Gleixner	1a0e62a49a	x86: ioapic: Avoid redundant lookup of irq_cfg The caller of ioapic_register_intr() has a pointer to the irq_cfg for the irq already. Hand it in to avoid a full lookup. In msi_compose_msg() the pointer to irq_cfg is already available. No need to look it up again. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-03-12 14:12:01 +01:00
Thomas Gleixner	08221110e8	x86: ioapic: Use new move_irq functions Use the functions which take irq_data. We already have a pointer to irq_data. That avoids a sparse irq lookup in move_*_irq. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-03-12 14:12:01 +01:00
Thomas Gleixner	51c43ac6e4	x86: Use the proper accessors in fixup_irqs() Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-03-12 14:12:00 +01:00
Thomas Gleixner	5451ddc562	x86: ioapic: Use irq_data->state Use the state information in irq_data. That avoids a radix-tree lookup from apic_ack_level() and simplifies setup_ioapic_dest(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-03-12 14:12:00 +01:00
Thomas Gleixner	c60eaf25cd	x86: ioapic: Simplify irq chip and handler setup Use pointers instead of ugly multiline if/else constructs. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-03-12 14:12:00 +01:00
Thomas Gleixner	2c778651f7	x86: Cleanup the genirq name space genirq is switching to a consistent name space for the irq related functions. Convert x86. Conversion was done with coccinelle. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-03-12 14:12:00 +01:00
Thomas Gleixner	cfe08bba1e	Merge branch 'x86/apic' into x86/irq Reason: Update to latest genirq code conflicts with pending apic changes Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-03-12 13:22:28 +01:00
Tejun Heo	56396e6823	x86-64, NUMA: Don't call numa_set_distanc() for all possible node combinations during emulation The distance transforming in numa_emulation() used to call numa_set_distance() for all MAX_NUMNODES * MAX_NUMNODES node combinations regardless of which are enabled. As numa_set_distance() ignores all out-of-bound distance settings, this doesn't cause any problem other than looping unnecessarily many times during boot. However, as MAX_NUMNODES * MAX_NUMNODES can be pretty high, update the code such that it iterates through only the enabled combinations. Yinghai Lu identified the issue and provided an initial patch to address the issue; however, the patch was incorrect in that it didn't build emulated distance table when there's no physical distance table and unnecessarily complex. http://thread.gmane.org/gmane.linux.kernel/1107986/focus=1107988 Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Yinghai Lu <yinghai@kernel.org>	2011-03-12 11:41:10 +01:00
Alexander van Heukelum	371c394af2	x86, binutils, xen: Fix another wrong size directive The latest binutils (2.21.0.20110302/Ubuntu) breaks the build yet another time, under CONFIG_XEN=y due to a .size directive that refers to a slightly differently named (hence, to the now very strict and unforgiving assembler, non-existent) symbol. [ mingo: This unnecessary build breakage caused by new binutils version 2.21 gets escallated back several kernel releases spanning several years of Linux history, affecting over 130,000 upstream kernel commits (!), on CONFIG_XEN=y 64-bit kernels (i.e. essentially affecting all major Linux distro kernel configs). Git annotate tells us that this slight debug symbol code mismatch bug has been introduced in 2008 in commit `3d75e1b8`: `3d75e1b8` (Jeremy Fitzhardinge 2008-07-08 15:06:49 -0700 1231) ENTRY(xen_do_hypervisor_callback) # do_hypervisor_callback(struct pt_regs) The 'bug' is just a slight assymetry in ENTRY()/END() debug-symbols sequences, with lots of assembly code between the ENTRY() and the END(): ENTRY(xen_do_hypervisor_callback) # do_hypervisor_callback(struct pt_regs) ... END(do_hypervisor_callback) Human reviewers almost never catch such small mismatches, and binutils never even warned about it either. This new binutils version thus breaks the Xen build on all upstream kernels since v2.6.27, out of the blue. This makes a straightforward Git bisection of all 64-bit Xen-enabled kernels impossible on such binutils, for a bisection window of over hundred thousand historic commits. (!) This is a major fail on the side of binutils and binutils needs to turn this show-stopper build failure into a warning ASAP. ] Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Jan Beulich <jbeulich@novell.com> Cc: H.J. Lu <hjl.tools@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Kees Cook <kees.cook@canonical.com> LKML-Reference: <1299877178-26063-1-git-send-email-heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-12 09:02:29 +01:00
Konrad Rzeszutek Wilk	86b32122fd	xen/e820: Don't mark balloon memory as E820_UNUSABLE when running as guest and fix overflow. If we have a guest that asked for: memory=1024 maxmem=2048 Which means we want 1GB now, and create pagetables so that we can expand up to 2GB, we would have this E820 layout: [ 0.000000] BIOS-provided physical RAM map: [ 0.000000] Xen: 0000000000000000 - 00000000000a0000 (usable) [ 0.000000] Xen: 00000000000a0000 - 0000000000100000 (reserved) [ 0.000000] Xen: 0000000000100000 - 0000000080800000 (usable) Due to patch: "xen/setup: Inhibit resource API from using System RAM E820 gaps as PCI mem gaps." we would mark the memory past the 1GB mark as unusuable resulting in: [ 0.000000] BIOS-provided physical RAM map: [ 0.000000] Xen: 0000000000000000 - 00000000000a0000 (usable) [ 0.000000] Xen: 00000000000a0000 - 0000000000100000 (reserved) [ 0.000000] Xen: 0000000000100000 - 0000000040000000 (usable) [ 0.000000] Xen: 0000000040000000 - 0000000080800000 (unusable) which meant that we could not balloon up anymore. We could balloon the guest down. The fix is to run the code introduced by the above mentioned patch only for the initial domain. We will have to revisit this once we start introducing a modified E820 for PCI passthrough so that we can utilize the P2M identity code. We also fix an overflow by having UL instead of ULL on 32-bit machines. [v2: Ian pointed to the overflow issue] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-03-11 11:01:45 -05:00
Michel Lespinasse	8d7718aa08	futex: Sanitize futex ops argument types Change futex_atomic_op_inuser and futex_atomic_cmpxchg_inatomic prototypes to use u32 types for the futex as this is the data type the futex core code uses all over the place. Signed-off-by: Michel Lespinasse <walken@google.com> Cc: Darren Hart <darren@dvhart.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <20110311025058.GD26122@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-03-11 12:23:31 +01:00
Michel Lespinasse	37a9d912b2	futex: Sanitize cmpxchg_futex_value_locked API The cmpxchg_futex_value_locked API was funny in that it returned either the original, user-exposed futex value OR an error code such as -EFAULT. This was confusing at best, and could be a source of livelocks in places that retry the cmpxchg_futex_value_locked after trying to fix the issue by running fault_in_user_writeable(). This change makes the cmpxchg_futex_value_locked API more similar to the get_futex_value_locked one, returning an error code and updating the original value through a reference argument. Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> [tile] Acked-by: Tony Luck <tony.luck@intel.com> [ia64] Acked-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michal Simek <monstr@monstr.eu> [microblaze] Acked-by: David Howells <dhowells@redhat.com> [frv] Cc: Darren Hart <darren@dvhart.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <20110311024851.GC26122@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-03-11 12:23:08 +01:00
Henrik Kretzschmar	25874a299e	x86: Clean up apic.c and apic.h This patch moves some functions and variables into init sections, makes a function static and removes some lines of cruft. Signed-off-by: Henrik Kretzschmar <henne@nachtwindheim.de> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> LKML-Reference: <1299826956-8607-2-git-send-email-henne@nachtwindheim.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-11 08:13:59 +01:00
Henrik Kretzschmar	ec8df88f6b	x86: Remove superflous goal definition of tsc_sync The extra tsc_sync.o goal definition is superflous. CONFIG_X86_64_SMP depends on CONFIG_SMP and tsc_sync.o is already in the definition of CONFIG_SMP. Signed-off-by: Henrik Kretzschmar <henne@nachtwindheim.de> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> LKML-Reference: <1299826956-8607-1-git-send-email-henne@nachtwindheim.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-11 08:13:59 +01:00
Linus Torvalds	b5562c9a55	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, UV: Initialize the broadcast assist unit base destination node id properly x86, numa: Fix numa_emulation code with memory-less node0 x86, build: Make sure mkpiggy fails on read error	2011-03-10 13:09:26 -08:00
Ian Campbell	f4d0635bf8	xen: events: refactor GSI pirq bindings functions Following the example set by xen_allocate_pirq_msi and xen_bind_pirq_msi_to_irq: xen_allocate_pirq becomes xen_allocate_pirq_gsi and now only allocates a pirq number and does not bind it. xen_map_pirq_gsi becomes xen_bind_pirq_gsi_to_irq and binds an existing pirq. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-03-10 14:48:37 -05:00
Ian Campbell	71eef7d1e3	xen: events: remove dom0 specific xen_create_msi_irq The function name does not distinguish it from xen_allocate_pirq_msi (which operates on domU and pvhvm domains rather than dom0). Hoist domain 0 specific functionality up into the only caller leaving functionality common to all guest types in xen_bind_pirq_msi_to_irq. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-03-10 14:44:45 -05:00
Ian Campbell	ca1d8fe952	xen: events: use xen_bind_pirq_msi_to_irq from xen_create_msi_irq Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-03-10 14:44:44 -05:00
Ian Campbell	f420e010ed	xen: events: push set_irq_msi down into xen_create_msi_irq Makes the tail end of this function look even more like xen_bind_pirq_msi_to_irq. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-03-10 14:44:43 -05:00
Ian Campbell	bf480d952b	xen: events: separate MSI PIRQ allocation from PIRQ binding to IRQ Split the binding aspect of xen_allocate_pirq_msi out into a new xen_bind_pirq_to_irq function. In xen_hvm_setup_msi_irq when allocating a pirq write the MSI message to signal the PIRQ as soon as the pirq is obtained. There is no way to free the pirq back so if the subsequent binding to an IRQ fails we want to ensure that we will reuse the PIRQ next time rather than leak it. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-03-10 14:44:40 -05:00
Ian Campbell	9a626612c2	xen: pci: collapse apic_register_gsi_xen_hvm and xen_hvm_register_pirq apic_register_gsi_xen_hvm is a tiny wrapper around xen_hvm_register_pirq. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-03-10 14:44:38 -05:00
Ian Campbell	4b41df7f6e	xen: events: return irq from xen_allocate_pirq_msi consistent with other similar functions. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-03-10 14:44:37 -05:00
Ian Campbell	bb5d079aef	xen: events: drop XEN_ALLOC_IRQ flag to xen_allocate_pirq_msi All callers pass this flag so it is pointless. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-03-10 14:44:35 -05:00
Ian Campbell	260a7d4cfd	xen: pci: only define xen_initdom_setup_msi_irqs if CONFIG_XEN_DOM0 Fixes: CC arch/x86/pci/xen.o arch/x86/pci/xen.c:183: warning: 'xen_initdom_setup_msi_irqs' defined but not used Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-03-10 14:44:33 -05:00
Konrad Rzeszutek Wilk	8448f0119a	Merge branch 'stable/pcifront-fixes' into stable/irq.cleanup * stable/pcifront-fixes: pci/xen: When free-ing MSI-X/MSI irq->desc also use generic code. pci/xen: Cleanup: convert int** to int[] pci/xen: Use xen_allocate_pirq_msi instead of xen_allocate_pirq xen-pcifront: Sanity check the MSI/MSI-X values xen-pcifront: don't use flush_scheduled_work()	2011-03-10 14:42:11 -05:00
Konrad Rzeszutek Wilk	8054c3634c	Merge branch 'stable/irq.rework' into stable/irq.cleanup * stable/irq.rework: xen/irq: Cleanup up the pirq_to_irq for DomU PV PCI passthrough guests as well. xen: Use IRQF_FORCE_RESUME xen/timer: Missing IRQF_NO_SUSPEND in timer code broke suspend. xen: Fix compile error introduced by "switch to new irq_chip functions" xen: Switch to new irq_chip functions xen: Remove stale irq_chip.end xen: events: do not free legacy IRQs xen: events: allocate GSIs and dynamic IRQs from separate IRQ ranges. xen: events: add xen_allocate_irq_{dynamic, gsi} and xen_free_irq xen:events: move find_unbound_irq inside CONFIG_PCI_MSI xen: handled remapped IRQs when enabling a pcifront PCI device. genirq: Add IRQF_FORCE_RESUME	2011-03-10 14:41:43 -05:00
Steven Rostedt	722b3c7469	ftrace/graph: Trace function entry before updating index Currently the index to the ret_stack is updated and the real return address is saved in the ret_stack. Then we call the trace function. The trace function could decide that it doesn't want to trace this function (ex. set_graph_function does not match) and it will return 0 which means not to trace this call. The normal function graph tracer has this code: if (!(trace->depth \|\| ftrace_graph_addr(trace->func)) \|\| ftrace_graph_ignore_irqs()) return 0; What this states is, if the trace depth (which is curr_ret_stack) is zero (top of nested functions) then test if we want to trace this function. If this function is not to be traced, then return 0 and the rest of the function graph tracer logic will not trace this function. The problem arises when an interrupt comes in after we updated the curr_ret_stack. The next function that gets called will have a trace->depth of 1. Which fools this trace code into thinking that we are in a nested function, and that we should trace. This causes interrupts to be traced when they should not be. The solution is to trace the function first and then update the ret_stack. Reported-by: zhiping zhong <xzhong86@163.com> Reported-by: wu zhangjin <wuzhangjin@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2011-03-10 10:34:43 -05:00
David Sharp	d5bf2ff072	tracing: Fix event alignment: kvm:kvm_hv_hypercall Acked-by: Avi Kivity <avi@redhat.com> Signed-off-by: David Sharp <dhsharp@google.com> LKML-Reference: <1291421609-14665-8-git-send-email-dhsharp@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2011-03-10 10:34:24 -05:00
Andrea Arcangeli	a79e53d856	x86/mm: Fix pgd_lock deadlock It's forbidden to take the page_table_lock with the irq disabled or if there's contention the IPIs (for tlb flushes) sent with the page_table_lock held will never run leading to a deadlock. Nobody takes the pgd_lock from irq context so the _irqsave can be removed. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: <stable@kernel.org> LKML-Reference: <201102162345.p1GNjMjm021738@imap1.linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-10 09:41:57 +01:00
Andrey Vagin	f86268549f	x86/mm: Handle mm_fault_error() in kernel space mm_fault_error() should not execute oom-killer, if page fault occurs in kernel space. E.g. in copy_from_user()/copy_to_user(). This would happen if we find ourselves in OOM on a copy_to_user(), or a copy_from_user() which faults. Without this patch, the kernels hangs up in copy_from_user(), because OOM killer sends SIG_KILL to current process, but it can't handle a signal while in syscall, then the kernel returns to copy_from_user(), reexcute current command and provokes page_fault again. With this patch the kernel return -EFAULT from copy_from_user(). The code, which checks that page fault occurred in kernel space, has been copied from do_sigbus(). This situation is handled by the same way on powerpc, xtensa, tile, ... Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: <stable@kernel.org> LKML-Reference: <201103092322.p29NMNPH001682@imap1.linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-10 09:41:40 +01:00
Naga Chumbalkar	1f858ef2fb	[CPUFREQ] pcc-cpufreq: don't load driver if get_freq fails during init. Return 0 on failure. This will cause the initialization of the driver to fail and prevent the driver from loading if the BIOS cannot handle the PCC interface command to "get frequency". Otherwise, the driver will load and display a very high value like "4294967274" (which is actually -EINVAL) for frequency: # cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq 4294967274 Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com> CC: stable@kernel.org Signed-off-by: Dave Jones <davej@redhat.com>	2011-03-09 12:33:15 -05:00
Naga Chumbalkar	a7bd1dafdc	x86: Don't check for BIOS corruption in first 64K when there's no need to Due to commit `781c5a67f1` it is likely that the number of areas to scan for BIOS corruption is 0 -- especially when the first 64K is already reserved (X86_RESERVE_LOW is 64K by default). If that's the case then don't set up the scan. Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com> Cc: <stable@kernel.org> LKML-Reference: <20110225202838.2229.71011.sendpatchset@nchumbalkar.americas.hpqcorp.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-09 16:36:41 +01:00
Cliff Wickman	5471262290	x86, UV: Initialize the broadcast assist unit base destination node id properly The BAU's initialization of the broadcast description header is lacking the coherence domain (high bits) in the nasid. This causes a catastrophic system failure when running on a system with multiple coherence domains. Signed-off-by: Cliff Wickman <cpw@sgi.com> LKML-Reference: <E1PxKBB-0005F0-3U@eag09.americas.sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-09 16:36:16 +01:00
Jan Beulich	c49aa5bd13	x86: Remove dead config option X86_CPU This isn't being referenced anywhere, and the selects done from it can be easily done together with all the other X86 ones. v2: Also adjust UML's Kconfig.x86. Signed-off-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <4D7603DA02000078000351C1@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-09 10:39:36 +01:00
Ingo Molnar	c8b44163b7	Merge commit 'v2.6.38-rc8' into x86/asm Merge reason: Update with the latest fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-09 10:38:59 +01:00
Sedat Dilek	2ae9d293b1	x86: Fix binutils-2.21 symbol related build failures New binutils version 2.21.0.20110302-1 started checking that the symbol parameter to the .size directive matches the entry name's symbol parameter, unearthing two mismatches: AS arch/x86/kernel/acpi/wakeup_rm.o arch/x86/kernel/acpi/wakeup_rm.S: Assembler messages: arch/x86/kernel/acpi/wakeup_rm.S:12: Error: .size expression with symbol `wakeup_code_start' does not evaluate to a constant arch/x86/kernel/entry_32.S: Assembler messages: arch/x86/kernel/entry_32.S:1421: Error: .size expression with symbol `apf_page_fault' does not evaluate to a constant The problem was discovered while using Debian's binutils (2.21.0.20110302-1) and experimenting with binutils from upstream. Thanks Alexander and H.J. for the vital help. Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com> Cc: Alexander van Heukelum <heukelum@fastmail.fm> Cc: H.J. Lu <hjl.tools@gmail.com> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Rafael J. Wysocki <rjw@sisk.pl> LKML-Reference: <1299620364-21644-1-git-send-email-sedat.dilek@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-09 10:25:45 +01:00
Jiri Olsa	2a8247a260	kprobes: Disabling optimized kprobes for entry text section You can crash the kernel (with root/admin privileges) using kprobe tracer by running: echo "p system_call_after_swapgs" > ./kprobe_events echo 1 > ./events/kprobes/enable The reason is that at the system_call_after_swapgs label, the kernel stack is not set up. If optimized kprobes are enabled, the user space stack is being used in this case (see optimized kprobe template) and this might result in a crash. There are several places like this over the entry code (entry_$BIT). As it seems there's no any reasonable/maintainable way to disable only those places where the stack is not ready, I switched off the whole entry code from kprobe optimizing. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: acme@redhat.com Cc: fweisbec@gmail.com Cc: ananth@in.ibm.com Cc: davem@davemloft.net Cc: a.p.zijlstra@chello.nl Cc: eric.dumazet@gmail.com Cc: 2nddept-manager@sdl.hitachi.co.jp LKML-Reference: <1298298313-5980-3-git-send-email-jolsa@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-08 17:22:12 +01:00
Jiri Olsa	ea7145477a	x86: Separate out entry text section Put x86 entry code into a separate link section: .entry.text. Separating the entry text section seems to have performance benefits - caused by more efficient instruction cache usage. Running hackbench with perf stat --repeat showed that the change compresses the icache footprint. The icache load miss rate went down by about 15%: before patch: 19417627 L1-icache-load-misses ( +- 0.147% ) after patch: 16490788 L1-icache-load-misses ( +- 0.180% ) The motivation of the patch was to fix a particular kprobes bug that relates to the entry text section, the performance advantage was discovered accidentally. Whole perf output follows: - results for current tip tree: Performance counter stats for './hackbench/hackbench 10' (500 runs): 19417627 L1-icache-load-misses ( +- 0.147% ) 2676914223 instructions # 0.497 IPC ( +- 0.079% ) 5389516026 cycles ( +- 0.144% ) 0.206267711 seconds time elapsed ( +- 0.138% ) - results for current tip tree with the patch applied: Performance counter stats for './hackbench/hackbench 10' (500 runs): 16490788 L1-icache-load-misses ( +- 0.180% ) 2717734941 instructions # 0.502 IPC ( +- 0.079% ) 5414756975 cycles ( +- 0.148% ) 0.206747566 seconds time elapsed ( +- 0.137% ) Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: masami.hiramatsu.pt@hitachi.com Cc: ananth@in.ibm.com Cc: davem@davemloft.net Cc: 2nddept-manager@sdl.hitachi.co.jp LKML-Reference: <20110307181039.GB15197@jolsa.redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-08 17:22:11 +01:00
Ingo Molnar	86cb2ec7b2	Merge commit 'v2.6.38-rc8' into perf/core Merge reason: Merge latest fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-08 17:21:52 +01:00
David Howells	ee009e4a0d	KEYS: Add an iovec version of KEYCTL_INSTANTIATE Add a keyctl op (KEYCTL_INSTANTIATE_IOV) that is like KEYCTL_INSTANTIATE, but takes an iovec array and concatenates the data in-kernel into one buffer. Since the KEYCTL_INSTANTIATE copies the data anyway, this isn't too much of a problem. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2011-03-08 11:17:22 +11:00
Jan Beulich	ac23f25355	x86: Really print supported CPUs if PROCESSOR_SELECT=y I'm sure it was a mere oversight that the CONFIG_ prefixes are missing. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Dave Jones <davej@redhat.com> LKML-Reference: <4D7118D30200007800034F79@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-05 09:29:45 +01:00
Ingo Molnar	ca764aaf02	Merge branch 'x86-mm' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into x86/mm	2011-03-05 07:32:45 +01:00
Lin Ming	6909262429	perf: Avoid the percore allocations if the CPU is not HT capable Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1299119690-13991-5-git-send-email-ming.m.lin@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-05 07:12:16 +01:00
Tejun Heo	078a198906	x86-64, NUMA: Don't assume phys node 0 is always online in numa_emulation() Undetermined entries in emu_nid_to_phys[] are filled with zero assuming that physical node 0 is always online; however, this might not be true depending on hardware configuration. Find a physical node which is actually online and use it instead. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: David Rientjes <rientjes@google.com> LKML-Reference: <alpine.DEB.2.00.1103020628210.31626@chino.kir.corp.google.com>	2011-03-04 16:32:37 +01:00
Yinghai Lu	3b28cf32cc	x86, numa: Fix numa_emulation code with memory-less node0 This crash happens on a system that does not have RAM on node0. When numa_emulation is compiled in, and: 1. we boot the system without numa=fake... 2. or we boot the system with numa=fake=128 to make emulation fail we will get: [ 0.076025] ------------[ cut here ]------------ [ 0.080004] kernel BUG at arch/x86/mm/numa_64.c:788! [ 0.080004] invalid opcode: 0000 [#1] SMP [...] need to use early_cpu_to_node() directly, because cpu_to_apicid and apicid_to_node will return node0 that is not onlined. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> LKML-Reference: <4D6ECF72.5010308@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-04 15:20:19 +01:00
David Rientjes	c09cedf4f7	x86-64, NUMA: Clean up initmem_init() This patch cleans initmem_init() so that it is more readable and doesn't use an unnecessary array of function pointers to convolute the flow of the code. It also makes it obvious that dummy_numa_init() will always succeed (and documents that requirement) so that the existing BUG() is never actually reached. No functional change. -tj: Updated comment for dummy_numa_init() slightly. Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2011-03-04 15:17:21 +01:00
Yinghai Lu	51b361b400	x86-64, NUMA: Fix numa_emulation code with node0 without RAM On one system that does not have RAM on node0. When numa_emulation is compiled in, and 1. boot system without numa=fake... 2. or boot system with numa=fake=128 to make emulation fail will get: [ 0.092026] ------------[ cut here ]------------ [ 0.096005] kernel BUG at arch/x86/mm/numa_emulation.c:439! [ 0.096005] invalid opcode: 0000 [#1] SMP [ 0.096005] last sysfs file: [ 0.096005] CPU 0 [ 0.096005] Modules linked in: [ 0.096005] [ 0.096005] Pid: 0, comm: swapper Not tainted 2.6.38-rc6-tip-yh-03869-gcb0491d-dirty #684 Sun Microsystems Sun Fire X4240/Sun Fire X4240 [ 0.096005] RIP: 0010:[<ffffffff81cdc65b>] [<ffffffff81cdc65b>] numa_add_cpu+0x56/0xcf [ 0.096005] RSP: 0000:ffffffff82437ed8 EFLAGS: 00010246 ... [ 0.096005] Call Trace: [ 0.096005] [<ffffffff81cd7931>] identify_cpu+0x2d7/0x2df [ 0.096005] [<ffffffff827e54fa>] identify_boot_cpu+0x10/0x30 [ 0.096005] [<ffffffff827e5704>] check_bugs+0x9/0x2d [ 0.096005] [<ffffffff827dceda>] start_kernel+0x3d7/0x3f1 [ 0.096005] [<ffffffff827dc2cc>] x86_64_start_reservations+0x9c/0xa0 [ 0.096005] [<ffffffff827dc4ad>] x86_64_start_kernel+0x1dd/0x1e8 [ 0.096005] Code: 74 06 48 8d 04 90 eb 0f 48 c7 c0 30 d9 00 00 48 03 04 d5 90 0f 60 82 8b 00 83 f8 ff 74 0d 0f a3 05 8b 7e 92 00 19 d2 85 d2 75 02 <0f> 0b 48 98 be 00 01 00 00 48 c7 c7 e0 44 60 82 44 8b 2c 85 e0 [ 0.096005] RIP [<ffffffff81cdc65b>] numa_add_cpu+0x56/0xcf [ 0.096005] RSP <ffffffff82437ed8> [ 0.096026] ---[ end trace a7919e7f17c0a725 ]--- We need to use early_cpu_to_node() directly, because numa_cpu_node() will return node0 that is not onlined. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2011-03-04 14:49:28 +01:00
Andi Kleen	e994d7d23a	perf: Fix LLC-* events on Intel Nehalem/Westmere On Intel Nehalem and Westmere CPUs the generic perf LLC-* events count the L2 caches, not the real L3 LLC - this was inconsistent with behavior on other CPUs. Fixing this requires the use of the special OFFCORE_RESPONSE events which need a separate mask register. This has been implemented by the previous patch, now use this infrastructure to set correct events for the LLC-* on Nehalem and Westmere. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1299119690-13991-3-git-send-email-ming.m.lin@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-04 11:32:53 +01:00
Andi Kleen	a7e3ed1e47	perf: Add support for supplementary event registers Change logs against Andi's original version: - Extends perf_event_attr:config to config{,1,2} (Peter Zijlstra) - Fixed a major event scheduling issue. There cannot be a ref++ on an event that has already done ref++ once and without calling put_constraint() in between. (Stephane Eranian) - Use thread_cpumask for percore allocation. (Lin Ming) - Use MSR names in the extra reg lists. (Lin Ming) - Remove redundant "c = NULL" in intel_percore_constraints - Fix comment of perf_event_attr::config1 Intel Nehalem/Westmere have a special OFFCORE_RESPONSE event that can be used to monitor any offcore accesses from a core. This is a very useful event for various tunings, and it's also needed to implement the generic LLC-* events correctly. Unfortunately this event requires programming a mask in a separate register. And worse this separate register is per core, not per CPU thread. This patch: - Teaches perf_events that OFFCORE_RESPONSE needs extra parameters. The extra parameters are passed by user space in the perf_event_attr::config1 field. - Adds support to the Intel perf_event core to schedule per core resources. This adds fairly generic infrastructure that can be also used for other per core resources. The basic code has is patterned after the similar AMD northbridge constraints code. Thanks to Stephane Eranian who pointed out some problems in the original version and suggested improvements. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1299119690-13991-2-git-send-email-ming.m.lin@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-04 11:32:53 +01:00
Stephane Eranian	17e3162972	perf_events: Update PEBS event constraints This patch updates PEBS event constraints for Intel Atom, Nehalem, Westmere. This patch also reorganizes the PEBS format/constraint detection code. It is now based on processor model and not PEBS format. Two processors may use the same PEBS format without have the same list of PEBS events. In this second version, we simplified the initialization of the PEBS constraints by leveraging the existing switch() statement in perf_event_intel.c. We also renamed the constraint tables to be more consistent with regular constraints. In this 3rd version, we drop BR_INST_RETIRED.MISPRED from Intel Atom as it does not seem to work. Use MISPREDICTED_BRANCH_RETIRED instead. Also add FP_ASSIST.* o both Intel Nehalem and Westmere. I misssed those in the earlier patches. Events were tested using libpfm4 perf_examples. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <4d6e6b02.815bdf0a.637b.07a7@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-04 11:32:52 +01:00
Ingo Molnar	888a8a3e9d	Merge branch 'perf/urgent' into perf/core Merge reason: Pick up updates before queueing up dependent patches. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-04 10:40:25 +01:00
Tejun Heo	f891125028	x86-64, NUMA: Revert NUMA affine page table allocation This patch reverts NUMA affine page table allocation added by commit `1411e0ec31` (x86-64, numa: Put pgtable to local node memory). The commit made an undocumented change where the kernel linear mapping strictly follows intersection of e820 memory map and NUMA configuration. If the physical memory configuration has holes or NUMA nodes are not properly aligned, this leads to using unnecessarily smaller mapping size which leads to increased TLB pressure. For details, http://thread.gmane.org/gmane.linux.kernel/1104672 Patches to fix the problem have been proposed but the underlying code needs more cleanup and the approach itself seems a bit heavy handed and it has been determined to revert the feature for now and come back to it in the next developement cycle. http://thread.gmane.org/gmane.linux.kernel/1105959 As init_memory_mapping_high() callsites have been consolidated since the commit, reverting is done manually. Also, the RED-PEN comment in arch/x86/mm/init.c is not restored as the problem no longer exists with memblock based top-down early memory allocation. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de>	2011-03-04 10:26:36 +01:00
Ian Campbell	f611f2da99	xen/timer: Missing IRQF_NO_SUSPEND in timer code broke suspend. The patches missed an indirect use of IRQF_NO_SUSPEND pulled in via IRQF_TIMER. The following patch fixes the issue. With this fixlet PV guest migration works just fine. I also booted the entire series as a dom0 kernel and it appeared fine. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-03-03 12:00:31 -05:00
Ian Campbell	3f2a230caf	xen: handled remapped IRQs when enabling a pcifront PCI device. This happens to not be an issue currently because we take pains to try to ensure that the GSI-IRQ mapping is 1-1 in a PV guest and that regular event channels do not clash. However a subsequent patch is going to break this 1-1 mapping. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org>	2011-03-03 11:56:57 -05:00
Konrad Rzeszutek Wilk	6eaa412f27	xen: Mark all initial reserved pages for the balloon as INVALID_P2M_ENTRY. With this patch, we diligently set regions that will be used by the balloon driver to be INVALID_P2M_ENTRY and under the ownership of the balloon driver. We are OK using the __set_phys_to_machine as we do not expect to be allocating any P2M middle or entries pages. The set_phys_to_machine has the side-effect of potentially allocating new pages and we do not want that at this stage. We can do this because xen_build_mfn_list_list will have already allocated all such pages up to xen_max_p2m_pfn. We also move the check for auto translated physmap down the stack so it is present in __set_phys_to_machine. [v2: Rebased with mmu->p2m code split] Reviewed-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-03-03 11:52:48 -05:00
Borislav Petkov	84fd1d35cc	x86, amd-nb: Misc cleanliness fixes Make functions used strictly in bool context return bool. Also, fixup used types and comments, and make a local function static, while at it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Borislav Petkov <bp@amd64.org> LKML-Reference: <20110303115932.GA8603@aftab> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-03 13:06:20 +01:00
Jan Beulich	d04c579f97	x86: Work around old gas bug Add extra parentheses around a couple of definitions introduced by "x86: Cleanup vector usage" and used in assembly macro arguments, and remove spaces. Without that old (2.16.1) gas would see more macro arguments than were actually specified. Reported-and-tested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Shaohua Li <shaohua.li@intel.com> LKML-Reference: <4D6F81B10200007800034B0B@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-03 12:47:08 +01:00
Linus Torvalds	f7d222ea2a	Merge branch 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6 * 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6: of/promtree: allow DT device matching by fixing 'name' brokenness (v5) x86: OLPC: have prom_early_alloc BUG rather than return NULL of/flattree: Drop an uninteresting message to pr_debug level of: Add missing of_address.h to xilinx ehci driver	2011-03-02 20:01:57 -08:00
Linus Torvalds	ebff7c92ab	Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq: [CPUFREQ] p4-clockmod: print EST-capable warning message only once [CPUFREQ] fix BUG on cpufreq policy init failure [CPUFREQ] Fix another notifier leak in powernow-k8. [CPUFREQ] Missing "unregister_cpu_notifier" in powernow-k8.c	2011-03-02 19:58:31 -08:00
Linus Torvalds	c7b01d3dc2	Merge branch 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6 * 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6: intel_idle: disable Atom/Lincroft HW C-state auto-demotion intel_idle: disable NHM/WSM HW C-state auto-demotion	2011-03-02 18:08:03 -08:00
Andres Salomon	60cba5a57b	x86: OLPC: have prom_early_alloc BUG rather than return NULL ..similar to what sparc's prom_early_alloc does. Signed-off-by: Andres Salomon <dilinger@queued.net> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>	2011-03-02 13:45:18 -07:00
Tejun Heo	eb8c1e2c83	x86-64, NUMA: Better explain numa_distance handling Handling of out-of-bounds distances and allocation failure can use better documentation. Add it. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Acked-by: David Rientjes <rientjes@google.com>	2011-03-02 16:34:21 +01:00
Yinghai Lu	ce0033307f	x86-64, NUMA: Fix distance table handling NUMA distance table handling has the following problems. * numa_reset_distance() uses numa_distance * sizeof(numa_distance[0]) as the table size when it should be using the square of numa_distance. * The same size miscalculation when allocation space for phys_dist in numa_emulation(). * In numa_emulation(), phys_dist must be reserved; otherwise, the new emulated distance table may overlap it. Fix them and, while at it, take numa_distance_cnt resetting in numa_reset_distance() out of the if block to simplify the code a bit. David Rientjes reported incorrect handling of distance table during emulation. -tj: Edited out numa_alloc_distance() related changes which weren't necessary and rewrote patch description. -v2: Ingo was unhappy with 80-column limit induced linebreaks. Let lines run over 80-column. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Reported-by: David Rientjes <rientjes@google.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: David Rientjes <rientjes@google.com>	2011-03-02 16:34:09 +01:00
Lin Ming	b06b3d4969	perf, x86: Add Intel SandyBridge CPU support This patch adds basic SandyBridge support, including hardware cache events and PEBS events support. It has been tested on SandyBridge CPUs with perf stat and also with PEBS based profiling - both work fine. The patch does not affect other models. v2 -> v3: - fix PEBS event 0xd0 with right umask combinations - move snb pebs constraint assignment to intel_pmu_init v1 -> v2: - add more raw and PEBS events constraints - use offcore events for LLC-* cache events - remove the call to Nehalem workaround enable_all function Signed-off-by: Lin Ming <ming.m.lin@intel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Andi Kleen <andi@firstfloor.org> LKML-Reference: <1299072424.2175.24.camel@localhost> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-02 14:37:02 +01:00
Jan Beulich	e938c287ea	x86: Fix a bogus unwind annotation in lib/semaphore_32.S 'simple' would have required specifying current frame address and return address location manually, but that's obviously not the case (and not necessary) here. Signed-off-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <4D6D1082020000780003454C@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-02 08:16:44 +01:00
Daniel J Blueman	6670e9cdaf	x86, build: Make sure mkpiggy fails on read error Ensure build doesn't silently continue despite read failure, addressing a warning due to the unchecked call. Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com> LKML-Reference: <AANLkTimxxTMU3=4ry-_zbY6v1xiDi+hW9y1RegTr8vLK@mail.gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-03-01 16:32:03 -08:00
Naga Chumbalkar	853cee26e2	[CPUFREQ] p4-clockmod: print EST-capable warning message only once Print the message only once. I see it 16 times on a 2P box with 16 logical CPUs. Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>	2011-03-01 18:49:45 -05:00
Dave Jones	a536b126f2	[CPUFREQ] Fix another notifier leak in powernow-k8. Do the notifier registration later, so we don't have to worry about freeing it if we fail the msr allocation. Signed-off-by: Dave Jones <davej@redhat.com>	2011-03-01 18:49:44 -05:00
Neil Brown	ac81831449	[CPUFREQ] Missing "unregister_cpu_notifier" in powernow-k8.c It appears that when powernow-k8 finds that No compatible ACPI _PSS objects found. and suggests Try again with latest BIOS. it fails the module load, but does not unregister the cpu_notifier that was registered in powernowk8_init This ends up leaving freed memory on the cpu notifier list for some other poor module (e.g. md/raid5) to come along and trip over. The following might be a partial fix, but I suspect there is probably other clean-up that is needed. ( https://bugzilla.novell.com/show_bug.cgi?id=655215 has full dmesg traces). Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Neil Brown <neilb@suse.de>	2011-03-01 18:49:44 -05:00
Jan Beulich	039e13890b	x86: Remove unused bits from lib/thunk_*.S Some of the items removed were apparently never used, others simply didn't get removed with their last user. Signed-off-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <4D6BD3A002000078000341F1@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-02-28 18:06:22 +01:00
Jan Beulich	60cf637a13	x86: Use {push,pop}_cfi in more places Cleaning up and shortening code... Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Alexander van Heukelum <heukelum@fastmail.fm> LKML-Reference: <4D6BD35002000078000341DA@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-02-28 18:06:22 +01:00
Jan Beulich	39f2205e1a	x86-64: Add CFI annotations to lib/rwsem_64.S These weren't part of the initial commit of this code. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Alexander van Heukelum <heukelum@fastmail.fm> LKML-Reference: <4D6BCDFF02000078000341B0@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-02-28 18:06:21 +01:00
Don Zickus	299c56966a	x86: Use u32 instead of long to set reset vector back to 0 A customer of ours, complained that when setting the reset vector back to 0, it trashed other data and hung their box. They noticed when only 4 bytes were set to 0 instead of 8, everything worked correctly. Mathew pointed out: \| \| We're supposed to be resetting trampoline_phys_low and \| trampoline_phys_high here, which are two 16-bit values. \| Writing 64 bits is definitely going to overwrite space \| that we're not supposed to be touching. \| So limit the area modified to u32. Signed-off-by: Don Zickus <dzickus@redhat.com> Acked-by: Matthew Garrett <mjg@redhat.com> Cc: <stable@kernel.org> LKML-Reference: <1297139100-424-1-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-02-28 16:22:18 +01:00
Christoph Lameter	b9ec40af0e	percpu, x86: Add arch-specific this_cpu_cmpxchg_double() support Support this_cpu_cmpxchg_double() using the cmpxchg16b and cmpxchg8b instructions. -tj: s/percpu_cmpxchg16b/percpu_cmpxchg16b_double/ for consistency and other cosmetic changes. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2011-02-28 11:20:49 +01:00
Stratos Psomadakis	7bf04be8f4	x86, asm: Cleanup unnecssary macros in asm-offsets.c PAGE_SIZE_asm, PAGE_SHIFT_asm, THREAD_SIZE_asm can be safely removed from asm-offsets.c, and be replaced by their non-'_asm' counterparts in the code that uses them, since the _AC macro defined in include/linux/const.h makes PAGE_SIZE/PAGE_SHIFT/THREAD_SIZE work with as. Signed-off-by: Stratos Psomadakis <psomas@cslab.ece.ntua.gr> LKML-Reference: <1298666774-17646-2-git-send-email-psomas@cslab.ece.ntua.gr> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-02-25 16:37:32 -08:00
Linus Torvalds	958ede7f1b	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86 quirk: Fix polarity for IRQ0 pin2 override on SB800 systems x86/mrst: Fix apb timer rating when lapic timer is used x86: Fix reboot problem on VersaLogic Menlow boards	2011-02-25 14:02:33 -08:00
Ian Campbell	03c8142bd2	xen: suspend: add "arch" to pre/post suspend hooks xen_pre_device_suspend is unused on ia64. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-02-25 16:43:12 +00:00
Ian Campbell	a8b7458363	xen: switch to new schedop hypercall by default. Rename old interface to sched_op_compat and rename sched_op_new to simply sched_op. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-02-25 16:43:10 +00:00
Ian Campbell	8e15597fa4	xen: use new schedop interface for suspend Take the opportunity to comment on the semantics of the PV guest suspend hypercall arguments. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-02-25 16:43:10 +00:00
Stefano Stabellini	e057a4b6e0	xen: fix compile issue if XEN is enabled but XEN_PVHVM is disabled Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>	2011-02-25 16:43:06 +00:00
Stefano Stabellini	99bbb3a84a	xen: PV on HVM: support PV spinlocks and IPIs Initialize PV spinlocks on boot CPU right after native_smp_prepare_cpus (that switch to APIC mode and initialize APIC routing); on secondary CPUs on CPU_UP_PREPARE. Enable the usage of event channels to send and receive IPIs when running as a PV on HVM guest. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>	2011-02-25 16:43:06 +00:00
Stefano Stabellini	cff520b9c2	xen: do not use xen_info on HVM, set pv_info name to "Xen HVM" Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>	2011-02-25 16:43:04 +00:00
Thomas Gleixner	a906fdaacc	x86: dt: Cleanup local apic setup Up to now we force enable the local apic in the devicetree setup uncoditionally and set smp_found_config unconditionally to 1 when a devicetree blob is available. This breaks, when local apic is disabled in the Kconfig. Make it consistent by initializing device tree explicitely before smp_get_config() so a non lapic configuration could be used as well. To be functional that would require to implement PIT as an interrupt host, but the only user of this code until now is ce4100 which requires apics to be available. So we leave this up to those who need it. Tested-by: Sebastian Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-02-25 16:18:52 +01:00
David Rientjes	1f565a896e	x86-64, NUMA: Fix size of numa_distance array numa_distance should be sized like the SLIT, an NxN matrix where N is the highest node id + 1. This patch fixes the calculation to avoid overflowing the array on the subsequent iteration. -tj: The original patch used last index to calculate size. Yinghai pointed out it should be incremented so it is the number of elements instead of the last index to calculate the size of the table. Updated accordingly. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2011-02-25 10:10:54 +01:00
Linus Torvalds	86e2fe9ff3	Merge branch 'kvm-updates/2.6.38' of git://git.kernel.org/pub/scm/virt/kvm/kvm * 'kvm-updates/2.6.38' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: SVM: Advance instruction pointer in dr_intercept	2011-02-24 12:22:14 -08:00
Andreas Herrmann	7f74f8f28a	x86 quirk: Fix polarity for IRQ0 pin2 override on SB800 systems On some SB800 systems polarity for IOAPIC pin2 is wrongly specified as low active by BIOS. This caused system hangs after resume from S3 when HPET was used in one-shot mode on such systems because a timer interrupt was missed (HPET signal is high active). For more details see: http://marc.info/?l=linux-kernel&m=129623757413868 Tested-by: Manoj Iyer <manoj.iyer@canonical.com> Tested-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: stable@kernel.org # 37.x, 32.x LKML-Reference: <20110224145346.GD3658@alberich.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-02-24 20:30:21 +01:00
Rafael J. Wysocki	f1a2003e22	ACPI / PM: Merge do_suspend_lowlevel() into acpi_save_state_mem() The function do_suspend_lowlevel() is specific to x86 and defined in assembly code, so it should be called from the x86 low-level suspend code rather than from acpi_suspend_enter(). Merge do_suspend_lowlevel() into the x86's acpi_save_state_mem() and change the name of the latter to acpi_suspend_lowlevel(), so that the function's purpose is better reflected by its name. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2011-02-24 19:58:54 +01:00
Rafael J. Wysocki	c41b93fb85	ACPI / PM: Drop acpi_restore_state_mem() The function acpi_restore_state_mem() has never been and most likely never will be used, so remove it. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2011-02-24 19:58:54 +01:00
Yinghai Lu	d1b19426b0	x86: Rename e820_table_* to pgt_buf_* e820_table_{start\|end\|top}, which are used to buffer page table allocation during early boot, are now derived from memblock and don't have much to do with e820. Change the names so that they reflect what they're used for. This patch doesn't introduce any behavior change. -v2: Ingo found that earlier patch "x86: Use early pre-allocated page table buffer top-down" caused crash on 32bit and needed to be dropped. This patch was updated to reflect the change. -tj: Updated commit description. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2011-02-24 14:52:18 +01:00
Sebastian Andrzej Siewior	4a66b1d95a	x86: dt: Fix OLPC=y/INTEL_CE=n build Both OLPC and CE4100 activate CONFIG_OF. OLPC uses PROMTREE while CE uses FLATTREE. Compiling for OLPC only breaks due to missing flat tree functions and variables. Use proper wrappers and provide an empty x86_flattree_get_config() inline so OF=y FLATTREE=n builds and works. [ tglx: Make it work with HPET_TIMER=n and make a function static ] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-02-24 13:01:59 +01:00
Jacob Pan	7b62dbec90	x86/mrst: Fix apb timer rating when lapic timer is used Need to adjust the clockevent device rating for the structure that will be registered with clockevent system instead of the temporary structure. Without this fix, APB timer rating will be higher than LAPIC timer such that it can not be released later to be used as the broadcast timer. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Alan Cox <alan@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> LKML-Reference: <1298506046-439-1-git-send-email-jacob.jun.pan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-02-24 08:22:43 +01:00
Sebastian Andrzej Siewior	3bcbaf6e08	rtc: cmos: Add OF bindings This allows to load the OF driver based informations from the device tree. Systems without BIOS may need to perform some initialization. PowerPC creates a PNP device from the OF information and performs this kind of initialization in their private PCI quirk. This looks more generic. This patch also avoids registering the platform RTC driver on X86 if we have a device tree blob. Otherwise we would setup the device based on the hardcoded information in arch/x86 rather than the device tree based one. [ tglx: Changed "int of_have_populated_dt()" to bool as recommended by Grant ] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Dirk Brandewie <dirk.brandewie@gmail.com> Acked-by: Grant Likely <grant.likely@secretlab.ca> Cc: sodaville@linutronix.de Cc: devicetree-discuss@lists.ozlabs.org Cc: rtc-linux@googlegroups.com Cc: Alessandro Zummo <a.zummo@towertech.it> LKML-Reference: <1298405266-1624-12-git-send-email-bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-02-23 22:27:55 +01:00
Sebastian Andrzej Siewior	1fa4163bdc	x86: ce4100: Use OF to setup devices Use device tree information to setup IO_APIC configuration, interrupt routing, HPET and everything else which cannot be enumerated by other means. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Dirk Brandewie <dirk.brandewie@gmail.com> Acked-by: Grant Likely <grant.likely@secretlab.ca> Cc: sodaville@linutronix.de Cc: devicetree-discuss@lists.ozlabs.org LKML-Reference: <1298405266-1624-11-git-send-email-bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-02-23 22:27:54 +01:00
Sebastian Andrzej Siewior	bcc7c1244f	x86: ioapic: Add OF bindings for IO_APIC ioapic_xlate provides a translation from the information in device tree to ioapic related informations. This includes - obtaining hw irq which is the vector number "=> pin number + gsi" - obtaining type (level/edge/..) - programming this information into ioapic ioapic_add_ofnode adds an irq_domain based on informations from the device tree. This information (irq_domain) is required in order to map a device to its proper interrupt controller. [ tglx: Adapted to the io_apic changes, which let us move that whole code to devicetree.c ] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Dirk Brandewie <dirk.brandewie@gmail.com> Acked-by: Grant Likely <grant.likely@secretlab.ca> Cc: sodaville@linutronix.de Cc: devicetree-discuss@lists.ozlabs.org LKML-Reference: <1298405266-1624-10-git-send-email-bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-02-23 22:27:54 +01:00
Sebastian Andrzej Siewior	9079b35364	x86: dtb: Add generic bus probe For now we probe these busses and we change this to board dependent probes once we have to. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Dirk Brandewie <dirk.brandewie@gmail.com> Acked-by: Grant Likely <grant.likely@secretlab.ca> Cc: sodaville@linutronix.de Cc: devicetree-discuss@lists.ozlabs.org LKML-Reference: <1298405266-1624-9-git-send-email-bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-02-23 22:27:54 +01:00
Sebastian Andrzej Siewior	96e0a0797e	x86: dtb: Add support for PCI devices backed by dtb nodes x86_of_pci_init() does two things: - it provides a generic irq enable and disable function. enable queries the device tree for the interrupt information, calls ->xlate on the irq host and updates the pci->irq information for the device. - it walks through PCI bus(es) in the device tree and adds its children (device) nodes to appropriate pci_dev nodes in kernel. So the dtb node information is available at probe time of the PCI device. Adding a PCI bus based on the information in the device tree is currently not supported. Right now direct access via ioports is used. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Dirk Brandewie <dirk.brandewie@gmail.com> Acked-by: Grant Likely <grant.likely@secretlab.ca> Cc: sodaville@linutronix.de Cc: devicetree-discuss@lists.ozlabs.org LKML-Reference: <1298405266-1624-8-git-send-email-bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-02-23 22:27:53 +01:00
Sebastian Andrzej Siewior	ffb9fc68df	x86: dtb: Add device tree support for HPET Set hpet_address based on information provied form DTB Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Grant Likely <grant.likely@secretlab.ca> Cc: sodaville@linutronix.de Cc: devicetree-discuss@lists.ozlabs.org Cc: Dirk Brandewie <dirk.brandewie@gmail.com> LKML-Reference: <1298405266-1624-7-git-send-email-bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-02-23 22:27:53 +01:00
Sebastian Andrzej Siewior	3879a6f329	x86: dtb: Add early parsing of IO_APIC APIC and IO_APIC have to be added to the system early because native_init_IRQ() requires it. In order to obtain the address of the ioapic the device tree has to be unflattened so of_address_to_resource() works. The device tree is relocated to ensure it is always covered by the kernel mapping. That way the boot loader does not have to make any assumptions about kernel's memory layout. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Grant Likely <grant.likely@secretlab.ca> Cc: sodaville@linutronix.de Cc: devicetree-discuss@lists.ozlabs.org Cc: Dirk Brandewie <dirk.brandewie@gmail.com> LKML-Reference: <1298405266-1624-6-git-send-email-bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-02-23 22:27:53 +01:00
Sebastian Andrzej Siewior	19c4f5f7f7	x86: dtb: Add irq domain abstraction The here introduced irq_domain abstraction represents a generic irq controller. It is a subset of powerpc's irq_host which is going to be renamed to irq_domain and then become generic. This implementation will be removed once it is generic. The xlate callback is resposible to parse irq informations like irq type and number and returns the hardware irq number which is reported by the hardware as active. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Dirk Brandewie <dirk.brandewie@gmail.com> Acked-by: Grant Likely <grant.likely@secretlab.ca> Cc: sodaville@linutronix.de Cc: devicetree-discuss@lists.ozlabs.org LKML-Reference: <1298405266-1624-5-git-send-email-bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-02-23 22:27:53 +01:00
Sebastian Andrzej Siewior	df2634f43f	x86: dtb: Add a device tree for CE4100 History: v1..v2: - dropped device_type except for cpu & pci. I have the compatible string for pci so I can drop the device_type once it is possible - I lowercased all compatible types. I will need to resend some patches which have upper case intel - The cpu had the same compatible string as the soc node. So I added to the soc node -immr for internel memory mapped registers. - I added generic names for all parts. - I reworked the i2c bars matching the way you suggested. I added a compatible node for the PCI device which only the PCI ids in its compatible string. The bars (each represents a complete i2c controller) have a "intel,ce4100-i2c-controller" compatible node. It is not used by the driver. The driver is probed via PCI ids (by the pci subsystem not OF) and matches the bar address against the ressource in the child node. Once there is a hit the node is attached. - The SPI driver is also probed via pci. However I also attached a compatible property based on PCI ids v2..v3: - intel,ce4100-immr become intel,ce4100-cp. cp stands for core peripherals. The Atom data sheet talks here about ACPI devices. Since we don't have ACPI this does not apply here. - The interrupt map is gone. There are now plenty of device nodes. - The "unit address string" got fixed, it uses not DD,V format. v3..v4: - added descriptions for compatible nodes introduced here: - intel,ce4100-ioapic - intel,ce4100-lapic - intel,ce4100-hpet - intel,ce4100 - intel,ce4100-cp - intel,ce4100-pci - added a description about I2C controller magic. - Added gpio-controller and gpio-cells property to gpio devices. Those properties are not (yet) used. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Dirk Brandewie <dirk.brandewie@gmail.com> Acked-by: Grant Likely <grant.likely@secretlab.ca> Cc: sodaville@linutronix.de Cc: devicetree-discuss@lists.ozlabs.org LKML-Reference: <1298405266-1624-4-git-send-email-bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-02-23 22:27:52 +01:00

... 4 5 6 7 8 ...

13148 Commits