linux

Author	SHA1	Message	Date
Ingo Molnar	5d0e600d90	[PATCH] x86: fix laptop bootup hang in init_acpi() During kernel bootup, a new T60 laptop (CoreDuo, 32-bit) hangs about 10%-20% of the time in acpi_init(): Calling initcall 0xc055ce1a: topology_init+0x0/0x2f() Calling initcall 0xc055d75e: mtrr_init_finialize+0x0/0x2c() Calling initcall 0xc05664f3: param_sysfs_init+0x0/0x175() Calling initcall 0xc014cb65: pm_sysrq_init+0x0/0x17() Calling initcall 0xc0569f99: init_bio+0x0/0xf4() Calling initcall 0xc056b865: genhd_device_init+0x0/0x50() Calling initcall 0xc056c4bd: fbmem_init+0x0/0x87() Calling initcall 0xc056dd74: acpi_init+0x0/0x1ee() It's a hard hang that not even an NMI could punch through! Frustratingly, adding printks or function tracing to the ACPI code made the hangs go away ... After some time an additional detail emerged: disabling the NMI watchdog made these occasional hangs go away. So i spent the better part of today trying to debug this and trying out various theories when i finally found the likely reason for the hang: if acpi_ns_initialize_devices() executes an _INI AML method and an NMI happens to hit that AML execution in the wrong moment, the machine would hang. (my theory is that this must be some sort of chipset setup method doing stores to chipset mmio registers?) Unfortunately given the characteristics of the hang it was sheer impossible to figure out which of the numerous AML methods is impacted by this problem. As a workaround i wrote an interface to disable chipset-based NMIs while executing _INI sections - and indeed this fixed the hang. I did a boot-loop of 100 separate reboots and none hung - while without the patch it would hang every 5-10 attempts. Out of caution i did not touch the nmi_watchdog=2 case (it's not related to the chipset anyway and didnt hang). I implemented this for both x86_64 and i686, tested the i686 laptop both with nmi_watchdog=1 [which triggered the hangs] and nmi_watchdog=2, and tested an Athlon64 box with the 64-bit kernel as well. 
Everything builds and works with the patch applied. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: Len Brown <lenb@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-02-13 13:26:24 +01:00
Muli Ben-Yehuda	310adfdd91	[PATCH] x86-64: robustify bad_dma_address handling - set bad_dma_address explicitly to 0x0 - reserve 32 pages from bad_dma_address and up - WARN_ON() a driver feeding us bad_dma_address Thanks to Leo Duran <leo.duran@amd.com> for the suggestion. Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Leo Duran <leo.duran@amd.com> Cc: Job Mason <jdmason@kudzu.us>	2007-02-13 13:26:24 +01:00
Andi Kleen	fc986db4fc	[PATCH] x86-64: Don't reserve ROMs We trust the e820 table, so explicitly reserving ROMs shouldn't be needed. Signed-off-by: Andi Kleen <ak@suse.de>	2007-02-13 13:26:24 +01:00
Andi Kleen	00edefae05	[PATCH] x86-64: Fix off by one error in IOMMU boundary checking Should be harmless because there is normally no memory there, but technically it was incorrect. Pointed out by Leo Duran Signed-off-by: Andi Kleen <ak@suse.de>	2007-02-13 13:26:24 +01:00
Zachary Amsden	ffb6017563	[PATCH] x86-64: x86_64 - Fix FS/GS registers for VT execution Initialize FS and GS to __KERNEL_DS as well. The actual value of them is not important, but it is important to reload them in protected mode. At this time, they still retain the real mode values from initial boot. VT disallows execution of code under such conditions, which means hardware virtualization can not be used to boot the kernel on Intel platforms, making the boot time painfully slow. This requires moving the GS load before the load of GS_BASE, so just move all the segments loads there to keep them together in the code. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-02-13 13:26:24 +01:00
Andi Kleen	9a11ff6827	[PATCH] x86-64: Unexport __supported_pte_mask The symbol is needed to manipulate page tables, and modules shouldn't do that. Leftover from 2.4, but no in tree module should need it now. Signed-off-by: Andi Kleen <ak@suse.de>	2007-02-13 13:26:24 +01:00
Andi Kleen	f49481bc50	[PATCH] x86-64: Check return value of putreg in PTRACE_SETREGS This means if an illegal value is set for the segment registers there ptrace will error out now with an errno instead of silently ignoring it. Signed-off-by: Andi Kleen <ak@suse.de>	2007-02-13 13:26:24 +01:00
Jack Steiner	2f7a2a79c3	[PATCH] x86-64: - Ignore long SMI interrupts in clock calibration code - update 1 Add failsafe mechanism to HPET/TSC clock calibration. Signed-off-by: Jack Steiner <steiner@sgi.com> Updated to include failsafe mechanism & additional community feedback. Patch built on latest 2.6.20-rc4-mm1 tree. Signed-off-by: Andi Kleen <ak@suse.de>	2007-02-13 13:26:24 +01:00
Andi Kleen	a98f0dd34d	[PATCH] x86-64: Allow to run a program when a machine check event is detected When a machine check event is detected (including a AMD RevF threshold overflow event) allow to run a "trigger" program. This allows user space to react to such events sooner. The trigger is configured using a new trigger entry in the machinecheck sysfs interface. It is currently shared between all CPUs. I also fixed the AMD threshold handler to run the machine check polling code immediately to actually log any events that might have caused the threshold interrupt. Also added some documentation for the mce sysfs interface. Signed-off-by: Andi Kleen <ak@suse.de>	2007-02-13 13:26:23 +01:00
Jan Beulich	24ce0e96f2	[PATCH] x86-64: Tighten mce_amd driver MSR reads while debugging an unrelated problem in Xen, I noticed odd reads from non-existent MSRs. Having now found time to look why these happen, I came up with below patch, which - prevents accessing MCi_MISCj with j > 0 when the block pointer in MCi_MISC0 is zero - accesses only contiguous MCi_MISCj until a non-implemented one is found - doesn't touch unimplemented blocks in mce_threshold_interrupt at all - gives names to two bits previously derived from MASK_VALID_HI (it took me some time to understand the code without this) The first three items, besides being apparently closer to the spec, should namely help cutting down on the time mce_threshold_interrupt() takes. Signed-off-by: Andi Kleen <ak@suse.de>	2007-02-13 13:26:23 +01:00
Venkatesh Pallipadi	1676193937	[PATCH] x86-64: Handle 32 bit PerfMon Counter writes cleanly in x86_64 nmi_watchdog P6 CPUs and Core/Core 2 CPUs that have the 'architectural perf mon' feature only support writes of the low 32 bits in Performance Monitoring Counters. Bits 32..39 are sign extended based on bit 31 and bits 40..63 are reserved and should be zero. This patch: Change x86_64 nmi handler to handle this case cleanly. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-02-13 13:26:22 +01:00
Glauber de Oliveira Costa	4c3cbf75b2	[PATCH] x86-64: Use constant instead of raw number in x86_64 ioperm.c This is a tiny cleanup to increase readability Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org>	2007-02-13 13:26:22 +01:00
Glauber de Oliveira Costa	c49c5330c9	[PATCH] x86-64: Remove fastcall references in x86_64 code Unlike x86, x86_64 already passes arguments in registers. The use of regparm attribute makes no difference in produced code, and the use of fastcall just bloats the code. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org>	2007-02-13 13:26:22 +01:00
Rohit Seth	53fee04f31	[PATCH] x86-64: Fix fake numa for x86_64 machines with big IO hole This patch resolves the issue of running with numa=fake=X on kernel command line on x86_64 machines that have big IO hole. While calculating the size of each node now we look at the total hole size in that range. Previously there were nodes that only had IO holes in them causing kernel boot problems. We now use the NODE_MIN_SIZE (64MB) as the minimum size of memory that any node must have. We reduce the number of allocated nodes if the number of nodes specified on kernel command line results in any node getting memory smaller than NODE_MIN_SIZE. This change allows the extra memory to be incremented in NODE_MIN_SIZE granule and uniformly distribute among as many nodes (called big nodes) as possible. [akpm@osdl.org: build fix] Signed-off-by: David Rientjes <reintjes@google.com> Signed-off-by: Paul Menage <menage@google.com> Signed-off-by: Rohit Seth <rohitseth@google.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org>	2007-02-13 13:26:22 +01:00
Catalin Marinas	006e84ee3a	[PATCH] x86-64: do not always end the stack trace with ULONG_MAX It makes more sense to end the stack trace with ULONG_MAX only if nr_entries < max_entries. Otherwise, we lose one entry in the long stack traces and cannot know whether the trace was complete or not. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andrew Morton <akpm@osdl.org>	2007-02-13 13:26:21 +01:00
Karsten Weiss	5558870bfb	[PATCH] x86-64: improved iommu documentation - add SWIOTLB config help text - mention Documentation/x86_64/boot-options.txt in Documentation/kernel-parameters.txt - remove the duplication of the iommu kernel parameter documentation. - Better explanation of some of the iommu kernel parameter options. - "32MB<<order" instead of "32MB^order". - Mention the default "order" value. - list the four existing PCI-DMA mapping implementations of arch x86_64 - group the iommu= option keywords by PCI-DMA mapping implementation. - Distinguish iommu= option keywords from number arguments. - Explain the meaning of DAC and SAC. Signed-off-by: Karsten Weiss <knweiss@science-computing.de> Signed-off-by: Andi Kleen <ak@suse.de> Acked-by: Muli Ben-Yehuda <muli@il.ibm.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org>	2007-02-13 13:26:21 +01:00
Amul Shah	076422d2af	[PATCH] x86-64: Allocate the NUMA hash function nodemap dynamically Remove the statically allocated memory to NUMA node hash map in favor of a dynamically allocated memory to node hash map (it is cache aligned). This patch has the nice side effect in that it allows the hash map to grow for systems with large amounts of memory (256GB - 1TB), but suffer from having small PCI space tacked onto the boot node (which is somewhere between 192MB to 512MB on the ES7000). Signed-off-by: Amul Shah <amul.shah@unisys.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: Rohit Seth <rohitseth@google.com> Signed-off-by: Andrew Morton <akpm@osdl.org>	2007-02-13 13:26:19 +01:00
Andi Kleen	0812a579c9	[PATCH] x86-64: Add __copy_from_user_nocache This does user copies in fs write() into the page cache with write combining. This pushes the destination out of the CPU's cache, but allows higher bandwidth in some cases. The theory is that the page cache data is usually not touched by the CPU again and it's better to not pollute the cache with it. Also it is a little faster. Signed-off-by: Andi Kleen <ak@suse.de>	2007-02-13 13:26:19 +01:00
Arjan van de Ven	5dfe4c964a	[PATCH] mark struct file_operations const 2 Many struct file_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. [akpm@osdl.org: sparc64 fix] Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:44 -08:00
Alon Bar-Lev	7a3a06d0e1	[PATCH] Dynamic kernel command-line: fixups Remove in-source externs, linux/init.h is included in all cases. This is a fixups for "Dynamic kernel command-line" patch. It also includes some uml __init fixups so that we can __initdata also its command_line. Signed-off-by: Alon Bar-Lev <alon.barlev@gmail.com> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:39 -08:00
Alon Bar-Lev	adf48856db	[PATCH] Dynamic kernel command-line: x86_64 1. Rename saved_command_line into boot_command_line. 2. Set command_line as __initdata. Signed-off-by: Alon Bar-Lev <alon.barlev@gmail.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:39 -08:00
Robert P. J. Day	3de3af130b	[PATCH] Remove unnecessary memset(0) calls after kzalloc() calls. Delete the few remaining unnecessary calls to memset(0) after a call to kzalloc(). Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Cc: Andi Kleen <ak@suse.de> Cc: Dmitry Torokhov <dtor@mail.ru> Cc: Adam Belay <ambx1@neo.rr.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:31 -08:00
Jean-Paul Saman	67d38229df	[PATCH] disable init/initramfs.c: architectures Update all arch/*/kernel/vmlinux.lds.S to not include space for initramfs when CONFIG_BLK_DEV_INITRAMFS is not selected. This saves another 4 kbytes on most platforms (some reserve PAGE_SIZE for initramfs). Signed-off-by: Jean-Paul Saman <jean-paul.saman@nxp.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:25 -08:00
Linus Torvalds	78149df6d5	Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6: (41 commits) Revert "PCI: remove duplicate device id from ata_piix" msi: Make MSI useable more architectures msi: Kill the msi_desc array. msi: Remove attach_msi_entry. msi: Fix msi_remove_pci_irq_vectors. msi: Remove msi_lock. msi: Kill msi_lookup_irq MSI: Combine pci_(save|restore)_msi/msix_state MSI: Remove pci_scan_msi_device() MSI: Replace pci_msi_quirk with calls to pci_no_msi() PCI: remove duplicate device id from ipr PCI: remove duplicate device id from ata_piix PCI: power management: remove noise on non-manageable hw PCI: cleanup MSI code PCI: make isa_bridge Alpha-only PCI: remove quirk_sis_96x_compatible() PCI: Speed up the Intel SMBus unhiding quirk PCI Quirk: 1k I/O space IOBL_ADR fix on P64H2 shpchp: delete trailing whitespace shpchp: remove DBG_XXX_ROUTINE ...	2007-02-07 19:23:44 -08:00
Eric W. Biederman	f7feaca77d	msi: Make MSI useable more architectures The arch hooks arch_setup_msi_irq and arch_teardown_msi_irq are now responsible for allocating and freeing the linux irq in addition to setting up the linux irq to work with the interrupt. arch_setup_msi_irq now takes a pci_device and a msi_desc and returns an irq. With this change in place this code should be useable by all platforms except those that won't let the OS touch the hardware like ppc RTAS. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-02-07 15:50:08 -08:00
Linus Torvalds	21d37bbc65	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (140 commits) ACPICA: reduce table header messages to fit within 80 columns asus-laptop: merge with ACPICA table update ACPI: bay: Convert ACPI Bay driver to be compatible with sysfs update. ACPI: bay: new driver is EXPERIMENTAL ACPI: bay: make drive_bays static ACPI: bay: make bay a platform driver ACPI: bay: remove prototype procfs code ACPI: bay: delete unused variable ACPI: bay: new driver adding removable drive bay support ACPI: dock: check if parent is on dock ACPICA: fix gcc build warnings Altix: Add ACPI SSDT PCI device support (hotplug) Altix: ACPI SSDT PCI device support ACPICA: reduce conflicts with Altix patch series ACPI_NUMA: fix HP IA64 simulator issue with extended memory domain ACPI: fix HP RX2600 IA64 boot ACPI: build fix for IBM x440 - CONFIG_X86_SUMMIT ACPICA: Update version to 20070126 ACPICA: Fix for incorrect parameter passed to AcpiTbDeleteTable during table load. ACPICA: Update copyright to 2007. ...	2007-02-07 15:36:08 -08:00
Jan Beulich	563aaf064f	[IA64] swiotlb cleanup - add proper __init decoration to swiotlb's init code (and the code calling it, where not already the case) - replace uses of 'unsigned long' with dma_addr_t where appropriate - do miscellaneous simplification and cleanup Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Tony Luck <tony.luck@intel.com>	2007-02-05 18:51:25 -08:00
Alexey Starikovskiy	15a58ed121	ACPICA: Remove duplicate table definitions (non-conflicting), cont Signed-off-by: Len Brown <len.brown@intel.com>	2007-02-02 21:14:29 -05:00
Alexey Starikovskiy	cee324b145	ACPICA: use new ACPI headers. Signed-off-by: Len Brown <len.brown@intel.com>	2007-02-02 21:14:28 -05:00
Alexey Starikovskiy	ad71860a17	ACPICA: minimal patch to integrate new tables into Linux Signed-off-by: Len Brown <len.brown@intel.com>	2007-02-02 21:14:22 -05:00
Venkatesh Pallipadi	58d9ce7d75	[PATCH] Revert nmi_known_cpu() check during boot option parsing Commit `f2802e7f57` and its x86 version (`b7471c6da9`) adds nmi_known_cpu() check while parsing boot options in x86_64 and i386. With that, "nmi_watchdog=2" stops working for me on Intel Core 2 CPU based system. The problem is, setup_nmi_watchdog is called while parsing the boot option and identify_cpu is not done yet. So, the return value of nmi_known_cpu() is not valid at this point. So revert that check. This should not have any adverse effect as the nmi_known_cpu() check is done again later in enable_lapic_nmi_watchdog(). Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Andi Kleen <ak@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-01-23 07:52:05 -08:00
Muli Ben-Yehuda	b92cc55923	[PATCH] x86-64: tighten up printks Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org>	2007-01-11 01:52:44 +01:00
Jack Steiner	ed5316d445	[PATCH] x86-64: - Ignore long SMI interrupts in clock calibration Ensure that no SMI interrupts occur between the read of the HPET & TSC in the clock calibration loop. I noticed that a 2.66GHz system incorrectly detected the processor clock speed about 1/7 of the time: time.c: Detected 2660.005 MHz processor. (most of the time) time.c: Detected 2988.203 MHz processor. (sometime) The problem is caused by an SMI interrupt occurring in hpet_calibrate_tsc() between the read of the HPET & TSC. Prior to switching the BIOS into ACPI mode, it appears that every 27msec an SMI interrupt occurs. The SMI interrupt takes 4.8 msec to process. Note: On my test system, TICK_MIN had to be >380. I picked 5000 to minimize risk of having a value that is too small for other platforms. Signed-off-by: Jack Steiner <steiner@sgi.com> Signed-off-by: Andi Kleen <ak@suse.de> arch/x86_64/kernel/time.c | 25 +++++++++++++++++++++---- 1 file changed, 21 insertions(+), 4 deletions(-)	2007-01-11 01:52:44 +01:00
Linus Torvalds	fea5f1e196	Revert "[PATCH] x86-64: Try multiple timer variants in check_timer" This reverts commit `b026872601`, which has been linked to several problem reports with IO-APIC and the timer. Machines either don't boot because the timer doesn't happen, or we get double timer interrupts because we end up double-routing the timer irq through multiple interfaces. See for example http://lkml.org/lkml/2006/12/16/101 http://lkml.org/lkml/2007/1/3/9 http://bugzilla.kernel.org/show_bug.cgi?id=7789 about some of the discussion. Patches to fix this cleanup exist (and have been confirmed to work fine at least for some of the affected cases) and we'll revisit it for 2.6.21, but this late in the -rc series we're better off just reverting the incomplete commit that caused the problems. Suggested-by: Adrian Bunk <bunk@stusta.de> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Yinghai Lu <yinghai.lu@amd.com> Cc: Andrew Morton <akpm@osdl.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2007-01-08 15:04:46 -08:00
Linus Torvalds	de9e957f12	Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq * master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq: [CPUFREQ] longhaul: Kill off warnings introduced by recent changes. [CPUFREQ] Uninitialized use of cmd.val in arch/i386/kernel/cpu/cpufreq/acpi-cpufreq.c:acpi_cpufreq_target() [CPUFREQ] Longhaul - Always guess FSB [CPUFREQ] Longhaul - Fix up powersaver assumptions. [CPUFREQ] longhaul: Fix up unreachable code. [CPUFREQ] speedstep-centrino: missing space and bracket [CPUFREQ] Bug fix for acpi-cpufreq and cpufreq_stats oops on frequency change notification [CPUFREQ] select consistently	2007-01-03 17:34:12 -08:00
OGAWA Hirofumi	7523c4dd99	[PATCH] x86_64: Fix dump_trace() If caller passed the tsk, we should use it to validate a stack ptr. Otherwise, sysrq-t and other debugging stuff doesn't work. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2007-01-03 08:49:59 -08:00
Randy Dunlap	917325d30a	[CPUFREQ] select consistently Make x86_64 ACPI_CPU_FREQ select CPU_FREQ_TABLE like other methods do. (although we should still eliminate as much use of 'select' as possible) Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Dave Jones <davej@redhat.com>	2006-12-22 22:45:41 -05:00
Ingo Molnar	0888f06ac9	[PATCH] sched: fix bad missed wakeups in the i386, x86_64, ia64, ACPI and APM idle code Fernando Lopez-Lezcano reported frequent scheduling latencies and audio xruns starting at the 2.6.18-rt kernel, and those problems persisted all until current -rt kernels. The latencies were serious and unjustified by system load, often in the milliseconds range. After a patient and heroic multi-month effort of Fernando, where he tested dozens of kernels, tried various configs, boot options, test-patches of mine and provided latency traces of those incidents, the following 'smoking gun' trace was captured by him: _------=> CPU# / _-----=> irqs-off | / _----=> need-resched || / _---=> hardirq/softirq ||| / _--=> preempt-depth |||| / ||||| delay cmd pid ||||| time | caller \ / ||||| \ | / IRQ_19-1479 1D..1 0us : __trace_start_sched_wakeup (try_to_wake_up) IRQ_19-1479 1D..1 0us : __trace_start_sched_wakeup <<...>-5856> (37 0) IRQ_19-1479 1D..1 0us : __trace_start_sched_wakeup (c01262ba 0 0) IRQ_19-1479 1D..1 0us : resched_task (try_to_wake_up) IRQ_19-1479 1D..1 0us : __spin_unlock_irqrestore (try_to_wake_up) ... <idle>-0 1...1 11us!: default_idle (cpu_idle) ... <idle>-0 0Dn.1 602us : smp_apic_timer_interrupt (c0103baf 1 0) ... <...>-5856 0D..2 618us : __switch_to (__schedule) <...>-5856 0D..2 618us : __schedule <<idle>-0> (20 162) <...>-5856 0D..2 619us : __spin_unlock_irq (__schedule) <...>-5856 0...1 619us : trace_stop_sched_switched (__schedule) <...>-5856 0D..1 619us : trace_stop_sched_switched <<...>-5856> (37 0) what is visible in this trace is that CPU#1 ran try_to_wake_up() for PID:5856, it placed PID:5856 on CPU#0's runqueue and ran resched_task() for CPU#0. But it decided to not send an IPI that no CPU - due to TS_POLLING. But CPU#0 never woke up after its NEED_RESCHED bit was set, and only rescheduled to PID:5856 upon the next lapic timer IRQ. The result was a 600+ usecs latency and a missed wakeup! 
the bug turned out to be an idle-wakeup bug introduced into the mainline kernel this summer via an optimization in the x86_64 tree: commit `495ab9c045` Author: Andi Kleen <ak@suse.de> Date: Mon Jun 26 13:59:11 2006 +0200 [PATCH] i386/x86-64/ia64: Move polling flag into thread_info_status During some profiling I noticed that default_idle causes a lot of memory traffic. I think that is caused by the atomic operations to clear/set the polling flag in thread_info. There is actually no reason to make this atomic - only the idle thread does it to itself, other CPUs only read it. So I moved it into ti->status. the problem is this type of change: if (!hlt_counter && boot_cpu_data.hlt_works_ok) { - clear_thread_flag(TIF_POLLING_NRFLAG); + current_thread_info()->status &= ~TS_POLLING; smp_mb__after_clear_bit(); while (!need_resched()) { local_irq_disable(); this changes clear_thread_flag() to an explicit clearing of TS_POLLING. clear_thread_flag() is defined as: clear_bit(flag, &ti->flags); and clear_bit() is a LOCK-ed atomic instruction on all x86 platforms: static inline void clear_bit(int nr, volatile unsigned long * addr) { __asm__ __volatile__( LOCK_PREFIX "btrl %1,%0" hence smp_mb__after_clear_bit() is defined as a simple compile barrier: #define smp_mb__after_clear_bit() barrier() but the explicit TS_POLLING clearing introduced by the patch: + current_thread_info()->status &= ~TS_POLLING; is not an atomic op! So the clearing of the TS_POLLING bit is freely reorderable with the reading of the NEED_RESCHED bit - and both now reside in different memory addresses. CPU idle wakeup very much depends on ordered memory ops, the clearing of the TS_POLLING flag must always be done before we test need_resched() and hit the idle instruction(s). [Symmetrically, the wakeup code needs to set NEED_RESCHED before it tests the TS_POLLING flag, so memory ordering is paramount.] 
Fernando's dual-core Athlon64 system has a sufficiently advanced memory ordering model so that it triggered this scenario very often. ( And it also turned out that the reason why these latencies never triggered on my testsystems is that i routinely use idle=poll, which was the only idle variant not affected by this bug. ) The fix is to change the smp_mb__after_clear_bit() to an smp_mb(), to act as an absolute barrier between the TS_POLLING write and the NEED_RESCHED read. This affects almost all idling methods (default, ACPI, APM), on all 3 x86 architectures: i386, x86_64, ia64. Signed-off-by: Ingo Molnar <mingo@elte.hu> Tested-by: Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-22 08:55:51 -08:00
Ingo Molnar	136f1e7a8c	[PATCH] x86_64: fix boot time hang in detect_calgary() if CONFIG_CALGARY_IOMMU is built into the kernel via CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT, or is enabled via the iommu=calgary boot option, then the detect_calgary() function runs to detect the presence of a Calgary IOMMU. detect_calgary() first searches the BIOS EBDA area for a "rio_table_hdr" BIOS table. It has this parsing algorithm for the EBDA: while (offset) { ... /* The next offset is stored in the 1st word. 0 means no more */ offset = *((unsigned short *)(ptr + offset)); } got that? Lets repeat it slowly: we've got a BIOS-supplied data structure, plus Linux kernel code that will only break out of an infinite parsing loop once the BIOS gives a zero offset. Ok? Translation: what an excellent opportunity for BIOS writers to lock up the Linux boot process in an utterly hard to debug place! Indeed the BIOS jumped on that opportunity on my box, which has the following EBDA chaining layout: 384, 65282, 65535, 65535, 65535, 65535, 65535, 65535 ... see the pattern? So my, definitely non-Calgary system happily locks up in detect_calgary()! the patch below fixes the boot hang by trusting the BIOS-supplied data structure a bit less: the parser always has to make forward progress, and if it doesnt, we break out of the loop and i get the expected kernel message: Calgary: Unable to locate Rio Grande Table in EBDA - bailing! Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-21 00:08:28 -08:00
Linus Torvalds	d1526e2cda	Remove stack unwinder for now It has caused more problems than it ever really solved, and is apparently not getting cleaned up and fixed. We can put it back when it's stable and isn't likely to make warning or bug events worse. In the meantime, enable frame pointers for more readable stack traces. Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-15 08:47:51 -08:00
Dave Jones	c4366889dd	Merge ../linus Conflicts: drivers/cpufreq/cpufreq.c	2006-12-12 17:41:41 -05:00
Alexey Dobriyan	1f29bcd739	[PATCH] sysctl: remove unused "context" param Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andi Kleen <ak@suse.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: David Howells <dhowells@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-10 09:55:41 -08:00
Andi Kleen	1bac3b383a	[PATCH] x86: Work around gcc 4.2 over aggressive optimizer The new PDA code uses a dummy _proxy_pda variable to describe memory references to the PDA. It is never referenced in inline assembly, but exists as input/output arguments. gcc 4.2 in some cases can CSE references to this which causes unresolved symbols. Define it to zero to avoid this. Signed-off-by: Andi Kleen <ak@suse.de>	2006-12-09 21:33:36 +01:00
Ravikiran G Thirumalai	92715e282b	[PATCH] x86: Fix boot hang due to nmi watchdog init code 2.6.19 stopped booting (or booted based on build/config) on our x86_64 systems due to a bug introduced in 2.6.19. check_nmi_watchdog schedules an IPI on all cpus to busy wait on a flag, but fails to set the busywait flag if NMI functionality is disabled. This causes the secondary cpus to spin in an endless loop, causing the kernel bootup to hang. Depending upon the build, the busywait flag got overwritten (stack variable) and caused the kernel to bootup on certain builds. Following patch fixes the bug by setting the busywait flag before returning from check_nmi_watchdog. I guess using a stack variable is not good here as the calling function could potentially return while the busy wait loop is still spinning on the flag. AK: I redid the patch significantly to be cleaner Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Shai Fultheim <shai@scalex86.org> Signed-off-by: Andi Kleen <ak@suse.de>	2006-12-09 21:33:35 +01:00
Jeremy Fitzhardinge	c31a0bf3e1	[PATCH] Generic BUG for x86-64 This makes x86-64 use the generic BUG machinery. The main advantage in using the generic BUG machinery for x86-64 is that the inlined overhead of BUG is just the ud2a instruction; the file+line information are no longer inlined into the instruction stream. This reduces cache pollution. Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Andi Kleen <ak@muc.de> Cc: Hugh Dickens <hugh@veritas.com> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-08 08:28:39 -08:00
Linus Torvalds	4522d58275	Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6 * 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (156 commits) [PATCH] x86-64: Export smp_call_function_single [PATCH] i386: Clean up smp_tune_scheduling() [PATCH] unwinder: move .eh_frame to RODATA [PATCH] unwinder: fully support linker generated .eh_frame_hdr section [PATCH] x86-64: don't use set_irq_regs() [PATCH] x86-64: check vector in setup_ioapic_dest to verify if need setup_IO_APIC_irq [PATCH] x86-64: Make ix86 default to HIGHMEM4G instead of NOHIGHMEM [PATCH] i386: replace kmalloc+memset with kzalloc [PATCH] x86-64: remove remaining pc98 code [PATCH] x86-64: remove unused variable [PATCH] x86-64: Fix constraints in atomic_add_return() [PATCH] x86-64: fix asm constraints in i386 atomic_add_return [PATCH] x86-64: Correct documentation for bzImage protocol v2.05 [PATCH] x86-64: replace kmalloc+memset with kzalloc in MTRR code [PATCH] x86-64: Fix numaq build error [PATCH] x86-64: include/asm-x86_64/cpufeature.h isn't a userspace header [PATCH] unwinder: Add debugging output to the Dwarf2 unwinder [PATCH] x86-64: Clarify error message in GART code [PATCH] x86-64: Fix interrupt race in idle callback (3rd try) [PATCH] x86-64: Remove unwind stack pointer alignment forcing again ... Fixed conflict in include/linux/uaccess.h manually Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:59:11 -08:00
Magnus Damm	85916f8166	[PATCH] Kexec / Kdump: Unify elf note code The elf note saving code is currently duplicated over several architectures. This cleanup patch simply adds code to a common file and then replaces the arch-specific code with calls to the newly added code. The only drawback with this approach is that s390 doesn't fully support kexec-on-panic which for that arch leads to introduction of unused code. Signed-off-by: Magnus Damm <magnus@valinux.co.jp> Cc: Vivek Goyal <vgoyal@in.ibm.com> Cc: Andi Kleen <ak@suse.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:46 -08:00
Ingo Molnar	0231606785	[PATCH] hotplug CPU: clean up hotcpu_notifier() use There was lots of #ifdef noise in the kernel due to hotcpu_notifier(fn, prio) not correctly marking 'fn' as used in the !HOTPLUG_CPU case, and thus generating compiler warnings of unused symbols, hence forcing people to add #ifdefs. the compiler can skip truly unused functions just fine: text data bss dec hex filename 1624412 728710 3674856 6027978 5bfaca vmlinux.before 1624412 728710 3674856 6027978 5bfaca vmlinux.after [akpm@osdl.org: topology.c fix] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:39 -08:00
Andrew Morton	a38a44c1a9	[PATCH] smp_call_function_single() check that local interrupts are enabled smp_call_function_single() can deadlock if the caller disabled local interrupts (the target CPU could be spinning on call_lock). Check for that. Why on earth do these functions use spin_lock_bh()?? Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Cc: Andi Kleen <ak@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:39 -08:00
Masami Hiramatsu	b4c6c34a53	[PATCH] kprobes: enable booster on the preemptible kernel When we are unregistering a kprobe-booster, we can't release its instruction buffer immediately on the preemptive kernel, because some processes might be preempted on the buffer. The freeze_processes() and thaw_processes() functions can clean most of processes up from the buffer. There are still some non-frozen threads who have the PF_NOFREEZE flag. If those threads are sleeping (not preempted) at the known place outside the buffer, we can ensure safety of freeing. However, the processing of this check routine takes a long time. So, this patch introduces the garbage collection mechanism of insn_slot. It also introduces the "dirty" flag to free_insn_slot because of efficiency. The "clean" instruction slots (dirty flag is cleared) are released immediately. But the "dirty" slots which are used by boosted kprobes, are marked as garbages. collect_garbage_slots() will be invoked to release "dirty" slots if there are more than INSNS_PER_PAGE garbage slots or if there are no unused slots. Cc: "Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: "bibo,mao" <bibo.mao@intel.com> Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com> Cc: Yumiko Sugita <yumiko.sugita.yf@hitachi.com> Cc: Satoshi Oshima <soshima@redhat.com> Cc: Hideo Aoki <haoki@redhat.com> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:38 -08:00
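
The slot-release policy described in the kprobes commit above (b4c6c34a53) can be illustrated with a small user-space sketch. This is not the kernel's code: the flat `slots` array, `NPAGES`, `get_slot()`, `free_slot()` and the `safety_checks` counter are hypothetical stand-ins, and the tiny `INSNS_PER_PAGE` value is chosen only to keep the example short. It assumes that clean slots are released at once, that dirty slots are only marked as garbage, and that collection runs when more than `INSNS_PER_PAGE` garbage slots exist or when no unused slot is left.

```c
#include <assert.h>

#define INSNS_PER_PAGE 2                 /* arbitrary small value for the sketch */
#define NPAGES         3                 /* hypothetical number of slot pages */
#define NSLOTS         (INSNS_PER_PAGE * NPAGES)

enum slot_state { SLOT_FREE, SLOT_USED, SLOT_GARBAGE };

static enum slot_state slots[NSLOTS];    /* all pages flattened into one array */
static int ngarbage;                     /* slots currently marked SLOT_GARBAGE */
static int safety_checks;                /* how often the costly check would run */

/* Release every garbage slot in one batch. In the commit's description the
 * expensive part is proving that no preempted task is still executing inside
 * a slot; this sketch only counts how often that check would be needed. */
static void collect_garbage_slots(void)
{
    safety_checks++;
    for (int i = 0; i < NSLOTS; i++)
        if (slots[i] == SLOT_GARBAGE)
            slots[i] = SLOT_FREE;
    ngarbage = 0;
}

/* Return the index of a free slot, or -1 if none is available. Garbage is
 * collected only when the first scan finds no unused slot. */
static int get_slot(void)
{
    for (int pass = 0; pass < 2; pass++) {
        for (int i = 0; i < NSLOTS; i++) {
            if (slots[i] == SLOT_FREE) {
                slots[i] = SLOT_USED;
                return i;
            }
        }
        if (ngarbage == 0)
            break;                       /* nothing to reclaim */
        collect_garbage_slots();
    }
    return -1;
}

/* Free a slot. A clean slot is released immediately; a dirty slot (one used
 * by a boosted probe) is only marked as garbage, and a batch collection is
 * triggered once more than INSNS_PER_PAGE garbage slots have piled up. */
static void free_slot(int i, int dirty)
{
    assert(i >= 0 && i < NSLOTS && slots[i] == SLOT_USED);
    if (!dirty) {
        slots[i] = SLOT_FREE;
        return;
    }
    slots[i] = SLOT_GARBAGE;
    if (++ngarbage > INSNS_PER_PAGE)
        collect_garbage_slots();
}
```

Batching is the point of the design: the safety check runs once per collection rather than once per freed slot, at the cost of dirty slots staying unavailable until the next collection.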

934 Commits