Provide a PT map hook for HIGHPTE kernels to designate where they are mapping
page tables. This information is required so the physical address of PTE
updates can be determined; otherwise, the mm layer would have to carry the
physical address all the way to each PTE modification callsite, which is even
more hideous that the macros required to provide the proper hooks.
So lets not mess up arch neutral code to achieve this, but keep the horror in
an #ifdef HIGHPTE in include/asm-i386/pgtable.h. I had to use macros here
because some types are not yet defined in all the include paths for this
header.
This patch is absolutely required for HIGHPTE kernels to operate properly with
VMI.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In order to share the common code in tsc.c which does CPU Khz calibration, we
need to make an accurate value of CPU speed available to the tsc.c code. This
value loses a lot of precision in a VM because of the timing differences with
real hardware, but we need it to be as precise as possible so the guest can
make accurate time calculations with the cycle counters.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The custom_sched_clock hook is broken. The result from sched_clock needs to
be in nanoseconds, not in CPU cycles. The TSC is insufficient for this
purpose, because TSC is poorly defined in a virtual environment, and mostly
represents real world time instead of scheduled process time (which can be
interrupted without notice when a virtual machine is descheduled).
To make the scheduler consistent, we must expose a different nature of time,
that is scheduled time. So deprecate this custom_sched_clock hack and turn it
into a paravirt-op, as it should have been all along. This allows the tsc.c
code which converts cycles to nanoseconds to be shared by all paravirt-ops
backends.
It is unfortunate to add a new paravirt-op, but this is a very distinct
abstraction which is clearly different for all virtual machine
implementations, and it gets rid of an ugly indirect function which I
ashamedly admit I hacked in to try to get this to work earlier, and then even
got in the wrong units.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Critical bugfixes for the VMI-Timer code.
1) Do not setup a one shot alarm if we are keeping the periodic alarm
armed. Additionally, since the periodic alarm can be run at a lower rate
than HZ, let's fixup the guard to the no-idle-hz mode appropriately. This
fixes the bug where the no-idle-hz mode might have a higher interrupt rate
than the non-idle case.
2) The interrupt handler can no longer adjust xtime due to nested lock
acquisition. Drop this. We don't need to check for wallclock time at
every tick, it can be done in userspace instead.
3) Add a bypass to disable noidle operation. This is useful as a last
minute workaround, or testing measure.
4) The code to skip the IO_APIC timer testing (no_timer_check) should be
conditional on IO_APIC, not SMP, since UP kernels can have this configured
in as well.
Signed-off-by: Dan Hecht <dhecht@vmware.com>
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When it goes to free1_out, dev->dma_mem has not been freed.
Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When removing set_native_irq I missed the fact that it was
called in a couple of places that were compiled even when
SMP support is disabled. And since the irq_desc[].affinity
field only exists in SMP things broke.
Thanks to Simon Arlott <simon@arlott.org> for spotting this.
There are a couple of ways to fix this but the simplest one
is to just remove the assignments. The affinity field is only
used to display a value to the user, and nothing on either i386
or x86_64 reads it or depends on it being any particlua value,
so skipping the assignment is safe. The assignment that
is being removed is just for the initial affinity value before
the user explicitly sets it. The irq_desc array initializes
this field to CPU_MASK_ALL so the field is initialized to
a reasonable value in the SMP case without being set.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit aeeddc1435, which was
half-baked and broken. It just resulted in compile errors, since
cpufreq_register_driver() still changes the 'driver_data' by setting
bits in the flags field. So claiming it is 'const' _really_ doesn't
work.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeremy Fitzhardinge suggested the use of -freg-struct-return, which does
structure-returns (such as when using pte_t) in registers instead of on
the stack.
that is indeed so, and this option reduced the kernel size a bit:
text data bss dec hex filename
4799506 543456 3760128 9103090 8ae6f2 vmlinux.before
4798117 543456 3760128 9101701 8ae185 vmlinux.after
the resulting kernel booted fine on my testbox. Lets go for it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch replaces all instances of "set_native_irq_info(irq, mask)"
with "irq_desc[irq].affinity = mask". The latter form is clearer
uses fewer abstractions, and makes access to this field uniform
accross different architectures.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit 2ff2d3d747.
Uwe Bugla reports that he cannot mount a floppy drive any more, and Jiri
Slaby bisected it down to this commit.
Benjamin LaHaise also points out that this is a big hot-path, and that
interrupt delivery while idle is very common and should not go through
all these expensive gyrations.
Fix up conflicts in arch/i386/kernel/apic.c and arch/i386/kernel/irq.c
due to other unrelated irq changes.
Cc: Stephane Eranian <eranian@hpl.hp.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Uwe Bugla <uwe.bugla@gmx.de>
Cc: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There was OpenVZ specific bug rendering some cpufreq drivers unusable on SMP.
In short, when cpufreq code thinks it confined itself to needed cpu by means
of set_cpus_allowed() to execute rdmsr, some "virtual cpu" feature can migrate
process to anywhere. This triggers bugons and does wrong things in general.
This got fixed by introducing rdmsr_on_cpu and wrmsr_on_cpu executing rdmsr
and wrmsr on given physical cpu by means of smp_call_function_single().
Dave Jones mentioned cpufreq might be not only user of rdmsr_on_cpu() and
wrmsr_on_cpu(), so I'm putting them into arch/{i386,x86_64}/lib/ .
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
The broadcast functionality is only necessary when a local APIC is
available. Make the config switch depend on X86_LOCAL_APIC. This
resolves the mach-voyager breakage introduced by the tick managament
code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits)
Documentation/kernel-docs.txt update.
arch/cris: typo in KERN_INFO
Storage class should be before const qualifier
kernel/printk.c: comment fix
update I/O sched Kconfig help texts - CFQ is now default, not AS.
Remove duplicate listing of Cris arch from README
kbuild: more doc. cleanups
doc: make doc. for maxcpus= more visible
drivers/net/eexpress.c: remove duplicate comment
add a help text for BLK_DEV_GENERIC
correct a dead URL in the IP_MULTICAST help text
fix the BAYCOM_SER_HDX help text
fix SCSI_SCAN_ASYNC help text
trivial documentation patch for platform.txt
Fix typos concerning hierarchy
Fix comment typo "spin_lock_irqrestore".
Fix misspellings of "agressive".
drivers/scsi/a100u2w.c: trivial typo patch
Correct trivial typo in log2.h.
Remove useless FIND_FIRST_BIT() macro from cardbus.c.
...
Change the explicit code in the relocs.c file to use ARRAY_SIZE()
and add a definition of ARRAY_SIZE() since this is a userspace program
and wouldn't include kernel.h.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
This is an additional list of systems that exhibit the PCI device ordering
issue that prompted the following patch:
commit 6b4b78fed4
Author: Matt Domsch <Matt_Domsch@dell.com>
Date: Fri Sep 29 15:23:23 2006 -0500
PCI: optionally sort device lists breadth-first
Adding these systems to the list prevents the need for the additional
kernel command line argument.
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Now that disable_irq() defaults to delayed-disable semantics, the IRQ_DISABLED
flag is not needed anymore.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Never mask interrupts immediately upon request. Disabling interrupts in
high-performance codepaths is rare, and on the other hand this change could
recover lost edges (or even other types of lost interrupts) by conservatively
only masking interrupts after they happen. (NOTE: with this change the
highlevel irq-disable code still soft-disables this IRQ line - and if such an
interrupt happens then the IRQ flow handler keeps the IRQ masked.)
Mark i8529A controllers as 'never loses an edge'.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In preparation for supporting generic timekeeping, this patch cleans up
x86-64's use of vxtime.hpet_address, changing it to just hpet_address as is
also used in i386. This is necessary since the vxtime structure will be going
away.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The NMI watchdog implementation assumes that the local APIC timer interrupt is
happening. This assumption is not longer true when high resolution timers and
dynamic ticks come into play, as they may switch off the local APIC timer
completely. Take the PIT/HPET interrupts into account too, to avoid false
positives.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@suse.de>
Cc: Zachary Amsden <zach@vmware.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Rohit Seth <rohitseth@google.com>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The local apic timer calibration has two problem cases:
1. The calibration is based on readout of the PIT/HPET timer to detect the
wrap of the periodic tick. It happens that a box gets stuck in the
calibration loop due to a PIT with a broken readout function.
2. CoreDuo boxen show a sporadic PIT runs too slow defect, which results
in a wrong lapic calibration. The PIT goes back to normal operation once
the lapic timer is switched to periodic mode.
Both are existing and unfixed problems in the current upstream kernel and
prevent certain laptops and other systems from booting Linux.
Rework the code to address both problems:
- Make the calibration interrupt driven. This removes the wait_timer_tick
magic hackery from lapic.c and time_hpet.c. The clockevents framework
allows easy substitution of the global tick event handler for the
calibration. This is more accurate than monitoring jiffies. At this point
of the boot process, nothing disturbes the interrupt delivery, so the
results are very accurate.
- Verify the calibration against the PM timer, when available by using the
early access function. When the measured calibration period is outside of
an one percent window, then the lapic timer calibration is adjusted to the
pm timer result.
- Verify the calibration by running the lapic timer with the calibration
handler. Disable lapic timer in case of deviation.
This also removes the "synchronization" of the local apic timer to the global
tick. This synchronization never worked, as there is no way to synchronize
PIT(HPET) and local APIC timer. The synchronization by waiting for the tick
just alignes the local APIC timer for the first events, but later the events
drift away due to the different clocks. Removing the "sync" is just
randomizing the asynchronous behaviour at setup time.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Zachary Amsden <zach@vmware.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Rohit Seth <rohitseth@google.com>
Cc: Andi Kleen <ak@suse.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add clockevent drivers for i386: lapic (local) and PIT/HPET (global). Update
the timer IRQ to call into the PIT/HPET driver's event handler and the
lapic-timer IRQ to call into the lapic clockevent driver. The assignement of
timer functionality is delegated to the core framework code and replaces the
compile and runtime evalution in do_timer_interrupt_hook()
Use the clockevents broadcast support and implement the lapic_broadcast
function for ACPI.
No changes to existing functionality.
[ kdump fix from Vivek Goyal <vgoyal@in.ibm.com> ]
[ fixes based on review feedback from Arjan van de Ven <arjan@infradead.org> ]
Cleanups-from: Adrian Bunk <bunk@stusta.de>
Build-fixes-from: Andrew Morton <akpm@osdl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The apic code is quite unstructured and missing a lot of comments.
- Restructure the code into helper functions, timer, setup/shutdown,
interrupt and power management blocks.
- Fixup comments.
- Namespace fixups
- Inline helpers for version and is_integrated
- Combine the ack_bad_irq functions
No functional changes.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Zachary Amsden <zach@vmware.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Rohit Seth <rohitseth@google.com>
Cc: Andi Kleen <ak@suse.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Allow early access to the power management timer by exposing the verified read
function and providing a helper function which checks the pmtmr_ioport
variable and returns either the pm timer readout or 0 in case the pm timer is
not available.
Create a new header file and replace also the ifdef'ed extern definition in
arch/i386/kernel/acpi/boot.c
This is a preperatory patch for the rework of the local apic timer
calibration.
No functional changes.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The Geode can safely use the TSC for highres, since:
1) Does not support frequency scaling,
2) The TSC _does_ count when the CPU is halted. Furthermore, the Geode
supports a mode called "suspension on halt", where Suspend mode (which
interacts with the power management states) is entered. TSC counting
during suspend mode is controlled by bit 8 of the Bus Controller
Configuration Register #0 (thanks Tom!).
3) no SMP :)
Check if "RTSC counts during suspension" and remove the requirement for
verification, so the clocksource code can safely select it as an timesource
for the highres timers subsystem.
Signed-off-by: Marcelo Tosatti <marcelo@kvack.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The TSC needs to be verified against another clocksource. Instead of using
hardwired assumptions of available hardware, provide a generic verification
mechanism. The verification uses the best available clocksource and handles
the usability for high resolution timers / dynticks of the clocksource which
needs to be verified.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The clocksource code allows direct updates of the rating of a given
clocksource now. Change TSC unstable tracking to use this interface and
remove the update callback.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Using a flag filed allows to encode more than one information into a variable.
Preparatory patch for the generic clocksource verification.
[mingo@elte.hu: convert vmitime.c to the new clocksource flag]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
make the TSC synchronization code more robust, and unify it between x86_64 and
i386.
The biggest change is the removal of the 'fix up TSCs' code on x86_64 and
i386, in some rare cases it was /causing/ time-warps on SMP systems.
The new code only checks for TSC asynchronity - and if it can prove a
time-warp (if it can observe the TSC going backwards when going from one CPU
to another within a critical section), then the TSC clock-source is turned
off.
The TSC synchronization-checking code also got moved into a separate file.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Enqueue clocksources in rating order to make selection of the clocksource
easier. Also check the match with an user override at enqueue time.
Preparatory patch for the generic clocksource verification.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The delayed work code in arch/i386/kernel/tsc.c is an unused leftover of the
GTOD conversion. Remove it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a flag so we can prevent the irq balancing of an interrupt. Move the
bits, so we have room for more :)
Necessary for the ability to setup clocksources more flexible (e.g. use the
different HPET channels per CPU)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
arch/i386/kernel/built-in.o: In function `vmi_stop_hz_timer':
: undefined reference to `next_timer_interrupt'
If CONFIG_NO_HZ, next_timer_interrupt() doesn't exist (and presumably doesn't
make sense).
Perhaps VMI shouildn't be playing with timer internals at this level.
Cc: Zachary Amsden <zach@vmware.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Start using v2 version of Longhaul when available. It provides
voltage scaling and can use ACPI C3 state. That's curious. CPU
will not change frequency on ACPI C3 when v1 is in use, but it will
when v2 is used. Driver will return max frequency all the time if
this isn't true for all processors. There is strange thing with
mobile voltage. Looks like only Nehemiah (C3-M) supports it.
Earlier processors have different mobile VRM (in docs), but I can't
find any which is using it. Looks like all are using VRM 8.5. So
fail for non Nehemiah with mobile VRM.
Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
Solution for small, but nasty bug: access beyond end of f_table for C7 brand.
Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
After updating several machines to 2.6.20, I can't boot anymore the single
one of them that supports the NX bit and is configured as a 32-bit system.
My understanding is that the VDSO changes in 2.6.20-rc7 were not fully
cooked, in that with that config option enabled VDSO_SYM(x) now equals
x, meaning that an address in the fixmap area is now being passed to
apps via AT_SYSINFO. However, the page is mapped with PAGE_READONLY
rather than PAGE_READONLY_EXEC.
I'm not certain whether having app code go through the fixmap area is
intended, but in case it is here is the simple patch that makes things work
again.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Extern declarations belong in headers. Times, they are a'changin.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
===================================================================
When I implemented the DECLARE_PER_CPU(var) macros, I was careful that
people couldn't use "var" in a non-percpu context, by prepending
percpu__. I never considered that this would allow them to overload
the same name for a per-cpu and a non-percpu variable.
It is only one of many horrors in the i386 boot code, but let's rename
the non-perpcu cpu_gdt_descr to early_gdt_descr (not boot_gdt_descr,
that's something else...)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
===================================================================
The current code simply calls "start_kernel" directly if we're under a
hypervisor and no paravirt_ops backend wants us, because paravirt.c
registers that as a backend.
This was always a vain hope; start_kernel won't get far without setup.
It's also impossible for paravirt_ops backends which don't sit in the
arch/i386/kernel directory: they can't link before paravirt.o anyway.
Keep it simple: if we pass all the registered paravirt probes, BUG().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
The old Cyrix 5520 CPU detection code relied upon the PCI layer setup being
done earlier than the CPU setup, which is no longer true. Fortunately we
know that if the processor is a MediaGX we can do type 1 pci config
accesses to check the companion chip. We thus do those directly and from
this find the 5520 and implement the workarounds for the timer problem
Original report from takada@mbf.nifty.com, I sent a proposed patch which
Takara then corrected, tested and sent back to the list on 10th January.
Submitting for merging as it seems to have been missed
AK: Changed to use pci-direct.h and fix warning for !CONFIG_PCI (later
AK: originally from akpm)
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: <takada@mbf.nifty.com>
Cc: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Fix bogus warning
linux/arch/i386/kernel/cpu/transmeta.c:12: warning: ‘cpu_freq’ may be used uninitialized in this function
Signed-off-by: Andi Kleen <ak@suse.de>
Fix bogus gcc warning
linux/arch/i386/kernel/microcode.c:387: warning: ‘new_mc’ may be used uninitialized in this function
Signed-off-by: Andi Kleen <ak@suse.de>
Just various new acronyms. The new popcnt bit is in the middle
of Intel space. This looks a little weird, but I've been assured
it's ok.
Also I fixed RDTSCP for i386 which was at the wrong place.
For i386 and x86-64.
Signed-off-by: Andi Kleen <ak@suse.de>
Original code doesn't write back to CCR4 register. This patch reflects a
value of a register.
Cc: Jordan Crouse <jordan.crouse@amd.com>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Sometimes developers need to see more object code in an oops report,
e.g. when kernel may be corrupted at runtime.
Add the "code_bytes" option for this.
Signed-off-by: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Remove the unused kernel config option X86_XADD, which is unused in any
source or header file.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Annotate i386/kernel/entry.S with END/ENDPROC to assist disassemblers and
other analysis tools.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
I hope to support "classic" MediaGXm in kernel.
The DIR1 register of MediaGXm( or Geode) shows the following values for
identify CPU. For example, My MediaGXm shows 0x42.
We can read National Semiconductor's datasheet without any NDAs.
http://www.national.com/pf/GX/GXLV.html
from datasheets:
DIR1
0x30 - 0x33 GXm rev. 1.0 - 2.3
0x34 - 0x4f GXm rev. 2.4 - 3.x
0x5x GXm rev. 5.0 - 5.4
0x6x GXLV
0x7x (unknow)
0x8x Gx1
In nsc driver of X, accept 0x30 through 0x82. What will 0x7x mean?
Cc: Jordan Crouse <jordan.crouse@amd.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
setcc() in math-emu is written as a gcc extension statement expression
macro that returns a value. However, it's not used that way and it's not
needed like that, so just make it a inline function so that we
don't use an extension when it's not needed.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
All Transmeta CPUs ever produced have constant-rate TSCs.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
During kernel bootup, a new T60 laptop (CoreDuo, 32-bit) hangs about
10%-20% of the time in acpi_init():
Calling initcall 0xc055ce1a: topology_init+0x0/0x2f()
Calling initcall 0xc055d75e: mtrr_init_finialize+0x0/0x2c()
Calling initcall 0xc05664f3: param_sysfs_init+0x0/0x175()
Calling initcall 0xc014cb65: pm_sysrq_init+0x0/0x17()
Calling initcall 0xc0569f99: init_bio+0x0/0xf4()
Calling initcall 0xc056b865: genhd_device_init+0x0/0x50()
Calling initcall 0xc056c4bd: fbmem_init+0x0/0x87()
Calling initcall 0xc056dd74: acpi_init+0x0/0x1ee()
It's a hard hang that not even an NMI could punch through! Frustratingly,
adding printks or function tracing to the ACPI code made the hangs go away
...
After some time an additional detail emerged: disabling the NMI watchdog
made these occasional hangs go away.
So i spent the better part of today trying to debug this and trying out
various theories when i finally found the likely reason for the hang: if
acpi_ns_initialize_devices() executes an _INI AML method and an NMI
happens to hit that AML execution in the wrong moment, the machine would
hang. (my theory is that this must be some sort of chipset setup method
doing stores to chipset mmio registers?)
Unfortunately given the characteristics of the hang it was sheer
impossible to figure out which of the numerous AML methods is impacted
by this problem.
As a workaround i wrote an interface to disable chipset-based NMIs while
executing _INI sections - and indeed this fixed the hang. I did a
boot-loop of 100 separate reboots and none hung - while without the patch
it would hang every 5-10 attempts. Out of caution i did not touch the
nmi_watchdog=2 case (it's not related to the chipset anyway and didnt
hang).
I implemented this for both x86_64 and i686, tested the i686 laptop both
with nmi_watchdog=1 [which triggered the hangs] and nmi_watchdog=2, and
tested an Athlon64 box with the 64-bit kernel as well. Everything builds
and works with the patch applied.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mtrr: fix size_or_mask and size_and_mask
This fixes two bugs in /proc/mtrr interface:
o If physical address size crosses the 44 bit boundary
size_or_mask is evaluated wrong.
o size_and_mask limits width of physical base
address for an MTRR to be less than 44 bits.
TBD: later patch had one more change, but I think that was bogus.
TBD: need to double check
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Remove all parameters from this function that aren't really variable.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Use adding __init to romsignature() (it's only called from probe_roms()
which is itself __init) as an excuse to submit a pedantic cleanup.
Signed-off-by: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Clean up sched_clock() on i686: it will use the TSC if available and falls
back to jiffies only if the user asked for it to be disabled via notsc or
the CPU calibration code didnt figure out the right cpu_khz.
This generally makes the scheduler timestamps more finegrained, on all
hardware. (the current scheduler is pretty resistant against asynchronous
sched_clock() values on different CPUs, it will allow at most up to a jiffy
of jitter.)
Also simplify sched_clock()'s check for TSC availability: propagate the
desire and ability to use the TSC into the tsc_disable flag, previously
this flag only indicated whether the notsc option was passed. This makes
the rare low-res sched_clock() codepath a single branch off a read-mostly
flag.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Add a notifier mechanism to the low level idle loop. You can register a
callback function which gets invoked on entry and exit from the low level idle
loop. The low level idle loop is defined as the polling loop, low-power call,
or the mwait instruction. Interrupts processed by the idle thread are not
considered part of the low level loop.
The notifier can be used to measure precisely how much is spent in useless
execution (or low power mode). The perfmon subsystem uses it to turn on/off
monitoring.
Signed-off-by: stephane eranian <eranian@hpl.hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Every file should include the headers containing the prototypes for
it's global functions.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
o Entry startup_32 was in .text section but it was accessing some init
data too and it prompts MODPOST to generate compilation warnings.
WARNING: vmlinux - Section mismatch: reference to .init.data:boot_params from
.text between '_text' (at offset 0xc0100029) and 'startup_32_smp'
WARNING: vmlinux - Section mismatch: reference to .init.data:boot_params from
.text between '_text' (at offset 0xc0100037) and 'startup_32_smp'
WARNING: vmlinux - Section mismatch: reference to
.init.data:init_pg_tables_end from .text between '_text' (at offset
0xc0100099) and 'startup_32_smp'
o Can't move startup_32 to .init.text as this entry point has to be at the
start of bzImage. Hence moved startup_32 to a new section .text.head and
instructed MODPOST to not to generate warnings if init data is being
accessed from .text.head section. This code has been audited.
o SMP boot up code (startup_32_smp) can go into .init.text if CPU hotplug
is not supported. Otherwise it generates more warnings
WARNING: vmlinux - Section mismatch: reference to .init.data:new_cpu_data from
.text between 'checkCPUtype' (at offset 0xc0100126) and 'is486'
WARNING: vmlinux - Section mismatch: reference to .init.data:new_cpu_data from
.text between 'checkCPUtype' (at offset 0xc0100130) and 'is486'
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Deliberate register clobber around performance critical inline code is great for
testing, bad to leave on by default. Many people ship with DEBUG_KERNEL turned
on, so stop making DEBUG_PARAVIRT default on.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Because timer code moves around, and we might eventually move our init to a
late_time_init hook, save and restore IRQs around this code because it is
definitely not interrupt safe.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Kprobes bugfix for paravirt compatibility - RPL on the CS when inserting
BPs must match running kernel.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
CC: Eric Biederman <ebiederm@xmission.com>
Profile_pc was broken when using paravirtualization because the
assumption the kernel was running at CPL 0 was violated, causing
bad logic to read a random value off the stack.
The only way to be in kernel lock functions is to be in kernel
code, so validate that assumption explicitly by checking the CS
value. We don't want to be fooled by BIOS / APM segments and
try to read those stacks, so only match KERNEL_CS.
I moved some stuff in segment.h to make it prettier.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
VMI timer code. It works by taking over the local APIC clock when APIC is
configured, which requires a couple hooks into the APIC code. The backend
timer code could be commonized into the timer infrastructure, but there are
some pieces missing (stolen time, in particular), and the exact semantics of
when to do accounting for NO_IDLE need to be shared between different
hypervisors as well. So for now, VMI timer is a separate module.
[Adrian Bunk: cleanups]
Subject: VMI timer patches
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Add VMI SMP boot hook. We emulate a regular boot sequence and use the same
APIC IPI initiation, we just poke magic values to load into the CPU state when
the startup IPI is received, rather than having to jump through a real mode
trampoline.
This is all that was needed to get SMP to work.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
I found a clever way to make the extra IOPL switching invisible to
non-paravirt compiles - since kernel_rpl is statically defined to be zero
there, and only non-zero rpl kernel have a problem restoring IOPL, as popf
does not restore IOPL flags unless run at CPL-0.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
The VMI ROM has a mode where hypercalls can be queued and batched. This turns
out to be a significant win during context switch, but must be done at a
specific point before side effects to CPU state are visible to subsequent
instructions. This is similar to the MMU batching hooks already provided.
The same hooks could be used by the Xen backend to implement a context switch
multicall.
To explain a bit more about lazy modes in the paravirt patches, basically, the
idea is that only one of lazy CPU or MMU mode can be active at any given time.
Lazy MMU mode is similar to this lazy CPU mode, and allows for batching of
multiple PTE updates (say, inside a remap loop), but to avoid keeping some
kind of state machine about when to flush cpu or mmu updates, we just allow
one or the other to be active. Although there is no real reason a more
comprehensive scheme could not be implemented, there is also no demonstrated
need for this extra complexity.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
The VMI backend uses explicit page type notification to track shadow page
tables. The allocation of page table roots is especially tricky. We need to
clone the root for non-PAE mode while it is protected under the pgd lock to
correctly copy the shadow.
We don't need to allocate pgds in PAE mode, (PDPs in Intel terminology) as
they only have 4 entries, and are cached entirely by the processor, which
makes shadowing them rather simple.
For base page table level allocation, pmd_populate provides the exact hook
point we need. Also, we need to allocate pages when splitting a large page,
and we must release pages before returning the page to any free pool.
Despite being required with these slightly odd semantics for VMI, Xen also
uses these hooks to determine the exact moment when page tables are created or
released.
AK: All nops for other architectures
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Every file should #include the headers containing the prototypes for
its global functions.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
This is just cleanup. It moves to e820 check into pci_mmcfg_reject_broken().
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
Currently, unreachable_devices() compares value of mmconfig and value
of conf1. But it doesn't check the device is reachable or not.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
This rejects broken MCFG tables on Asus. When the table
looks bogus just disable mmconfig
Arjan and Andi suggested this.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
Put back the resource reservation as per
4c6e052adf but use it *only* when the range(s)
come from a chipset probe instead of the bios.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
It seems that the only way to reliably support mmconfig in the presence of
funky biosen is to detect the hostbridge and read where the window is mapped
from its registers. Do that for the E7520 and the 945G/GZ/P/PL for a start.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
unreachable_devices compares between the results of pci configuration accesses
through type1 and mmconfig, so it should be called only if type1 actually
works in the first place.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
i386 and x86-64 pci mmconfig code have a lot in common. So share what's
shareable between the two.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
The "fasteoi" IRQ handler is named "fasteio" incorrectly. This is a fix.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Convert the PDA code to use %fs rather than %gs as the segment for
per-processor data. This is because some processors show a small but
measurable performance gain for reloading a NULL segment selector (as %fs
generally is in user-space) versus a non-NULL one (as %gs generally is).
On modern processors the difference is very small, perhaps undetectable.
Some old AMD "K6 3D+" processors are noticably slower when %fs is used
rather than %gs; I have no idea why this might be, but I think they're
sufficiently rare that it doesn't matter much.
This patch also fixes the math emulator, which had not been adjusted to
match the changed struct pt_regs.
[frederik.deweerdt@gmail.com: fixit with gdb]
[mingo@elte.hu: Fix KVM too]
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Ian Campbell <Ian.Campbell@XenSource.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Zachary Amsden <zach@vmware.com>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Many struct file_operations in the kernel can be "const". Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data. In addition it'll catch accidental writes at compile time to
these shared resources.
[akpm@osdl.org: sparc64 fix]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Catch illegally nested kmap_atomic()s even if the page that is mapped by
the 'inner' instance is from lowmem.
This avoids spuriously zapped kmap-atomic ptes and turns hard to find
crashes into clear asserts at the bug site.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Part of long forgotten patch
http://groups.google.com/group/fa.linux.kernel/msg/e98e941ce1cf29f6?dmode=source
Since then, m32r grabbed two copies.
Leave s390 copy because of important absence of CONFIG_VT, but remove
references to non-existent timerlist_lock. ia64 also loses timerlist_lock.
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Update all arch/*/kernel/vmlinux.lds.S to not include space for initramfs
when CONFIG_BLK_DEV_INITRAMFS is not selected. This saves another 4 kbytes
on most platfoms (some reserve PAGE_SIZE for initramfs).
Signed-off-by: Jean-Paul Saman <jean-paul.saman@nxp.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As Andi pointed out: CONFIG_GENERIC_ISA_DMA only disables the ISA DMA
channel management. Other functionality may still expect GFP_DMA to
provide memory below 16M. So we need to make sure that CONFIG_ZONE_DMA is
set independent of CONFIG_GENERIC_ISA_DMA. Undo the modifications to
mm/Kconfig where we made ZONE_DMA dependent on GENERIC_ISA_DMA and set
theses explicitly in each arches Kconfig.
Reviews must occur for each arch in order to determine if ZONE_DMA can be
switched off. It can only be switched off if we know that all devices
supported by a platform are capable of performing DMA transfers to all of
memory (Some arches already support this: uml, avr32, sh sh64, parisc and
IA64/Altix).
In order to switch ZONE_DMA off conditionally, one would have to establish
a scheme by which one can assure that no drivers are enabled that are only
capable of doing I/O to a part of memory, or one needs to provide an
alternate means of performing an allocation from a specific range of memory
(like provided by alloc_pages_range()) and insure that all drivers use that
call. In that case the arches alloc_dma_coherent() may need to be modified
to call alloc_pages_range() instead of relying on GFP_DMA.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This change should make Longhaul more compatible with
both ver. 2 and Powersaver processors. Voltage transitions
will be done before or after frequency transition. That depends
on direction of change. I don't know how to force conservative
governor when voltage scaling is enabled, so there is only
a warning for user. Minimal voltage is calculated in different
way now because in this way more power is saved at lower
multipliers.
Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
This is driver for Enhanced Powersaver which is present in VIA C7
processors. Beta tested by Jorgen (jorgen (at) greven dot dk).
Thanks! Based on documentation provided by Dave Jones (Thanks!)
and C7 Eden datasheet available from www.via.com.tw. Looks like all
these C7 Eden CPU's don't have P-states in BIOS. I know that 2
p-states is low, but Jorgen finds it usefull anyway because board
is passive cooled.
There are 3 different types of C7 processors (called brands):
0. C7-M - these processors can set any maultiplier between min and
max, any voltage between min and max.
1. C7 - only min and max states are supported. Voltage is different
for min and max states.
2. Eden - only min and max states are supported. Looks like this
brand can only change multiplier. Voltage seems to be the same for
min and max frequency.
Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
The arch hooks arch_setup_msi_irq and arch_teardown_msi_irq are now
responsible for allocating and freeing the linux irq in addition to
setting up the the linux irq to work with the interrupt.
arch_setup_msi_irq now takes a pci_device and a msi_desc and returns
an irq.
With this change in place this code should be useable by all platforms
except those that won't let the OS touch the hardware like ppc RTAS.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I don't know why it is working and how, but it is working. On my
Epia transition time is by default set to 100us. I'm changing it to
200us. After that I can change frequency from min (x4.0) to max (x7.5)
without lockup. Many times.
There is a paranoid check at a beginning of a patch. Probably dead
code, but I don't have better ideas for CL10000 case at the moment.
Only way to to detect broken chip seems to be looking in log for
spurious interrupts.
Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
This is bug reported by John-Marc Chandonia:
> Detected 1002.292 MHz processor.
> longhaul: VIA C3 'Nehemiah B' [C5N] CPU detected. Powersaver supported.
> longhaul: Using throttling support.
> longhaul: Invalid (reserved) FSB!
FSB is correcly guessed for 999.554 MHz CPU.
To fix this error:
- ROUNDING should be range, not mask - at it's current value it is +7 -8,
- more precise calculations inside guess_fsb - 7.5x133MHz is 1000MHz now.
Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
When calling into the EFI firmware, the parameters need to be passed on
the stack. The recent change to use -mregparm=3 breaks x86 EFI support.
This patch is needed to allow the new Intel-based Macs to suspend to ram
(efi.get_time is called during the suspend phase).
Signed-off-by: Frederic Riss <frederic.riss@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now there is no need to depend on -1 in Nehemiah tables. After
previous change code is eliminating multipliers lower then 5.0
by minmult for Nehemiah A and B.
Signed-off-by: Rafa Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
Looks like some time ago I introduced a bug to Longhaul.
I had report that 9x133Mhz CPU is seen as 5x133MHz. So I
changed multipliers table. That was a mistake. According to
documentation table was correct. So only way to avoid 5 or 9
dilema is not use MaxMHzBR for PowerSaver 1.0. One code that
works on all processors. To do it I need also separate flag for
Nehemiah C (min = x4.0) and Nehemiah (min = x5.0).
Signed-off-by: Rafa Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
This fixes the cpuinfo_cur_freq value by using the correct
find_khz_freq_from_fiddid() when the CPU uses hardware p-states.
Signed-off-by: Joachim Deguara <joachim.deguara@amd.com>
Acked-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
There is no need to have this option in Longhaul anymore.
It was for laptop with CLE266 chipset in times, when only
ACPI C3 was used to switch frequency. Now we have native
support not only for CLE266, but CN400 too. Would be good
to have support for PN266, but I can't find datasheet for it.
Looks like BIOS for CPU's faster then 1GHz don't support
ACPI C2 nor C3.
Signed-off-by: Rafa Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
i386 srat.c broke due to re-names from ACPICA table-manager re-write.
Signed-off-by: Alexey Starikovskiy <alexey.y.starikovskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
This reverts commit e4f0ae0ea6.
It's not wrong, but it's not right either, and everybody seems to agree
that the right fix is probably to do the ccr3 write after the ccr4 one
(and that we also should clean it up a bit). And after that we need to
really validate that all the bits that we write to ccr4 actually do
work.
The old 2.6.19 code was insane, and basically didn't change ccr4 at all
(even though it certainly looks like it was the *intent* to do so). So
let's revert the change that may fix things, just because it's not what
was actually ever tested when the code was written, even if it _was_ the
intent.
There's a discussion on http://lkml.org/lkml/2007/1/9/63 that was
started by the patch that now gets reverted, and that discussion may
well contain the proper long-term fix.
Suggested-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
i386 boot/compressed/relocs checks for absolute symbols and warns about
unexpected ones. If you build with modversions, you get ~2500 warnings
about __crc_<symbol>. These suckers are really absolute symbols - we
do _not_ want to modify them on relocation.
They are generated by genksyms - EXPORT_... generates a weak alias, then
genksyms produces an ld script with __crc_<symbol> = <checksum> and it's
fed to ld to produce the final object file. Their only use is to match
kernel and module at modprobe time; they _must_ be absolute.
boot/compressed/relocs has a whitelist of known absolute symbols, but
it doesn't know about __crc_... stuff. As the result, we get shitloads
of false positives on any ld(1) version.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When the world was a simple and static place setting up irqs was easy.
It sufficed to allocate a linux irq number and a find a free cpu
vector we could receive that linux irq on. In those days it was
a safe assumption that any allocated vector was actually in use
so after one global pass through all of the vectors we would have
none left.
These days things are much more dynamic with interrupt controllers
(in the form of MSI or MSI-X) appearing on plug in cards and linux
irqs appearing and disappearing. As these irqs come and go vectors
are allocated and freed, invalidating the ancient assumption that all
allocated vectors stayed in use forever.
So this patch modifies the vector allocator to walk through every
possible vector before giving up, and to check to see if a vector
is in use before assigning it. With these changes we stop leaking
freed vectors and it becomes possible to allocate and free irq vectors
all day long.
This changed was modeled after the vector allocator on x86_64 where
this limitation has already been removed. In essence we don't update
the static variables that hold the position of the last vector we
allocated until have successfully allocated another vector. This
allows us to detect if we have completed one complete scan through
all of the possible vectors.
Acked-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This workaround unnecessarily cripples functionality to work
around an errata that doesn't seem possible to hit due to
us using the automatic clock throttling in the p4 mcheck code.
See http://lkml.org/lkml/2006/10/28/148 for complete reasoning
and lack of disconsent.
Signed-off-by: Dave Jones <davej@redhat.com>
This patch fixes core dumps to include the vDSO vma, which is left out now.
It removes the special-case core writing macros, which were not doing the
right thing for the vDSO vma anyway. Instead, it uses VM_ALWAYSDUMP in the
vma; there is no need for the fixmap page to be installed. It handles the
CONFIG_COMPAT_VDSO case by making elf_core_dump use the fake vma from
get_gate_vma after real vmas in the same way the /proc/PID/maps code does.
This changes core dumps so they no longer include the non-PT_LOAD phdrs from
the vDSO. I made the change to add them in the first place, but in turned out
that nothing ever wanted them there since the advent of NT_AUXV. It's cleaner
to leave them out, and just let the phdrs inside the vDSO image speak for
themselves.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I wouldn't mind if CONFIG_COMPAT_VDSO went away entirely. But if it's there,
it should work properly. Currently it's quite haphazard: both real vma and
fixmap are mapped, both are put in the two different AT_* slots, sysenter
returns to the vma address rather than the fixmap address, and core dumps yet
are another story.
This patch makes CONFIG_COMPAT_VDSO disable the real vma and use the fixmap
area consistently. This makes it actually compatible with what the old vdso
implementation did.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I'm trying to remove drivers/acpi/motherboard.c, which is mostly
redundant with drivers/pnp/system.c. So make sure that we include the
PNP driver in the default config. Most distros enable this already.
Turning on CONFIG_PNP also causes the following options to be enabled:
CONFIG_PNPACPI
CONFIG_SERIAL_8250_PNP
CONFIG_SERIAL_8250_PNP causes legacy serial ports to be discovered
twice, which is ugly but harmless:
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
00:07: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
The paravirt subsystem is still in flux so all exports from it are
definitely internal use only. The APIs around this /will/ change.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@suse.de>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit f2802e7f57 and its x86 version
(b7471c6da9) adds nmi_known_cpu() check
while parsing boot options in x86_64 and i386.
With that, "nmi_watchdog=2" stops working for me on Intel Core 2 CPU
based system.
The problem is, setup_nmi_watchdog is called while parsing the boot
option and identify_cpu is not done yet. So, the return value of
nmi_known_cpu() is not valid at this point.
So revert that check. This should not have any adverse effect as the
nmi_known_cpu() check is done again later in enable_lapic_nmi_watchdog().
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current PDA code, which went in in post 2.6.19 has a flaw in that it
doesn't correctly cycle the GDT and %GS segment through the boot PDA,
the CPU PDA and finally the per-cpu PDA.
The bug generally doesn't show up if the boot CPU id is zero, but
everything falls apart for a non zero boot CPU id. The basically kills
voyager which is perfectly capable of doing non zero CPU id boots, so
voyager currently won't boot without this.
The fix is to be careful and actually do the GDT setups correctly.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>