Commit Graph

3132 Commits

Author SHA1 Message Date
Andreas Herrmann
64fe44c38b x86: pat.c introduce function to check for conflicts with existing memtypes
... to strip down loop body in reserve_memtype.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Suresh B Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-24 13:05:57 +02:00
Andreas Herrmann
f6887264de x86: pat.c consolidate list_add handling in reserve_memtype
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Suresh B Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-24 13:05:55 +02:00
Andreas Herrmann
3e9c83b309 x86: pat.c consolidate error/debug messages in reserve_memtype
... and move last debug message out of locked section.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Suresh B Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-24 13:05:51 +02:00
Andreas Herrmann
69e26be9b1 x86: pat.c more trivial changes
- add BUG statement to catch invalid start and end parameters
 - No need to track the actual type in both req_type and actual_type --
   keep req_type unchanged.
 - removed (IMHO) superfluous comments

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Suresh B Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-24 13:05:50 +02:00
Andreas Herrmann
ac97991ec9 x86: pat.c choose more crisp variable names
- parse/ml => entry (within list_for_each and friends)
 - new_entry => new
 - ret_type => new_type (to avoid confusion with req_type)

(... to make it more readable...)

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Suresh B Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-24 13:05:49 +02:00
Andreas Herrmann
bcc643dc28 x86: introduce macro to check whether an address range is in the ISA range
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Suresh B Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-24 13:05:48 +02:00
Avi Kivity
a9b21b6229 KVM: VMX: Fix host msr corruption with preemption enabled
Switching msrs can occur either synchronously as a result of calls to
the msr management functions (usually in response to the guest touching
virtualized msrs), or asynchronously when preempting a kvm thread that has
guest state loaded.  If we're unlucky enough to have the two at the same
time, host msrs are corrupted and the machine goes kaput on the next syscall.

Most easily triggered by Windows Server 2008, as it does a lot of msr
switching during bootup.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24 12:26:17 +03:00
Avi Kivity
6bf6a9532f KVM: MMU: Fix oops on guest userspace access to guest pagetable
KVM has a heuristic to unshadow guest pagetables when userspace accesses
them, on the assumption that most guests do not allow userspace to access
pagetables directly. Unfortunately, in addition to unshadowing the pagetables,
it also oopses.

This never triggers on ordinary guests since sane OSes will clear the
pagetables before assigning them to userspace, which will trigger the flood
heuristic, unshadowing the pagetables before the first userspace access. One
particular guest, though (Xenner) will run the kernel in userspace, triggering
the oops.  Since the heuristic is incorrect in this case, we can simply
remove it.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24 12:20:12 +03:00
Marcelo Tosatti
3094538739 KVM: MMU: large page update_pte issue with non-PAE 32-bit guests (resend)
kvm_mmu_pte_write() does not handle 32-bit non-PAE large page backed
guests properly. It will instantiate two 2MB sptes pointing to the same
physical 2MB page when a guest large pte update is trapped.

Instead of duplicating code to handle this, disallow directory level
updates to happen through kvm_mmu_pte_write(), so the two 2MB sptes
emulating one guest 4MB pte can be correctly created by the page fault
handling path.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24 12:18:18 +03:00
Marcelo Tosatti
6597ca09e6 KVM: MMU: Fix rmap_write_protect() hugepage iteration bug
rmap_next() does not work correctly after rmap_remove(), as it expects
the rmap chains not to change during iteration.  Fix (for now) by restarting
iteration from the beginning.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24 12:17:10 +03:00
Marcelo Tosatti
06e0564566 KVM: close timer injection race window in __vcpu_run
If a timer fires after kvm_inject_pending_timer_irqs() but before
local_irq_disable() the code will enter guest mode and only inject such
timer interrupt the next time an unrelated event causes an exit.

It would be simpler if the timer->pending irq conversion could be done
with IRQ's disabled, so that the above problem cannot happen.

For now introduce a new vcpu requests bit to cancel guest entry.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24 12:16:59 +03:00
Marcelo Tosatti
d4acf7e7ab KVM: Fix race between timer migration and vcpu migration
A guest vcpu instance can be scheduled to a different physical CPU
between the test for KVM_REQ_MIGRATE_TIMER and local_irq_disable().

If that happens, the timer will only be migrated to the current pCPU on
the next exit, meaning that guest LAPIC timer event can be delayed until
a host interrupt is triggered.

Fix it by cancelling guest entry if any vcpu request is pending.  This
has the side effect of nicely consolidating vcpu->requests checks.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24 12:16:52 +03:00
Ingo Molnar
6ff10de374 x86: fix "x86: use cpu_khz for loops_per_jiffy calculation"
fix:

arch/x86/kernel/tsc_32.c: In function ‘tsc_init':
arch/x86/kernel/tsc_32.c:421: error: ‘lpj_tsc' undeclared (first use in this function)
arch/x86/kernel/tsc_32.c:421: error: (Each undeclared identifier is reported only once
arch/x86/kernel/tsc_32.c:421: error: for each function it appears in.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-24 09:29:03 +02:00
Alok Kataria
3da757daf8 x86: use cpu_khz for loops_per_jiffy calculation
On the x86 platform we can use the value of tsc_khz computed during tsc
calibration to calculate the loops_per_jiffy value. Its very important
to keep the error in lpj values to minimum as any error in that may
result in kernel panic in check_timer. In virtualization environment, On
a highly overloaded host the guest delay calibration may sometimes
result in errors beyond the ~50% that timer_irq_works can handle,
resulting in the guest panicking.

Does some formating changes to lpj_setup code to now have a single
printk to print the bogomips value.

We do this only for the boot processor because the AP's can have
different base frequencies or the BIOS might boot a AP at a different
frequency.

Signed-off-by: Alok N Kataria <akataria@vmware.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Daniel Hecht <dhecht@vmware.com>
Cc: Tim Mann <mann@vmware.com>
Cc: Zach Amsden <zach@vmware.com>
Cc: Sahil Rihan <srihan@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-23 22:51:33 +02:00
Yinghai Lu
0754557d72 x86: change early_gart_iommu_check() back to any_mapped
Kevin Winchester reported a GART related direct rendering failure against
linux-next-20080611, which shows up via these log entries:

 PCI: Using ACPI for IRQ routing
 PCI: Cannot allocate resource region 0 of device 0000:00:00.0
 agpgart: Detected AGP bridge 0
 agpgart: Aperture conflicts with PCI mapping.
 agpgart: Aperture from AGP @ e0000000 size 128 MB
 agpgart: Aperture conflicts with PCI mapping.
 agpgart: No usable aperture found.
 agpgart: Consider rebooting with iommu=memaper=2 to get a good aperture.

instead of the expected:

 PCI: Using ACPI for IRQ routing
 agpgart: Detected AGP bridge 0
 agpgart: Aperture from AGP @ e0000000 size 128 MB

Kevin bisected it down to this change in tip/x86/gart:
"x86: checking aperture size order".

agp check is using request_mem_region(), and could fail if e820 is reserved...

change it back to e820_any_mapped().

Reported-and-bisected-by: "Kevin Winchester" <kjwinchester@gmail.com>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Tested-by: Kevin Winchester <kjwinchester@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-23 13:12:04 +02:00
Ingo Molnar
a1d5a8691f x86: unify __set_fixmap, fix
fix build failure:

 arch/x86/mm/pgtable.c:280: warning: ‘enum fixed_addresses’ declared inside parameter list
 arch/x86/mm/pgtable.c:280: warning: its scope is only this definition or declaration, which is probably not what you want
 arch/x86/mm/pgtable.c:280: error: parameter 1 (‘idx’) has incomplete type

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-20 15:36:57 +02:00
Jeremy Fitzhardinge
aeaaa59c7e x86/paravirt/xen: add set_fixmap pv_mmu_ops
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-20 15:09:56 +02:00
Jeremy Fitzhardinge
d494a96125 x86: implement set_pte_vaddr
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-20 15:09:54 +02:00
Jeremy Fitzhardinge
7c7e6e07e2 x86: unify __set_fixmap
In both cases, I went with the 32-bit behaviour.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-20 15:09:51 +02:00
Jeremy Fitzhardinge
ebb9cfe20f xen: don't drop NX bit
Because NX is now enforced properly, we must put the hypercall page
into the .text segment so that it is executable.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
Cc: the arch/x86 maintainers <x86@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-20 14:56:41 +02:00
Jeremy Fitzhardinge
05345b0f00 xen: mask unwanted pte bits in __supported_pte_mask
[ Stable: this isn't a bugfix in itself, but it's a pre-requiste
  for "xen: don't drop NX bit" ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
Cc: the arch/x86 maintainers <x86@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-20 14:56:36 +02:00
Jeremy Fitzhardinge
a987b16cc6 xen: don't drop NX bit
Because NX is now enforced properly, we must put the hypercall page
into the .text segment so that it is executable.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
Cc: the arch/x86 maintainers <x86@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-20 14:55:13 +02:00
Jeremy Fitzhardinge
eb179e443d xen: mask unwanted pte bits in __supported_pte_mask
[ Stable: this isn't a bugfix in itself, but it's a pre-requiste
  for "xen: don't drop NX bit" ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
Cc: the arch/x86 maintainers <x86@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-20 14:55:11 +02:00
Jan Beulich
5f0120b578 x86-64: remove unnecessary ptregs call stubs
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: "Andi Kleen" <andi@firstfloor.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19 14:25:11 +02:00
Jordan Crouse
ffe6e1da86 x86, geode: add a VSA2 ID for General Software
General Software writes their own VSA2 module for their version
of the Geode BIOS, which returns a different ID then the standard
VSA2.  This was causing the framebuffer driver to break for most
GSW boards.

Signed-off-by: Jordan Crouse <jordan.crouse@amd.com>
Cc: tglx@linutronix.de
Cc: linux-geode@lists.infradead.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19 14:19:03 +02:00
Andreas Herrmann
dd0c7c4903 x86: shrink pat_x_mtrr_type to its essentials
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Suresh B Siddha <suresh.b.siddha@intel.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19 12:57:40 +02:00
Glauber Costa
b8e0418b2a x86: fix typo CONFIX -> CONFIG
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19 11:59:11 +02:00
Bernhard Walle
d3942cff62 x86: use BOOTMEM_EXCLUSIVE on 32-bit
This patch uses the BOOTMEM_EXCLUSIVE for crashkernel reservation also for
i386 and prints a error message on failure.

The patch is still for 2.6.26 since it is only bug fixing. The unification
of reserve_crashkernel() between i386 and x86_64 should be done for 2.6.27.

Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>
2008-06-19 10:08:48 +02:00
Mikael Pettersson
df17b1d990 x86, 32-bit: fix boot failure on TSC-less processors
Booting 2.6.26-rc6 on my 486 DX/4 fails with a "BUG: Int 6"
(invalid opcode) and a kernel halt immediately after the
kernel has been uncompressed. The BUG shows EIP pointing
to an rdtsc instruction in native_read_tsc(), invoked from
native_sched_clock().

(This error occurs so early that not even the serial console
can capture it.)

A bisection showed that this bug first occurs in 2.6.26-rc3-git7,
via commit 9ccc906c97:

>x86: distangle user disabled TSC from unstable
>
>tsc_enabled is set to 0 from the command line switch "notsc" and from
>the mark_tsc_unstable code. Seperate those functionalities and replace
>tsc_enable with tsc_disable. This makes also the native_sched_clock()
>decision when to use TSC understandable.
>
>Preparatory patch to solve the sched_clock() issue on 32 bit.
>
>Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

The core reason for this bug is that native_sched_clock() gets
called before tsc_init().

Before the commit above, tsc_32.c used a "tsc_enabled" variable
which defaulted to 0 == disabled, and which only got enabled late
in tsc_init(). Thus early calls to native_sched_clock() would skip
the TSC and use jiffies instead.

After the commit above, tsc_32.c uses a "tsc_disabled" variable
which defaults to 0, meaning that the TSC is Ok to use. Early calls
to native_sched_clock() now erroneously try to use the TSC on
!cpu_has_tsc processors, leading to invalid opcode exceptions.

My proposed fix is to initialise tsc_disabled to a "soft disabled"
state distinct from the hard disabled state set up by the "notsc"
kernel option. This fixes the native_sched_clock() problem. It also
allows tsc_init() to be simplified: instead of setting tsc_disabled = 1
on every error return, we just set tsc_disabled = 0 once when all
checks have succeeded.

I've verified that this lets my 486 boot again. I've also verified
that a Core2 machine still uses the TSC as clocksource after the patch.

Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19 10:08:47 +02:00
Suresh Siddha
75118a82e2 x86: fix NULL pointer deref in __switch_to
Patrick McHardy reported a crash:

> > I get this oops once a day, its apparently triggered by something
> > run by cron, but the process is a different one each time.
> >
> > Kernel is -git from yesterday shortly before the -rc6 release
> > (last commit is the usb-2.6 merge, the x86 patches are missing),
> > .config is attached.
> >
> > I'll retry with current -git, but the patches that have gone in
> > since I last updated don't look related.
> >
> > [62060.043009] BUG: unable to handle kernel NULL pointer dereference at
> > 000001ff
> > [62060.043009] IP: [<c0102a9b>] __switch_to+0x2f/0x118
> > [62060.043009] *pde = 00000000
> > [62060.043009] Oops: 0002 [#1] PREEMPT

Vegard Nossum analyzed it:

> This decodes to
>
>    0:   0f ae 00                fxsave (%eax)
>
> so it's related to the floating-point context. This is the exact
> location of the crash:
>
> $ addr2line -e arch/x86/kernel/process_32.o -i ab0
> include/asm/i387.h:232
> include/asm/i387.h:262
> arch/x86/kernel/process_32.c:595
>
> ...so it looks like prev_task->thread.xstate->fxsave has become NULL.
> Or maybe it never had any other value.

Somehow (as described below) TS_USEDFPU is set but the fpu is not
allocated or freed.

Another possible FPU pre-emption issue with the sleazy FPU optimization
which was benign before but not so anymore, with the dynamic FPU allocation
patch.

New task is getting exec'd and it is prempted at the below point.

flush_thread() {
	...
	/*
	* Forget coprocessor state..
	*/
	clear_fpu(tsk);
		<----- Preemption point
	clear_used_math();
	...
}

Now when it context switches in again, as the used_math() is still set
and fpu_counter can be > 5, we will do a math_state_restore() which sets
the task's TS_USEDFPU. After it continues from the above preemption point
it does clear_used_math() and much later free_thread_xstate().

Now, at the next context switch, it is quite possible that xstate is
null, used_math() is not set and TS_USEDFPU is still set. This will
trigger unlazy_fpu() causing kernel oops.

Fix this  by clearing tsk's fpu_counter before clearing task's fpu.

Reported-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-19 10:08:45 +02:00
Paolo Ciarrocchi
219835f10e x86: coding style fixes to x86/kernel/cpu/cpufreq/cpufreq-nforce2.c
Before:
total: 22 errors, 8 warnings, 440 lines checked

After:
total: 0 errors, 8 warnings, 442 lines checked

paolo@paolo-desktop:~/linux.trees.git$ md5sum /tmp/cpufreq-nforce2.o.*
3d4330a5d188fe904446e5948a618b48  /tmp/cpufreq-nforce2.o.after
1477e6b0dcd6f59b1fb6b4490042eca6  /tmp/cpufreq-nforce2.o.before
^^^ I guess this is because I fixed a few "do not initialise statics to 0 or NULL"

paolo@paolo-desktop:~/linux.trees.git$ size /tmp/cpufreq-nforce2.o.*
   text    data     bss     dec     hex filename
   1923      72      16    2011     7db /tmp/cpufreq-nforce2.o.after
   1923      72      16    2011     7db /tmp/cpufreq-nforce2.o.before

Signed-off-by: Paolo Ciarrocchi <paolo.ciarrocchi@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-18 15:00:14 +02:00
Paolo Ciarrocchi
f016e15c11 x86: coding style fixes to arch/x86/math-emu/reg_constant
Before:
total: 6 errors, 1 warnings, 117 lines checked

After:
total: 0 errors, 1 warnings, 117 lines checked

paolo@paolo-desktop:~/linux.trees.git$ md5sum /tmp/reg_constant.o.*
780388a3056d58fb759efaf190d5d3d1  /tmp/reg_constant.o.after
780388a3056d58fb759efaf190d5d3d1  /tmp/reg_constant.o.before

paolo@paolo-desktop:~/linux.trees.git$ size /tmp/reg_constant.o.*
   text    data     bss     dec     hex filename
    457       0       0     457     1c9 /tmp/reg_constant.o.after
    457       0       0     457     1c9 /tmp/reg_constant.o.before

Signed-off-by: Paolo Ciarrocchi <paolo.ciarrocchi@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-18 15:00:13 +02:00
Paolo Ciarrocchi
5175676a2d x86: coding style fixes to arch/x86/kernel/cpu/mcheck/k7.c
Before:
total: 6 errors, 13 warnings, 105 lines checked

After:
total: 0 errors, 0 warnings, 105 lines checked

paolo@paolo-desktop:~/linux.trees.git$ size /tmp/k7*
   text    data     bss     dec     hex filename
   1135       0       0    1135     46f /tmp/k7.o.after
   1135       0       0    1135     46f /tmp/k7.o.before

paolo@paolo-desktop:~/linux.trees.git$ md5sum /tmp/k7*
87b14954045aa37dbaee6fb7e022ed9a  /tmp/k7.o.after
87b14954045aa37dbaee6fb7e022ed9a  /tmp/k7.o.before

Signed-off-by: Paolo Ciarrocchi <paolo.ciarrocchi@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-18 15:00:12 +02:00
Paolo Ciarrocchi
fe94ae995d x86: coding style fixes to arch/x86/kernel/cpu/mcheck/p4.c
Before:
total: 16 errors, 34 warnings, 257 lines checked

After:
total: 0 errors, 2 warnings, 257 lines checked

No changes in the compiled code:

paolo@paolo-desktop:~/linux.trees.git$ size /tmp/p4*
   text    data     bss     dec     hex filename
   2644       4       4    2652     a5c /tmp/p4.o.after
   2644       4       4    2652     a5c /tmp/p4.o.before

paolo@paolo-desktop:~/linux.trees.git$ md5sum /tmp/p4*
13f1b21c4246b31a28aaff38184586ca  /tmp/p4.o.after
13f1b21c4246b31a28aaff38184586ca  /tmp/p4.o.before

Signed-off-by: Paolo Ciarrocchi <paolo.ciarrocchi@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-18 15:00:11 +02:00
Hugh Dickins
6cf514fce1 x86: PAT: make pat_x_mtrr_type() more readable
Clean up over-complications in pat_x_mtrr_type().
And if reserve_memtype() ignores stray req_type bits when
pat_enabled, it's better to mask them off when not also.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-18 11:51:08 +02:00
Daniel Rahn
b4b3bd96f2 x86: correctly report NR_BANKS in mce_64.c
attached is a no-brainer that makes kernel correctly report
NR_BANKS for MCE. We are right now limited to NR_BANKS==6, but the
error message will use the available number of banks instead of the
defined maximum.

For a Nehalem based system it will print:

"MCE: warning: using only 9 banks"

while the correct message would be

"MCE: warning: using only 6 banks"

Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-18 10:29:58 +02:00
Linus Torvalds
42a886af72 x86-64: Fix "bytes left to copy" return value for copy_from_user()
Most users by far do not care about the exact return value (they only
really care about whether the copy succeeded in its entirety or not),
but a few special core routines actually care deeply about exactly how
many bytes were copied from user space.

And the unrolled versions of the x86-64 user copy routines would
sometimes report that it had copied more bytes than it actually had.

Very few uses actually have partial copies to begin with, but to make
this bug even harder to trigger, most x86 CPU's use the "rep string"
instructions for normal user copies, and that version didn't have this
issue.

To make it even harder to hit, the one user of this that really cared
about the return value (and used the uncached version of the copy that
doesn't use the "rep string" instructions) was the generic write
routine, which pre-populated its source, once more hiding the problem by
avoiding the exception case that triggers the bug.

In other words, very special thanks to Bron Gondwana who not only
triggered this, but created a test-program to show it, and bisected the
behavior down to commit 08291429cf ("mm:
fix pagecache write deadlocks") which changed the access pattern just
enough that you can now trigger it with 'writev()' with multiple
iovec's.

That commit itself was not the cause of the bug, it just allowed all the
stars to align just right that you could trigger the problem.

[ Side note: this is just the minimal fix to make the copy routines
  (with __copy_from_user_inatomic_nocache as the particular version that
  was involved in showing this) have the right return values.

  We really should improve on the exceptional case further - to make the
  copy do a byte-accurate copy up to the exact page limit that causes it
  to fail.  As it is, the callers have to do extra work to handle the
  limit case gracefully. ]

Reported-by: Bron Gondwana <brong@fastmail.fm>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

 (which didn't have this problem), and since
most users that do the carethis was very hard to trigger, but
2008-06-17 17:47:50 -07:00
Jiri Hladky
e01b70ef3e x86: fix bug in arch/i386/lib/delay.c file, delay_loop function
when trying to understand how Bogomips are implemented I have found a
bug in arch/i386/lib/delay.c file, delay_loop function.

The function fails for loops > 2^31+1. It because SF is set when dec
returns numbers > 2^31.

The fix is to use jnz instruction instead of jns (and add one decl
instruction to the end to have exactly the same number of loops as in
original version).

Martin Mares observed:

> It is a long time since I have hacked that file, but you should definitely
> make sure that the function is never called with a zero argument. In such
> case, the original version made just a single pass, but your version
> makes 2^32 of them.

fixed that.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-17 10:55:47 +02:00
Ingo Molnar
7aaaec38fc Merge branch 'linus' into x86/kconfig 2008-06-16 11:28:04 +02:00
Ingo Molnar
c54f9da1c8 Merge branch 'linus' into x86/irqstats 2008-06-16 11:27:53 +02:00
Ingo Molnar
d939d2851f Merge branch 'linus' into x86/irq 2008-06-16 11:27:45 +02:00
Ingo Molnar
33ee375b2e Merge branch 'linus' into x86/gart 2008-06-16 11:27:18 +02:00
Ingo Molnar
688d22e23a Merge branch 'linus' into x86/xen 2008-06-16 11:21:27 +02:00
Ingo Molnar
fd2c17e177 Merge branch 'linus' into x86/timers 2008-06-16 11:20:57 +02:00
Ingo Molnar
faeca31d06 Merge branch 'linus' into x86/pat 2008-06-16 11:20:28 +02:00
Ingo Molnar
064a32d82c Merge branch 'linus' into x86/memtest 2008-06-16 11:19:53 +02:00
Ingo Molnar
1791a78c0b Merge branch 'linus' into x86/cleanups 2008-06-16 11:17:50 +02:00
Ingo Molnar
28638ea4f8 Merge branch 'linus' into x86/nmi
Conflicts:

	arch/x86/kernel/nmi_32.c
2008-06-16 10:17:15 +02:00
Linus Torvalds
0269c5c6d9 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI: fixup write combine comment in pci_mmap_resource
  x86: PAT export resource_wc in pci sysfs
  x86, pci-dma.c: don't always add __GFP_NORETRY to gfp
  suspend-vs-iommu: prevent suspend if we could not resume
  x86: pci-dma.c: use __GFP_NO_OOM instead of __GFP_NORETRY
  pci, x86: add workaround for bug in ASUS A7V600 BIOS (rev 1005)
  PCI: use dev_to_node in pci_call_probe
  PCI: Correct last two HP entries in the bfsort whitelist
2008-06-14 13:32:56 -07:00
Stas Sergeev
1da2e3d679 provide rtc_cmos platform device
Recently (around 2.6.25) I've noticed that RTC no longer works for me.  It
turned out this is because I use pnpacpi=off kernel option to work around
the parport_pc bugs.  I always did so, but RTC used to work fine in the
past, and now it have regressed.

The patch fixes the problem by creating the platform device for the RTC
when PNP is disabled.  This may also help running the PNP-enabled kernel
on an older PCs.

Signed-off-by: Stas Sergeev <stsp@aknet.ru>
Cc: David Brownell <david-b@pacbell.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Adam Belay <ambx1@neo.rr.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-12 18:05:42 -07:00
Jesse Barnes
883eed1b3e Merge branch 'pci-for-jesse' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip into for-linus 2008-06-12 13:51:05 -07:00
Linus Torvalds
cbfa66b88d Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: fix pointer type warning in arch/x86/mm/init_64.c:early_memtest
  x86, lockdep: fix "WARNING: at kernel/lockdep.c:2658 check_flags+0x4c/0x128()"
  x86: fix an incompatible pointer type warning on 64-bit compilations
  x86: fix lockdep warning during suspend-to-ram
  x86: fix unused variable 'loops' warning in arch/x86/boot/a20.c
  Revert "x86: fix ioapic bug again"
  x86: fix asm warning in head_32.S
  x86: fix endless page faults in mount_block_root for Linux 2.6
  geode: fix modular build
2008-06-12 12:55:32 -07:00
Kevin Winchester
f8a45704f5 x86: fix pointer type warning in arch/x86/mm/init_64.c:early_memtest
Changed the call to find_e820_area_size to pass u64 instead of unsigned long.

Signed-off-by: Kevin Winchester <kjwinchester@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 21:36:23 +02:00
Vegard Nossum
4461145ef1 x86, lockdep: fix "WARNING: at kernel/lockdep.c:2658 check_flags+0x4c/0x128()"
Alessandro Suardi reported:
> Recently upgraded my FC6 desktop to Fedora 9; with the
>  latest nautilus RPM updates my VNC session went nuts
>  with nautilus pegging the CPU for everything that breathed.
>
> I now reverted to an earlier nautilus package, but during
>  the peak CPU period my kernel spat this:
>
> [314185.623294] ------------[ cut here ]------------
> [314185.623414] WARNING: at kernel/lockdep.c:2658 check_flags+0x4c/0x128()
> [314185.623514] Modules linked in: iptable_filter ip_tables x_tables
> sunrpc ipv6 fuse snd_via82xx snd_ac97_codec ac97_bus snd_mpu401_uart
> snd_rawmidi via686a hwmon parport_pc sg parport uhci_hcd ehci_hcd
> [314185.623924] Pid: 12314, comm: nautilus Not tainted 2.6.26-rc5-git2 #4
> [314185.624021]  [<c0115b95>] warn_on_slowpath+0x41/0x7b
> [314185.624021]  [<c010de70>] ? do_page_fault+0x2c1/0x5fd
> [314185.624021]  [<c0128396>] ? up_read+0x16/0x28
> [314185.624021]  [<c010de70>] ? do_page_fault+0x2c1/0x5fd
> [314185.624021]  [<c012fa33>] ? __lock_acquire+0xbb4/0xbc3
> [314185.624021]  [<c012d0a0>] check_flags+0x4c/0x128
> [314185.624021]  [<c012fa73>] lock_acquire+0x31/0x7d
> [314185.624021]  [<c0128cf6>] __atomic_notifier_call_chain+0x30/0x80
> [314185.624021]  [<c0128cc6>] ? __atomic_notifier_call_chain+0x0/0x80
> [314185.624021]  [<c0128d52>] atomic_notifier_call_chain+0xc/0xe
> [314185.624021]  [<c0128d81>] notify_die+0x2d/0x2f
> [314185.624021]  [<c01043b0>] do_int3+0x1f/0x4d
> [314185.624021]  [<c02f2d3b>] int3+0x27/0x2c
> [314185.624021]  =======================
> [314185.624021] ---[ end trace 1923f65a2d7bb246 ]---
> [314185.624021] possible reason: unannotated irqs-off.
> [314185.624021] irq event stamp: 488879
> [314185.624021] hardirqs last  enabled at (488879): [<c0102d67>]
> restore_nocheck+0x12/0x15
> [314185.624021] hardirqs last disabled at (488878): [<c0102dca>]
> work_resched+0x19/0x30
> [314185.624021] softirqs last  enabled at (488876): [<c011a1ba>]
> __do_softirq+0xa6/0xac
> [314185.624021] softirqs last disabled at (488865): [<c010476e>]
> do_softirq+0x57/0xa6
>
> I didn't seem to find it with some googling, so here it is.
>
> I was incidentally ltracing that process to try and find out
>  what was gulping down that much CPU (sorry, no idea
>  whether ltrace and the WARNING happened at the same
>  time or which came first) and:

Yeah, this is extremely likely to be the source of the warning.

The warning should be harmless, however.

> Box is my trusty noname K7-800, 512MB RAM; if there's
>  anything else useful I might be able to provide, just ask.

It would be interesting to see where the int3 comes from.  Too bad,
lockdep doesn't provide the register dump. The stacktrace also doesn't
go further than the int3(), I wonder if this int3 came from userspace?
The ltrace readme says "software breakpoints, like gdb", so I guess
this is the case. Yep, seems like it.

This looks relevant:

| commit fb1dac909d
| Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
| Date:   Wed Jan 16 09:51:59 2008 +0100
|
|     lockdep: more hardirq annotations for notify_die()

I'm attaching a similarly-looking patch for this case (DO_VM86_ERROR),
though I suspect it might be missing for the other cases
(DO_ERROR/DO_ERROR_INFO) as well.

Reported-by: Alessandro Suardi <alessandro.suardi@gmail.com>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 21:27:19 +02:00
David Howells
eb53e9f3ea x86: fix an incompatible pointer type warning on 64-bit compilations
Fix an incompatible pointer type warning on x86_64 compilations.
early_memtest() is passing a u64* to find_e820_area_size() which is expecting
an unsigned long.  Change t_start and t_size to unsigned long as those are
also 64-bit types on x88_64.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 21:27:15 +02:00
Peter Zijlstra
e32e58a96d x86: fix lockdep warning during suspend-to-ram
Andrew Morton wrote:

> I've been seeing the below for a long time during suspend-to-ram on the Vaio.
>
>
> PM: Syncing filesystems ... done.
> PM: Preparing system for mem sleep
> Freezing user space processes ... <4>------------[ cut here ]------------
> WARNING: at kernel/lockdep.c:2658 check_flags+0x4c/0x127()
> Modules linked in: i915 drm ipw2200 sonypi ipv6 autofs4 hidp l2cap bluetooth sunrpc nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4 xt_state nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables acpi_cpufreq nvram ohci1394 ieee1394 ehci_hcd uhci_hcd sg joydev snd_hda_intel snd_seq_dummy sr_mod snd_seq_oss cdrom snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss ieee80211 pcspkr ieee80211_crypt snd_pcm i2c_i801 snd_timer i2c_core ide_pci_generic piix snd soundcore snd_page_alloc button ext3 jbd ide_disk ide_core [last unloaded: ipw2200]
> Pid: 3250, comm: zsh Not tainted 2.6.26-rc5 #1
>  [<c011c5f5>] warn_on_slowpath+0x41/0x6d
>  [<c01080e6>] ? native_sched_clock+0x82/0x96
>  [<c013789c>] ? mark_held_locks+0x41/0x5c
>  [<c0315688>] ? _spin_unlock_irqrestore+0x36/0x58
>  [<c0137a29>] ? trace_hardirqs_on+0xe6/0x10d
>  [<c0138637>] ? __lock_acquire+0xae3/0xb2b
>  [<c0313413>] ? schedule+0x39b/0x3b4
>  [<c0135596>] check_flags+0x4c/0x127
>  [<c01386b9>] lock_acquire+0x3a/0x86
>  [<c0315075>] _spin_lock+0x26/0x53
>  [<c0140660>] ? refrigerator+0x13/0xc3
>  [<c0140660>] refrigerator+0x13/0xc3
>  [<c012684a>] get_signal_to_deliver+0x3c/0x31e
>  [<c0102fe7>] do_notify_resume+0x91/0x6ee
>  [<c01359fd>] ? lock_release_holdtime+0x50/0x56
>  [<c0315688>] ? _spin_unlock_irqrestore+0x36/0x58
>  [<c0235d24>] ? read_chan+0x0/0x58c
>  [<c0137a29>] ? trace_hardirqs_on+0xe6/0x10d
>  [<c0315694>] ? _spin_unlock_irqrestore+0x42/0x58
>  [<c0230afa>] ? tty_ldisc_deref+0x5c/0x63
>  [<c0233104>] ? tty_read+0x66/0x98
>  [<c014b3f0>] ? audit_syscall_exit+0x2aa/0x2c5
>  [<c0109430>] ? do_syscall_trace+0x6b/0x16f
>  [<c0103a9c>] work_notifysig+0x13/0x1b
>  =======================
> ---[ end trace 25b49fe59a25afa5 ]---
> possible reason: unannotated irqs-off.
> irq event stamp: 58919
> hardirqs last  enabled at (58919): [<c0103afd>] syscall_exit_work+0x11/0x26

Joy - I so love entry.S

Best I can make of it:

syscall_exit_work
  resume_userspace
    DISABLE_INTERRUPTS
    (no TRACE_IRQS_OFF)
      work_pending
        work_notifysig
          do_notify_resume()
            do_signal()
              get_signal_to_deliver()
                try_to_freeze()
                  refrigerator()
                    task_lock() -> check_flags() -> BANG

The normal path is:

syscall_exit_work
  resume_userspace
    DISABLE_INTERRUPTS
    restore_all
      TRACE_IRQS_IRET
      iret

No idea why that would not warn..

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 21:27:09 +02:00
Manish Katiyar
52aaa12fbe x86: fix unused variable 'loops' warning in arch/x86/boot/a20.c
Following patch fixes the below warning message :
arch/x86/boot/a20.c:118: warning: unused variable 'loops'

Signed-off-by : Manish Katiyar <mkatiyar@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 21:27:05 +02:00
Ingo Molnar
0b6a39f7eb Revert "x86: fix ioapic bug again"
This reverts commit 6e908947b4.

Németh Márton reported:

| there is a problem in 2.6.26-rc3 which was not there in case of
| 2.6.25: the CPU wakes up ~90,000 times per sec instead of ~60 per sec.
|
| I also "git bisected" the problem, the result is:
|
| 6e908947b4 is first bad commit
| commit 6e908947b4
| Author: Ingo Molnar <mingo@elte.hu>
| Date:   Fri Mar 21 14:32:36 2008 +0100
|
|     x86: fix ioapic bug again

the original problem is fixed by Maciej W. Rozycki in the tip/x86/apic
branch (confirmed by Márton), but those changes are too intrusive for
v2.6.26 so we'll go for the less intrusive (repeated) revert now.

Reported-and-bisected-by: Németh Márton <nm127@freemail.hu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 21:26:28 +02:00
Joe Korty
86b2b70e15 x86: fix asm warning in head_32.S
On Mon, May 19, 2008 at 04:10:02PM -0700, Linus Torvalds wrote:
> It also causes these warnings on 32-bit PAE:
>
> 	  AS      arch/x86/kernel/head_32.o
> 	arch/x86/kernel/head_32.S: Assembler messages:
> 	arch/x86/kernel/head_32.S:225: Warning: left operand is a bignum; integer 0 assumed
> 	arch/x86/kernel/head_32.S:609: Warning: left operand is a bignum; integer 0 assumed
>
> and I do not see why (the end result seems to be identical).

Fix head_32.S gcc bignum warnings when CONFIG_PAE=y.

    arch/x86/kernel/head_32.S: Assembler messages:
    arch/x86/kernel/head_32.S:225: Warning: left operand is a bignum; integer 0 assumed
    arch/x86/kernel/head_32.S:609: Warning: left operand is a bignum; integer 0 assumed

The assembler was stumbling over the 64-bit constant 0x100000000 in the
KPMDS #define.

Testing: a cmp(1) on head_32.o before and after shows the binary is unchanged.

Signed-off-by: Joe Korty <joe.korty@ccur.com
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Theodore Tso <tytso@mit.edu>
Cc: Gabriel C <nix.or.die@googlemail.com>
Cc: Keith Packard <keithp@keithp.com>
Cc: "Pallipadi Venkatesh" <venkatesh.pallipadi@intel.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: "Siddha Suresh B" <suresh.b.siddha@intel.com>
Cc: bugme-daemon@bugzilla.kernel.org
Cc: airlied@linux.ie
Cc: "Barnes Jesse" <jesse.barnes@intel.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 21:26:12 +02:00
Henry Nestler
b29c701dea x86: fix endless page faults in mount_block_root for Linux 2.6
Page faults in kernel address space between PAGE_OFFSET up to
VMALLOC_START should not try to map as vmalloc.

Fix rarely endless page faults inside mount_block_root for root
filesystem at boot time.

All 32bit kernels up to 2.6.25 can fail into this hole.
I can not present this under native linux kernel. I see, that the 64bit
has fixed the problem. I copied the same lines into 32bit part.

Recorded debugs are from coLinux kernel 2.6.22.18 (virtualisation):
http://www.henrynestler.com/colinux/testing/pfn-check-0.7.3/20080410-antinx/bug16-recursive-page-fault-endless.txt
The physicaly memory was trimmed down to 192MB to better catch the bug.
More memory gets the bug more rarely.

Details, how every x86 32bit system can fail:

Start from "mount_block_root",
http://lxr.linux.no/linux/init/do_mounts.c#L297
There the variable "fs_names" got one memory page with 4096 bytes.
Variable "p" walks through the existing file system types. The first
string is no problem.
But, with the second loop in mount_block_root the offset of "p" is not
at beginning of page, the offset is for example +9, if "reiserfs" is the
first in list.
Than calls do_mount_root, and lands in sys_mount.
Remember: Variable "type_page" contains now "fs_type+9" and not contains
a full page.
The sys_mount copies 4096 bytes with function "exact_copy_from_user()":
http://lxr.linux.no/linux/fs/namespace.c#L1540

Mostly exist pages after the buffer "fs_names+4096+9" and the page fault
handler was not called. No problem.

In the case, if the page after "fs_names+4096" is not mapped, the page
fault handler was called from http://lxr.linux.no/linux/fs/namespace.c#L1320

The do_page_fault gots an address 0xc03b4000.
It's kernel address, address >= TASK_SIZE, but not from vmalloc! It's
from "__getname()" alias "kmem_cache_alloc".
The "error_code" is 0. "vmalloc_fault" will be call:
http://lxr.linux.no/linux/arch/i386/mm/fault.c#L332

"vmalloc_fault" tryed to find the physical page for a non existing
virtual memory area. The macro "pte_present" in vmalloc_fault()
got a next page fault for 0xc0000ed0 at:
http://lxr.linux.no/linux/arch/i386/mm/fault.c#L282

No PTE exist for such virtual address. The page fault handler was trying
to sync the physical page for the PTE lockup.

This called vmalloc_fault() again for address 0xc000000, and that also
was not existing. The endless began...

In normal case the cpu would still loop with disabled interrrupts. Under
coLinux this was catched by a stack overflow inside printk debugs.

Signed-off-by: Henry Nestler <henry.nestler@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-12 21:26:07 +02:00
Ingo Molnar
3703f39965 geode: fix modular build
-tip testing found this build bug:

 MODPOST 331 modules
 ERROR: "geode_mfgpt_toggle_event" [drivers/watchdog/geodewdt.ko] undefined!
 ERROR: "geode_mfgpt_alloc_timer" [drivers/watchdog/geodewdt.ko] undefined!
 make[1]: *** [__modpost] Error 1
 make: *** [modules] Error 2

with this config:

  http://redhat.com/~mingo/misc/config-Wed_Jun__4_18_01_59_CEST_2008.bad

export those symbols.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 21:25:51 +02:00
Linus Torvalds
dc10885d68 Merge branch 'core/iter-div' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core/iter-div' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  always_inline timespec_add_ns
  add an inlined version of iter_div_u64_rem
  common implementation of iterative div/mod
2008-06-12 07:47:44 -07:00
Cyrill Gorcunov
f781b03c4b x86: touch_nmi_watchdog(): reset alert counters for supported nmi_watchdog modes only
The checking 'if nmi_watchdog > 0' (ie NMI_NONE) is quite fast but it
has a side effect - it's taken even if nmi_watchdog = NMI_DISABLED.

Nowadays nmi_watchdog is set up to NMI_NONE by default so this condition
is properly taken most the time but we better show this explicitly.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 15:14:54 +02:00
Rafael J. Wysocki
6703f6d10d x86, gart: add resume handling
If GART IOMMU is used on an AMD64 system, the northbridge registers
related to it should be restored during resume so that memory is not
corrupted.  Make gart_resume() handle that as appropriate.

Ref. http://lkml.org/lkml/2008/5/25/96 and the following thread.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 14:11:25 +02:00
Ingo Molnar
bb6dfb32f9 Merge branch 'linus' into x86/gart 2008-06-12 11:27:22 +02:00
Jeremy Fitzhardinge
f595ec964d common implementation of iterative div/mod
We have a few instances of the open-coded iterative div/mod loop, used
when we don't expcet the dividend to be much bigger than the divisor.
Unfortunately modern gcc's have the tendency to strength "reduce" this
into a full mod operation, which isn't necessarily any faster, and
even if it were, doesn't exist if gcc implements it in libgcc.

The workaround is to put a dummy asm statement in the loop to prevent
gcc from performing the transformation.

This patch creates a single implementation of this loop, and uses it
to replace the open-coded versions I know about.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Robert Hancock <hancockr@shaw.ca>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 10:47:56 +02:00
Andreas Herrmann
499f8f84b8 x86: rename pat_wc_enabled to pat_enabled
BTW, what does pat_wc_enabled stand for? Does it mean
"write-combining"?

Currently it is used to globally switch on or off PAT support.
Thus I renamed it to pat_enabled.
I think this increases readability (and hope that I didn't miss
something).

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 10:14:27 +02:00
Andreas Herrmann
cd7a4e936d x86: PAT: fixed checkpatch errors (and whitespaces)
x86: PAT: fixed checkpatch errors (and whitespaces)

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 10:14:24 +02:00
Andreas Herrmann
97cfab6ac4 x86: PAT: fix ambiguous paranoia check in pat_init()
Starting with commit 8d4a430085 (x86:
cleanup PAT cpu validation) the PAT CPU feature flag is not cleared
anymore. Now the error message

  "PAT enabled, but CPU feature cleared"

in pat_init() is misleading.

Furthermore the current code does not check for existence of the PAT
CPU feature flag if a CPU is whitelisted in validate_pat_support.

This patch clears pat_wc_enabled if boot CPU has no PAT feature flag
and adapts the paranoia check.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 10:14:22 +02:00
Andreas Herrmann
ee863ba7ab x86: unconditionally enable PAT for AMD CPUs
If PAT support is advertised it should just work. No errata known.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 10:14:20 +02:00
Venki Pallipadi
c26421d019 x86: fix Xorg crash with xf86MapVidMem error
Clarify the usage of mtrr_lookup() in PAT code, and to make PAT code
resilient to mtrr lookup problems.

Specifically, pat_x_mtrr_type() is restructured to highlight, under what
conditions we look for mtrr hint. pat_x_mtrr_type() uses a default type
when there are any errors in mtrr lookup (still maintaining the pat
consistency). And, reserve_memtype() highlights its usage ot mtrr_lookup
for request type of '-1' and also defaults in a sane way on any mtrr
lookup failure.

pat.c looks at mtrr type of a range to get a hint on what mapping type
to request when user/API: (1) hasn't specified any type (/dev/mem
mapping) and we do not want to take performance hit by always mapping
UC_MINUS. This will be the case for /dev/mem mappings used to map BIOS
area or ACPI region which are WB'able. In this case, as long as MTRR is
not WB, PAT will request UC_MINUS for such mappings.

(2) user/API requests WB mapping while in reality MTRR may have UC or
WC. In this case, PAT can map as WB (without checking MTRR) and still
effective type will be UC or WC. But, a subsequent request to map same
region as UC or WC may fail, as the region will get trackked as WB in
PAT list. Looking at MTRR hint helps us to track based on effective type
rather than what user requested. Again, here mtrr_lookup is only used as
hint and we fallback to WB mapping (as requested by user) as default.

In both cases, after using the mtrr hint, we still go through the
memtype list to make sure there are no inconsistencies among multiple
users.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Tested-by: Rufus & Azrael <rufus-azrael@numericable.fr>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 10:03:55 +02:00
Linus Torvalds
da50ccc6a0 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (23 commits)
  ACPICA: fix stray va_end() caused by mis-merge
  ACPI: Reject below-freezing temperatures as invalid critical temperatures
  ACPICA: Fix for access to deleted object <regression>
  ACPICA: Fix to make _SST method optional
  ACPICA: Fix for Load operator, load table at the namespace root
  ACPICA: Ignore ACPI table signature for Load() operator
  ACPICA: Fix to allow zero-length ASL field declarations
  ACPI: use memory_read_from_buffer()
  bay: exit if notify handler cannot be installed
  dock.c remove trailing printk whitespace
  proper prototype for acpi_processor_tstate_has_changed()
  ACPI: handle invalid ACPI SLIT table
  PNPACPI: use _CRS IRQ descriptor length for _SRS
  pnpacpi: fix shareable IRQ encode/decode
  pnpacpi: fix IRQ flag decoding
  MAINTAINERS: update ACPI homepage
  ACPI 2.6.26-rc2: Add missing newline to DSDT/SSDT warning message
  ACPI: EC: Use msleep instead of udelay while waiting for event.
  thinkpad-acpi: fix LED handling on older ThinkPads
  thinkpad-acpi: fix initialization error paths
  ...
2008-06-11 17:16:32 -07:00
Fenghua Yu
39b8931b5c ACPI: handle invalid ACPI SLIT table
This is a SLIT sanity checking patch.  It moves slit_valid() function to
generic ACPI code and does sanity checking for both x86 and ia64.  It sets up
node_distance with LOCAL_DISTANCE and REMOTE_DISTANCE when hitting invalid
SLIT table on ia64.  It also cleans up unused variable localities in
acpi_parse_slit() on x86.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-06-11 19:13:46 -04:00
Linus Torvalds
a4df1ac12d Merge branch 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm
* 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm:
  KVM: MMU: Fix is_empty_shadow_page() check
  KVM: MMU: Fix printk() format string
  KVM: IOAPIC: only set remote_irr if interrupt was injected
  KVM: MMU: reschedule during shadow teardown
  KVM: VMX: Clear CR4.VMXE in hardware_disable
  KVM: migrate PIT timer
  KVM: ppc: Report bad GFNs
  KVM: ppc: Use a read lock around MMU operations, and release it on error
  KVM: ppc: Remove unmatched kunmap() call
  KVM: ppc: add lwzx/stwz emulation
  KVM: ppc: Remove duplicate function
  KVM: s390: Fix race condition in kvm_s390_handle_wait
  KVM: s390: Send program check on access error
  KVM: s390: fix interrupt delivery
  KVM: s390: handle machine checks when guest is running
  KVM: s390: fix locking order problem in enable_sie
  KVM: s390: use yield instead of schedule to implement diag 0x44
  KVM: x86 emulator: fix hypercall return value on AMD
  KVM: ia64: fix zero extending for mmio ld1/2/4 emulation in KVM
2008-06-11 10:35:44 -07:00
Thomas Gleixner
00dba56465 x86: move more common idle functions/variables to process.c
more unification. Should cause no change in functionality.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 15:52:29 +02:00
Thomas Gleixner
09fd4b4ef5 x86: use cpuid to check MWAIT support for C1
cpuid(0x05) provides extended information about MWAIT in EDX when bit
0 of ECX is set. Bit 4-7 of EDX determine whether MWAIT is supported
for C1. C1E enabled CPUs have these bits set to 0.

Based on an earlier patch from Andi Kleen.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 15:52:21 +02:00
Thomas Gleixner
732d7be17b x86: use cpuinfo to check for interrupt pending message msr
Simplify code: no need to do a cpuid(1) again. The cpuinfo structure
has all necessary information already.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 15:52:14 +02:00
Thomas Gleixner
aa83f3f2cf x86: cleanup C1E enabled detection
Rename the "MSR_K8_ENABLE_C1E" MSR to INT_PENDING_MSG, which is the
name in the data sheet as well. Move the C1E mask to the header file.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 15:52:07 +02:00
Thomas Gleixner
6ddd2a2794 x86: simplify idle selection
default_idle is selected in cpu_idle(), when no other idle routine is
selected. Select it in select_idle_routine() when mwait is not
selected.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 15:52:01 +02:00
Paolo Ciarrocchi
5b80fe8bd7 x86: coding style fixes to arch/x86/kernel/sys_i386_32.c
Before:
total: 16 errors, 25 warnings, 246 lines checked

After:
total: 0 errors, 7 warnings, 246 lines checked

Compile tested.

paolo@paolo-desktop:/tmp$ size sys*
   text    data     bss     dec     hex filename
   1209       0       0    1209     4b9 sys_i386_32.o.after
   1209       0       0    1209     4b9 sys_i386_32.o.before

paolo@paolo-desktop:/tmp$ md5sum sys*
6144f6d6ce7342c3e192681a1ccaa1c1  sys_i386_32.o.after
6144f6d6ce7342c3e192681a1ccaa1c1  sys_i386_32.o.before

Signed-off-by: Paolo Ciarrocchi <paolo.ciarrocchi@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 12:34:54 +02:00
Paolo Ciarrocchi
7058b06188 x86: coding style fixes to arch/x86/pci/irq.
Before:
total: 60 errors, 85 warnings, 1237 lines checked

After:
total: 1 errors, 82 warnings, 1226 lines checked

WARNING: line over 80 characters

Compile tested.

paolo@paolo-desktop:/tmp$ size irq.o.*
   text    data     bss     dec     hex filename
   6128     440      76    6644    19f4 irq.o.after
   6128     440      76    6644    19f4 irq.o.before

Signed-off-by: Paolo Ciarrocchi <paolo.ciarrocchi@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 12:33:59 +02:00
Robert Richter
9e26d84273 fix build bug in "x86: add PCI extended config space access for AMD Barcelona"
Also much less code now.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 12:32:53 +02:00
Miquel van Smoorenburg
b7f09ae583 x86, pci-dma.c: don't always add __GFP_NORETRY to gfp
Currently arch/x86/kernel/pci-dma.c always adds __GFP_NORETRY
to the allocation flags, because it wants to be reasonably
sure not to deadlock when calling alloc_pages().

But really that should only be done in two cases:
- when allocating memory in the lower 16 MB DMA zone.
  If there's no free memory there, waiting or OOM killing is of no use
- when optimistically trying an allocation in the DMA32 zone
  when dma_mask < DMA_32BIT_MASK hoping that the allocation
  happens to fall within the limits of the dma_mask

Also blindly adding __GFP_NORETRY to the the gfp variable might
not be a good idea since we then also use it when calling
dma_ops->alloc_coherent(). Clearing it might also not be a
good idea, dma_alloc_coherent()'s caller might have set it
on purpose. The gfp variable should not be clobbered.

[ mingo@elte.hu: converted to delta patch ontop of previous version. ]

Signed-off-by: Miquel van Smoorenburg <miquels@cistron.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 12:22:18 +02:00
Andreas Herrmann
668231141f x86: fix compile warning in io_apic_{32,64}.c
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 12:21:10 +02:00
Yinghai Lu
6780711e98 x86: update mptable, fix
need to call early_reserve_e820() to preallocate mptable for 32bit

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 11:35:04 +02:00
Yinghai Lu
d49c428840 x86: make generic arch support NUMAQ
... so it could fall back to normal numa and we'd reduce the impact of the
NUMAQ subarch.

NUMAQ depends on GENERICARCH
also decouple genericarch numa from acpi.
also make it fall back to bigsmp if apicid > 8.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 11:34:42 +02:00
Ingo Molnar
4e78c91abe Revert "x86, numaq: add pci_acpi_scan_root() stub"
This reverts commit f329469097.

That bug will be fixed in a better way via:

  x86: make generic arch support NUMAQ
2008-06-10 11:33:01 +02:00
Yinghai Lu
e0da336468 x86: introduce max_physical_apicid for bigsmp switching
a multi-socket test-system with 3 or 4 ioapics, when 4 dualcore cpus or
2 quadcore cpus installed, needs to switch to bigsmp or physflat.

CPU apic id is [4,11] instead of [0,7], and we need to check max apic
id instead of cpu numbers.

also add check for 32 bit when acpi is not compiled in or acpi=off.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 11:32:09 +02:00
Yinghai Lu
c3ff01672a x86: fix boot failure with 64GB+ system with numa 32-bit
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 11:31:59 +02:00
Yinghai Lu
9043f00796 x86, numa, 32-bit: use find_e820_area() to find KVA RAM on node
don't assume we can use RAM near the end of every node.
Esp systems that have few memory and they could have
kva address and kva RAM all below max_low_pfn.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 11:31:52 +02:00
Yinghai Lu
cc1a9d86ce mm, x86: shrink_active_range() should check all
Now we are using register_e820_active_regions() instead of
add_active_range() directly. So end_pfn could be different between the
value in early_node_map to node_end_pfn.

So we need to make shrink_active_range() smarter.

shrink_active_range() is a generic MM function in mm/page_alloc.c but
it is only used on 32-bit x86. Should we move it back to some file in
arch/x86?

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 11:31:44 +02:00
Yinghai Lu
db3660c190 x86: remove all active memory ranges before registering them again after trimming - 64bit
this way we keep the early_node_map all right.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 11:30:47 +02:00
Ingo Molnar
2377494285 Revert "x86, 32-bit: SRAT fix"
This reverts commit ea57a5a6db, a better
fix will be merged.
2008-06-09 10:57:16 +02:00
Avi Kivity
3c9155106d KVM: MMU: Fix is_empty_shadow_page() check
The check is only looking at one of two possible empty ptes.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-06 21:36:33 +03:00
Avi Kivity
ebb0e6264c KVM: MMU: Fix printk() format string
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-06 21:36:20 +03:00
Linus Torvalds
256a13dd70 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  x86/PCI: add workaround for bug in ASUS A7V600 BIOS (rev 1005)
  PCI/x86: fix up PCI stuff so that PCI_GOANY supports OLPC
2008-06-06 11:33:08 -07:00
Avi Kivity
8d2d73b9a5 KVM: MMU: reschedule during shadow teardown
Shadows for large guests can take a long time to tear down, so reschedule
occasionally to avoid softlockup warnings.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-06 21:32:20 +03:00
Eli Collins
e693d71b46 KVM: VMX: Clear CR4.VMXE in hardware_disable
Clear CR4.VMXE in hardware_disable. There's no reason to leave it set
after doing a VMXOFF.

VMware Workstation 6.5 checks CR4.VMXE as a proxy for whether the CPU is
in VMX mode, so leaving VMXE set means we'll refuse to power on. With this
change the user can power on after unloading the kvm-intel module. I
tested on kvm-67 and kvm-69.

Signed-off-by: Eli Collins <ecollins@vmware.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-06 21:30:20 +03:00
Marcelo Tosatti
2f5997140f KVM: migrate PIT timer
Migrate the PIT timer to the physical CPU which vcpu0 is scheduled on,
similarly to what is done for the LAPIC timers, otherwise PIT interrupts
will be delayed until an unrelated event causes an exit.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-06 21:25:51 +03:00
Avi Kivity
33e3885de2 KVM: x86 emulator: fix hypercall return value on AMD
The hypercall instructions on Intel and AMD are different.  KVM allows the
guest to choose one or the other (the default is Intel), and if the guest
chooses incorrectly, KVM will patch it at runtime to select the correct
instruction.  This allows live migration between Intel and AMD machines.

This patching occurs in the x86 emulator.  The current code also executes
the hypercall.  Unfortunately, the tail end of the x86 emulator code also
executes, overwriting the return value of the hypercall with the original
contents of rax (which happens to be the hypercall number).

Fix not by executing the hypercall in the emulator context; instead let the
guest reissue the patched instruction and execute the hypercall via the
normal path.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-06 21:08:25 +03:00
Ingo Molnar
ea57a5a6db x86, 32-bit: SRAT fix
we were adding reserved BIOS ranges as general memory as well => not good.

solves this crash:

[   20.068075] hostname used greatest stack depth: 6464 bytes left
[   20.121404] BUG: unable to handle kernel <1>BUG: unable to handle kernel NULL pointer dereference at 00000b8c
[   20.121404] IP: [<c01160ae>] kmap_atomic_prot+0x2d/0x1c3
[   20.121404] *pdpt = 00000000367eb001 *pde = 0000000000000000
[   20.121404] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[   20.121404]
[   20.121404] Pid: 2061, comm: rc.sysinit Not tainted (2.6.26-rc3 #2440)
[   20.121404] EIP: 0060:[<c01160ae>] EFLAGS: 00010206 CPU: 0
[   20.121404] EIP is at kmap_atomic_prot+0x2d/0x1c3

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-06 16:38:35 +02:00
Bertram Felgenhauer
9f67fd5db5 x86/PCI: add workaround for bug in ASUS A7V600 BIOS (rev 1005)
This BIOS claims the VIA 8237 south bridge to be compatible with VIA 586,
which it is not.

Without this patch, I get the following warning while booting,
among others,

| PCI: Using IRQ router VIA [1106/3227] at 0000:00:11.0
| ------------[ cut here ]------------
| WARNING: at arch/x86/pci/irq.c:265 pirq_via586_get+0x4a/0x60()
| Modules linked in:
| Pid: 1, comm: swapper Not tainted 2.6.26-rc4-00015-g1ec7d99 #1
|  [<c0119fd4>] warn_on_slowpath+0x54/0x70
|  [<c02246e0>] ? vt_console_print+0x210/0x2b0
|  [<c02244d0>] ? vt_console_print+0x0/0x2b0
|  [<c011a413>] ? __call_console_drivers+0x43/0x60
|  [<c011a482>] ? _call_console_drivers+0x52/0x80
|  [<c011aa89>] ? release_console_sem+0x1c9/0x200
|  [<c0291d21>] ? raw_pci_read+0x41/0x70
|  [<c0291e8f>] ? pci_read+0x2f/0x40
|  [<c029151a>] pirq_via586_get+0x4a/0x60
|  [<c02914d0>] ? pirq_via586_get+0x0/0x60
|  [<c029178d>] pcibios_lookup_irq+0x15d/0x430
|  [<c03b895a>] pcibios_irq_init+0x17a/0x3e0
|  [<c03a66f0>] ? kernel_init+0x0/0x250
|  [<c03a6763>] kernel_init+0x73/0x250
|  [<c03b87e0>] ? pcibios_irq_init+0x0/0x3e0
|  [<c0114d00>] ? schedule_tail+0x10/0x40
|  [<c0102dee>] ? ret_from_fork+0x6/0x1c
|  [<c03a66f0>] ? kernel_init+0x0/0x250
|  [<c03a66f0>] ? kernel_init+0x0/0x250
|  [<c010324b>] kernel_thread_helper+0x7/0x1c
|  =======================
| ---[ end trace 4eaa2a86a8e2da22 ]---

and IRQ trouble later,

| irq 10: nobody cared (try booting with the "irqpoll" option)

Now that's an VIA 8237 chip, so pirq_via586_get shouldn't be called
at all; adding this workaround to via_router_probe() fixes the
problem for me.

Amazingly I have a 2.6.23.8 kernel that somehow works fine ... I'll
never understand why.

Signed-off-by: Bertram Felgenhauer <int-e@gmx.de>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-06-05 15:32:15 -07:00
Andres Salomon
2bdd1b031b PCI/x86: fix up PCI stuff so that PCI_GOANY supports OLPC
Previously, one would have to specifically choose CONFIG_OLPC and
CONFIG_PCI_GOOLPC in order to enable PCI_OLPC.  That doesn't really work
for distro kernels, so this patch allows one to choose CONFIG_OLPC and
CONFIG_PCI_GOANY in order to build in OLPC support in a generic kernel (as
requested by Robert Millan).

This also moves GOOLPC before GOANY in the menuconfig list.

Finally, make pci_access_init return early if we detect OLPC hardware.
There's no need to continue probing stuff, and pci_pcbios_init
specifically trashes our settings (we didn't run into that before because
PCI_GOANY wasn't supported).

Signed-off-by: Andres Salomon <dilinger@debian.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-06-05 14:29:25 -07:00
Stefan Richter
16104b5504 x86: fix CONFIG_NONPROMISC_DEVMEM prompt and help text
Here is an attempt to translate the prompt and help text into something
which is legible and, as a bonus, correct.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-05 14:21:45 -07:00
mingo@elte.hu
75b9f5d2a0 x86, nmi: fix build
fix:

arch/x86/kernel/built-in.o: In function `proc_nmi_enabled':
: undefined reference to `nmi_watchdog_default'
arch/x86/kernel/built-in.o: In function `native_smp_prepare_cpus':
: undefined reference to `nmi_watchdog_default'

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-05 15:15:10 +02:00
Cyrill Gorcunov
1a1b1d1322 x86: watchdog - check for CPU is being supported
This patch does check if CPU is being recongnized
before call the unreserve(). Since enable_lapic_nmi_watchdog()
does have such a check the same is make sense here too
in a sake of code consistency (but nothing more).

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: mingo@redhat.com
Cc: hpa@zytor.com
Cc: macro@linux-mips.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-05 15:14:14 +02:00
Cyrill Gorcunov
3ed3f06295 x86: nmi - consolidate nmi_watchdog_default for 32bit mode
64bit mode bootstrap code does set nmi_watchdog to NMI_NONE
by default and doing the same on 32bit mode is safe too.
Such an action saves us from several #ifdef.

Btw, my previous commit

commit 19ec673ced
Author: Cyrill Gorcunov <gorcunov@gmail.com>
Date:   Wed May 28 23:00:47 2008 +0400

    x86: nmi - fix incorrect NMI watchdog used by default

did not fix the problem completely, moreover it
introduced additional bug - nmi_watchdog would be
set to either NMI_LOCAL_APIC or NMI_IO_APIC
_regardless_ to boot option if being enabled thru
/proc/sys/kernel/nmi_watchdog. Sorry for that.
Fix it too.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: mingo@redhat.com
Cc: hpa@zytor.com
Cc: macro@linux-mips.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-05 15:13:59 +02:00
Krzysztof Oledzki
74e411cb64 x86: add another PCI ID for ICH6 force hpet.
Tested on Asus P5GDC-V

$ lspci -n -n |grep ISA
00:1f.0 ISA bridge [0601]: Intel Corporation 82801FB/FR (ICH6/ICH6R) LPC Interface Bridge [8086:2640] (rev 03)

Force enabled HPET at base address 0xfed00000
hpet clockevent registered
hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
hpet0: 3 64-bit timers, 14318180 Hz

Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-05 15:11:35 +02:00
Huang, Ying
c45a707dbe x86: linked list of setup_data for i386
This patch adds linked list of struct setup_data supported for i386.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: andi@firstfloor.org
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-05 15:10:02 +02:00
Huang, Ying
0c51a965ed x86: extract common part of head32.c and head64.c into head.c
This patch extracts the common part of head32.c and head64.c into head.c.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: andi@firstfloor.org
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-05 15:10:02 +02:00
Huang, Ying
ecacf09f7d x86: reserve EFI memory map with reserve_early
This patch reserves the EFI memory map with reserve_early(). Because EFI
memory map is allocated by bootloader, if it is not reserved by
reserved_early(), it may be overwritten through address returned by
find_e820_area().

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: andi@firstfloor.org
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-05 15:10:02 +02:00
Huang, Ying
d0ec2c6f2c x86: reserve highmem pages via reserve_early
This patch makes early reserved highmem pages become reserved
pages. This can be used for highmem pages allocated by bootloader such
as EFI memory map, linked list of setup_data, etc.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: andi@firstfloor.org
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-05 15:10:02 +02:00
Huang, Ying
d3fbe5ea95 x86: split out common code into find_overlapped_early()
This patch clean up reserve_early() family functions by extracting the
common part of reserve_early(), free_early() and bad_addr() into
find_overlapped_early().

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: andi@firstfloor.org
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-05 15:10:01 +02:00
Pavel Machek
4f384f8bcd x86: aperture_64.c: corner case wrong
If

fix == 0, aper_enabled == 1, gart_fix_e820 == 0

	if (!fix && !aper_enabled)
		return;

	if (gart_fix_e820 && !fix && aper_enabled) {
		if (e820_any_mapped(aper_base, aper_base + aper_size,
				    E820_RAM)) {
			/* reserve it, so we can reuse it in second kernel */
			printk(KERN_INFO "update e820 for GART\n");
			add_memory_region(aper_base, aper_size, E820_RESERVED);
			update_e820();
		}
		return;
	}

	/* different nodes have different setting, disable them all atfirst*/

we'll fall back here and disable all the settings, even when they were
all consistent.

What about this? (I hope it compiles...)

Signed-off-by: Pavel Machek <pavel@suse.cz>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-05 13:59:13 +02:00
Pavel Machek
fa5b8a30cf aperture_64.c: duplicated code, buggy?
Hi!

void __init early_gart_iommu_check(void)

contains

	for (num = 24; num < 32; num++) {
		if (!early_is_k8_nb(read_pci_config(0, num, 3, 0x00)))
			continue;

loop, with very similar loop duplicated in

void __init gart_iommu_hole_init(void)

. First copy of a loop seems to be buggy, too. It uses 0 as a "nothing
set" value, which may actually bite us in last_aper_enabled case
(because it may be often zero).

(Beware, it is hard to test this patch, because this code has about
2^8 different code paths, depending on hardware and cmdline settings).

Plus, the second loop does not check for consistency of
aper_enabled. Should it?

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-05 13:58:42 +02:00
Yinghai Lu
bd70e522af x86: e820 max_arch_pfn typo fix for 64 bit
should use right shift

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-04 23:15:33 +02:00
Suresh Siddha
870568b390 x86, fpu: fix CONFIG_PREEMPT=y corruption of application's FPU stack
Jürgen Mell reported an FPU state corruption bug under CONFIG_PREEMPT,
and bisected it to commit v2.6.19-1363-gacc2076, "i386: add sleazy FPU
optimization".

Add tsk_used_math() checks to prevent calling math_state_restore()
which can sleep in the case of !tsk_used_math(). This prevents
making a blocking call in __switch_to().

Apparently "fpu_counter > 5" check is not enough, as in some signal handling
and fork/exec scenarios, fpu_counter > 5 and !tsk_used_math() is possible.

It's a side effect though. This is the failing scenario:

process 'A' in save_i387_ia32() just after clear_used_math()

Got an interrupt and pre-empted out.

At the next context switch to process 'A' again, kernel tries to restore
the math state proactively and sees a fpu_counter > 0 and !tsk_used_math()

This results in init_fpu() during the __switch_to()'s math_state_restore()

And resulting in fpu corruption which will be saved/restored
(save_i387_fxsave and restore_i387_fxsave) during the remaining
part of the signal handling after the context switch.

Bisected-by: Jürgen Mell <j.mell@t-online.de>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Tested-by: Jürgen Mell <j.mell@t-online.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org
2008-06-04 16:21:24 +02:00
Pavel Machek
cd76374e9d suspend-vs-iommu: prevent suspend if we could not resume
iommu/gart support misses suspend/resume code, which can do bad stuff,
including memory corruption on resume.  Prevent system suspend in case we
would be unable to resume.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Tested-by: Patrick <ragamuffin@datacomm.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-04 13:11:47 +02:00
Andrew Morton
be524fb960 x86: section mismatch fix
Fix this:

 WARNING: vmlinux.o(.text+0x114bb): Section mismatch in reference from
 the function nopat() to the function .cpuinit.text:pat_disable()
 The function nopat() references
 the function __cpuinit pat_disable().
 This is often because nopat lacks a __cpuinit
 annotation or the annotation of pat_disable is wrong.

Reported-by: "Fabio Comolli" <fabio.comolli@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-04 13:11:47 +02:00
Venki Pallipadi
282c454cd3 x86: fix Xorg crash with xf86MapVidMem error
Clarify the usage of mtrr_lookup() in PAT code, and to make PAT code
resilient to mtrr lookup problems.

Specifically, pat_x_mtrr_type() is restructured to highlight, under what
conditions we look for mtrr hint. pat_x_mtrr_type() uses a default type
when there are any errors in mtrr lookup (still maintaining the pat
consistency). And, reserve_memtype() highlights its usage ot mtrr_lookup
for request type of '-1' and also defaults in a sane way on any mtrr
lookup failure.

pat.c looks at mtrr type of a range to get a hint on what mapping type
to request when user/API: (1) hasn't specified any type (/dev/mem
mapping) and we do not want to take performance hit by always mapping
UC_MINUS. This will be the case for /dev/mem mappings used to map BIOS
area or ACPI region which are WB'able. In this case, as long as MTRR is
not WB, PAT will request UC_MINUS for such mappings.

(2) user/API requests WB mapping while in reality MTRR may have UC or
WC. In this case, PAT can map as WB (without checking MTRR) and still
effective type will be UC or WC. But, a subsequent request to map same
region as UC or WC may fail, as the region will get trackked as WB in
PAT list. Looking at MTRR hint helps us to track based on effective type
rather than what user requested. Again, here mtrr_lookup is only used as
hint and we fallback to WB mapping (as requested by user) as default.

In both cases, after using the mtrr hint, we still go through the
memtype list to make sure there are no inconsistencies among multiple
users.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Tested-by: Rufus & Azrael <rufus-azrael@numericable.fr>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-04 13:11:47 +02:00
Kevin Winchester
511631011d x86: fix pointer type warning in arch/x86/mm/init_64.c:early_memtest
Changed the call to find_e820_area_size to pass u64 instead of unsigned long.

Signed-off-by: Kevin Winchester <kjwinchester@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-04 13:11:47 +02:00
Hugh Dickins
2884f110d5 x86: fix bad pmd ffff810000207xxx(9090909090909090)
OGAWA Hirofumi and Fede have reported rare pmd_ERROR messages:
mm/memory.c:127: bad pmd ffff810000207xxx(9090909090909090).

Initialization's cleanup_highmap was leaving alignment filler
behind in the pmd for MODULES_VADDR: when vmalloc's guard page
would occupy a new page table, it's not allocated, and then
module unload's vfree hits the bad 9090 pmd entry left over.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-04 13:11:47 +02:00
Ingo Molnar
226e9a93a2 x86: ioremap fix failing nesting check
Mika Kukkonen noticed that the nesting check in early_iounmap() is not
actually done.

Reported-by: Mika Kukkonen <mikukkon@srv1-m700-lanp.koti>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: torvalds@linux-foundation.org
Cc: arjan@linux.intel.com
Cc: mikukkon@iki.fi
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-04 13:11:47 +02:00
Suresh Siddha
e8a496ac8c x86: fix broken math-emu with lazy allocation of fpu area
Fix the math emulation that got broken with the recent lazy allocation of FPU
area. init_fpu() need to be added for the math-emulation path aswell
for the FPU area allocation.

math emulation enabled kernel booted fine with this, in the presence
of "no387 nofxsr" boot param.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: hpa@zytor.com
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-04 13:11:46 +02:00
Steven Rostedt
5c1ea08215 x86: enable preemption in delay
The RT team has been searching for a nasty latency. This latency shows
up out of the blue and has been seen to be as big as 5ms!

Using ftrace I found the cause of the latency.

   pcscd-2995  3dNh1 52360300us : irq_exit (smp_apic_timer_interrupt)
   pcscd-2995  3dN.2 52360301us : idle_cpu (irq_exit)
   pcscd-2995  3dN.2 52360301us : rcu_irq_exit (irq_exit)
   pcscd-2995  3dN.1 52360771us : smp_apic_timer_interrupt (apic_timer_interrupt
)
   pcscd-2995  3dN.1 52360771us : exit_idle (smp_apic_timer_interrupt)

Here's an example of a 400 us latency. pcscd took a timer interrupt and
returned with "need resched" enabled, but did not reschedule until after
the next interrupt came in at 52360771us 400us later!

At first I thought we somehow missed a preemption check in entry.S. But
I also noticed that this always seemed to happen during a __delay call.

   pcscd-2995  3dN.2 52360836us : rcu_irq_exit (irq_exit)
   pcscd-2995  3.N.. 52361265us : preempt_schedule (__delay)

Looking at the x86 delay, I found my problem.

In git commit 35d5d08a08, Andrew Morton
placed preempt_disable around the entire delay due to TSC's not working
nicely on SMP.  Unfortunately for those that care about latencies this
is devastating! Especially when we have callers to mdelay(8).

Here I enable preemption during the loop and account for anytime the task
migrates to a new CPU. The delay asked for may be extended a bit by
the migration, but delay only guarantees that it will delay for that minimum
time. Delaying longer should not be an issue.

[
  Thanks to Thomas Gleixner for spotting that cpu wasn't updated,
    and to place the rep_nop between preempt_enabled/disable.
]

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: akpm@osdl.org
Cc: Clark Williams <clark.williams@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Luis Claudio R. Goncalves" <lclaudio@uudg.org>
Cc: Gregory Haskins <ghaskins@novell.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andi Kleen <andi-suse@firstfloor.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-04 13:11:46 +02:00
Ingo Molnar
deef325086 x86: disable preemption in native_smp_prepare_cpus
Priit Laes reported the following warning:

Call Trace:
 [<ffffffff8022f1e1>] warn_on_slowpath+0x51/0x63
 [<ffffffff80282e48>] sys_ioctl+0x2d/0x5d
 [<ffffffff805185ff>] _spin_lock+0xe/0x24
 [<ffffffff80227459>] task_rq_lock+0x3d/0x73
 [<ffffffff805133c3>] set_cpu_sibling_map+0x336/0x350
 [<ffffffff8021c1b8>] read_apic_id+0x30/0x62
 [<ffffffff806d921d>] verify_local_APIC+0x90/0x138
 [<ffffffff806d84b5>] native_smp_prepare_cpus+0x1f9/0x305
 [<ffffffff806ce7b1>] kernel_init+0x59/0x2d9
 [<ffffffff80518a26>] _spin_unlock_irq+0x11/0x2b
 [<ffffffff8020bf48>] child_rip+0xa/0x12
 [<ffffffff806ce758>] kernel_init+0x0/0x2d9
 [<ffffffff8020bf3e>] child_rip+0x0/0x12

fix this by generally disabling preemption in native_smp_prepare_cpus().

Reported-and-bisected-by: Priit Laes <plaes@plaes.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-04 13:11:46 +02:00
Yinghai Lu
fb3bbd6a66 x86: fix APIC warning on 32bit v2
for http://bugzilla.kernel.org/show_bug.cgi?id=10613

BIOS bug, APIC version is 0 for CPU#0! fixing up to 0x10. (tell your hw vendor)

v2: fix 64 bit compilation

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Gabriel C <nix.or.die@googlemail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-04 13:11:46 +02:00
Yinghai Lu
7b2a0a6c48 x86: make 32-bit use e820_register_active_regions()
this way 32-bit is more similar to 64-bit, and smarter e820 and numa.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-04 12:01:58 +02:00
Yinghai Lu
ee0c80fadf x86: move e820_register_active() to e820.c
to prepare 32-bit to use it.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-04 12:01:57 +02:00
Yinghai Lu
84b56fa46b x86, numa, 32-bit: make sure get we kva space
when 1/3 user/kernel split is used, and less memory is installed, or if
we have a big hole below 4g, max_low_pfn is still using 3g-128m

try to go down from max_low_pfn until we get it. otherwise will panic.

need to make 32-bit code to use register_e820_active_regions ... later.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-04 12:01:55 +02:00
Yinghai Lu
ab530e1f78 x86: early check if a system is numaq
so we could fall back to one node numa.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-04 12:01:06 +02:00
Yinghai Lu
f19dc2f22a x86: change propagate_e820_map() back to find_max_pfn(), 32-bit, fix
add memory_present() calls for sparse and non-numa.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-04 12:00:38 +02:00
Hiroshi Shimamoto
d44b9d17fa x86: move bugs_64.c to cpu/bugs_64.c
It looks good to move bugs_64.c to cpu/bugs_64.c.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-06-03 14:43:00 -07:00
Yinghai Lu
c8c034ce79 x86: clean up max_pfn_mapped usage - 64-bit
on 64-bit we only get valid max_pfn_mapped after init_memory_mapping().

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 13:26:29 +02:00
Yinghai Lu
6af61a7614 x86: clean up max_pfn_mapped usage - 32-bit
on 32-bit in head_32.S after initial page table is done, we get initial
max_pfn_mapped, and then kernel_physical_mapping_init will give us
a final one.

We need to use that to make sure find_e820_area will get valid addresses
for boot_map and for NODE_DATA(0) on numa32.

XEN PV and lguest may need to assign max_pfn_mapped too.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 13:26:28 +02:00
Yinghai Lu
287572cb38 x86, numa, 32-bit: avoid clash between ramdisk and kva
use find_e820_area to get address space...

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 13:26:28 +02:00
Yinghai Lu
2944e16b25 x86: update mptable
make mptable to be consistent with acpi routing, so we could:

1. kexec kernel with acpi=off
2. work around BIOSes where acpi routing is working, but mptable is
   not right, so can use kernel/kexec to start other OSes that don't have
   good acpi support.

command line: update_mptable

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 13:26:27 +02:00
Yinghai Lu
e8c27ac919 x86, numa, 32-bit: print out debug info on all kvas
also fix the print out of node_remap_end_vaddr

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 13:26:26 +02:00
Yinghai Lu
0596152388 x86, 32-bit: change propagate_e820_map() back to find_max_pfn()
we don't need to call memory_present that early.
numa and sparse will call memory_present later and might
even fail, it will call memory_present for the full range.

also for sparse it will call alloc_bootmem ... before we set up bootmem.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 13:26:25 +02:00
Yinghai Lu
b66cd72073 x86: set node_remap_size[0] in fallback path
... otherwise alloc_remap will not get node_mem_map from kva area, and
alloc_node_mem_map has to alloc_bootmem_node to get mem_map.
It will use two low address copies ...

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 13:26:25 +02:00
Yinghai Lu
ba924c81dd x86, numa, 32-bit: increase max_elements to 1024
so every element will represent 64M instead of 256M.

AMD opteron could have HW memory hole remapping, so could have
[0, 8g + 64M) on node0. Reduce element size to 64M to keep that on node 0

Later we need to use find_e820_area() to allocate memory_node_map like
on 64-bit. But need to move memory_present out of populate_mem_map...

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 13:26:24 +02:00
Ingo Molnar
471b3c1b01 x86, numaq 32-bit: build fix
fix:

drivers/built-in.o: In function `acpi_numa_init':
: undefined reference to `acpi_numa_arch_fixup'

which can happen with ACPI && NUMAQ.
2008-06-03 12:49:06 +02:00
Ingo Molnar
cf3d0cb8a1 x86: 32-bit numa, build fix
on Summit it's possible to have:

 CONFIG_ACPI_SRAT=y
 CONFIG_HAVE_ARCH_PARSE_SRAT=y

in which case acpi.h defines the acpi_numa_slit_init() and
acpi_numa_processor_affinity_init() methods as a macro.
2008-06-03 11:45:09 +02:00
Ingo Molnar
f329469097 x86, numaq: add pci_acpi_scan_root() stub
allow 32-bit numaq build to succeed with ACPI enabled.
2008-06-03 10:24:01 +02:00
Ingo Molnar
b28852d670 x86: add dummy acpi_numa_processor_affinity_init() implementation on 32-bit
allow CONFIG_ACPI_NUMA builds to succeed.
2008-06-03 10:23:53 +02:00
Ingo Molnar
2772f54bf3 x86: add acpi_numa_slit_init() dummy implementation on 32-bit
allow CONFIG_ACPI_NUMA builds to succeed on 32-bit.
2008-06-03 10:23:48 +02:00
Ingo Molnar
3d1ba1da2b x86: fix nmi.c build bug
apic.h needs to be included for the apic_write_around() definition.
2008-06-02 13:57:12 +02:00
Jeremy Fitzhardinge
7e0edc1bc3 xen: add new Xen elfnote types and use them appropriately
Define recently added XEN_ELFNOTEs, and use them appropriately.
Most significantly, this enables domain checkpointing (xm save -c).

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-02 13:25:51 +02:00
Jeremy Fitzhardinge
d07af1f0e3 xen: resume timers on all vcpus
On resume, the vcpu timer modes will not be restored.  The timer
infrastructure doesn't do this for us, since it assumes the cpus
are offline.  We can just poke the other vcpus into the right mode
directly though.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-02 13:25:44 +02:00
Jeremy Fitzhardinge
9c7a794209 xen: restore vcpu_info mapping
If we're using vcpu_info mapping, then make sure its restored on all
processors before relasing them from stop_machine.

The only complication is that if this fails, we can't continue because
we've already made assumptions that the mapping is available (baked in
calls to the _direct versions of the functions, for example).

Fortunately this can only happen with a 32-bit hypervisor, which may
possibly run out of mapping space.  On a 64-bit hypervisor, this is a
non-issue.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-02 13:25:34 +02:00
Jeremy Fitzhardinge
e2426cf85f xen: avoid hypercalls when updating unpinned pud/pmd
When operating on an unpinned pagetable (ie, one under construction or
destruction), it isn't necessary to use a hypercall to update a
pud/pmd entry.  Jan Beulich observed that a similar optimisation
avoided many thousands of hypercalls while doing a kernel build.

One tricky part is that early in the kernel boot there's no page
structure, so we can't check to see if the page is pinned.  In that
case, we just always use the hypercall.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-02 13:24:40 +02:00
Ingo Molnar
15ce60056b xen: export get_phys_to_machine
-tip testing found the following xen-console symbols trouble:

  ERROR: "get_phys_to_machine" [drivers/video/xen-fbfront.ko] undefined!
  ERROR: "get_phys_to_machine" [drivers/net/xen-netfront.ko] undefined!
  ERROR: "get_phys_to_machine" [drivers/input/xen-kbdfront.ko] undefined!

with:

  http://redhat.com/~mingo/misc/config-Mon_Jun__2_12_25_13_CEST_2008.bad
2008-06-02 13:20:11 +02:00
Pavel Machek
f529626a86 suspend-vs-iommu: prevent suspend if we could not resume
iommu/gart support misses suspend/resume code, which can do bad stuff,
including memory corruption on resume.  Prevent system suspend in case we
would be unable to resume.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Tested-by: Patrick <ragamuffin@datacomm.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-02 13:02:48 +02:00
Jack Steiner
9f5314fb4d x86, uv: update macros used by UV platform
Update the UV address macros to better describe the
fields of UV physical addresses. Improve comments
in the header files. Add additional MMR definitions.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-02 12:56:00 +02:00
Hiroshi Shimamoto
88ff0a474e x86: coding style fixes for nmi.c
before
	total: 1 errors, 6 warnings, 534 lines checked
after
	total: 0 errors, 1 warnings, 532 lines checked

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-02 12:53:54 +02:00
Ingo Molnar
1a5726528a fix build bug in "x86: add PCI extended config space access for AMD Barcelona" 2008-06-02 12:26:21 +02:00
Miquel van Smoorenburg
db9f600b96 x86: pci-dma.c: use __GFP_NO_OOM instead of __GFP_NORETRY
On Wed, 2008-05-28 at 04:47 +0200, Andi Kleen wrote:
> > So...  why not just remove the setting of __GFP_NORETRY?  Why is it
> > wrong to oom-kill things in this case?
>
> When the 16MB zone overflows (which can be common in some workloads)
> calling the OOM killer is pretty useless because it has barely any
> real user data [only exception would be the "only 16MB" case Alan
> mentioned]. Killing random processes in this case is bad.
>
> I think for 16MB __GFP_NORETRY is ok because there should be
> nothing freeable in there so looping is useless. Only exception would be the
> "only 16MB total" case again but I'm not sure 2.6 supports that at all
> on x86.
>
> On the other hand d_a_c() does more allocations than just 16MB, especially
> on 64bit and the other zones need different strategies.

Okay, so how about this then ?

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-02 12:14:58 +02:00
Pavel Machek
c46e62f735 i8259: fix final ugliness
Introduce IRQx_VECTOR on 32-bit, so that #ifdef noise is kept
down. There should be no object code change.

[ mingo@elte.hu: merged to x86/irq not x86/i8259 due to x86/irq having
  restructured the vector code into asm-x86/irq_vectors.h, which this
  patch touches. ]

Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-02 11:55:52 +02:00
Robert Richter
831d991821 x86: add PCI extended config space access for AMD Barcelona
This patch implements PCI extended configuration space access for
AMD's Barcelona CPUs. It extends the method using CF8/CFC IO
addresses. An x86 capability bit has been introduced that is set for
CPUs supporting PCI extended config space accesses.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-02 11:51:19 +02:00
Bertram Felgenhauer
75b19b790b pci, x86: add workaround for bug in ASUS A7V600 BIOS (rev 1005)
This BIOS claims the VIA 8237 south bridge to be compatible with VIA 586,
which it is not.

Without this patch, I get the following warning while booting,
among others,

| PCI: Using IRQ router VIA [1106/3227] at 0000:00:11.0
| ------------[ cut here ]------------
| WARNING: at arch/x86/pci/irq.c:265 pirq_via586_get+0x4a/0x60()
| Modules linked in:
| Pid: 1, comm: swapper Not tainted 2.6.26-rc4-00015-g1ec7d99 #1
|  [<c0119fd4>] warn_on_slowpath+0x54/0x70
|  [<c02246e0>] ? vt_console_print+0x210/0x2b0
|  [<c02244d0>] ? vt_console_print+0x0/0x2b0
|  [<c011a413>] ? __call_console_drivers+0x43/0x60
|  [<c011a482>] ? _call_console_drivers+0x52/0x80
|  [<c011aa89>] ? release_console_sem+0x1c9/0x200
|  [<c0291d21>] ? raw_pci_read+0x41/0x70
|  [<c0291e8f>] ? pci_read+0x2f/0x40
|  [<c029151a>] pirq_via586_get+0x4a/0x60
|  [<c02914d0>] ? pirq_via586_get+0x0/0x60
|  [<c029178d>] pcibios_lookup_irq+0x15d/0x430
|  [<c03b895a>] pcibios_irq_init+0x17a/0x3e0
|  [<c03a66f0>] ? kernel_init+0x0/0x250
|  [<c03a6763>] kernel_init+0x73/0x250
|  [<c03b87e0>] ? pcibios_irq_init+0x0/0x3e0
|  [<c0114d00>] ? schedule_tail+0x10/0x40
|  [<c0102dee>] ? ret_from_fork+0x6/0x1c
|  [<c03a66f0>] ? kernel_init+0x0/0x250
|  [<c03a66f0>] ? kernel_init+0x0/0x250
|  [<c010324b>] kernel_thread_helper+0x7/0x1c
|  =======================
| ---[ end trace 4eaa2a86a8e2da22 ]---

and IRQ trouble later,

| irq 10: nobody cared (try booting with the "irqpoll" option)

Now that's an VIA 8237 chip, so pirq_via586_get shouldn't be called
at all; adding this workaround to via_router_probe() fixes the
problem for me.

Amazingly I have a 2.6.23.8 kernel that somehow works fine ... I'll
never understand why.

Signed-off-by: Bertram Felgenhauer <int-e@gmx.de>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-02 11:29:10 +02:00
Yinghai Lu
9a73aa81ff x86: 32bit numa srat fix early_ioremap leak
on two node system (16g RAM) with numa config I got this crash:

get_memcfg_from_srat: assigning address to rsdp
RSD PTR  v0 [ACPIAM]
ACPI: Too big length in RSDT: 92
failed to get NUMA memory information from SRAT table
NUMA - single node, flat memory mode
Node: 0, start_pfn: 0, end_pfn: 153
 Setting physnode_map array to node 0 for pfns:
 0
...
Pid: 0, comm: swapper Not tainted 2.6.26-rc4 #4
 [<80b41289>] hlt_loop+0x0/0x3
 [<8011efa0>] ? alloc_remap+0x50/0x70
 [<8079e32e>] alloc_node_mem_map+0x5e/0xa0
 [<8012e77b>] ? printk+0x1b/0x20
 [<80b590f6>] free_area_init_node+0xc6/0x470
 [<80b588fc>] ? __alloc_bootmem_node+0x2c/0x50
 [<80b58ad8>] ? find_min_pfn_for_node+0x38/0x70
 [<8012e77b>] ? printk+0x1b/0x20
 [<80b597c4>] free_area_init_nodes+0x254/0x2d0
 [<80b544d7>] zone_sizes_init+0x97/0xa0
 [<80b48a03>] setup_arch+0x383/0x530
 [<8012e77b>] ? printk+0x1b/0x20
 [<80b41aa4>] start_kernel+0x64/0x350
 [<80b412d8>] i386_start_kernel+0x8/0x10
 =======================

this patch increases the acpi table limit to 32.
Also match early_ioremap() with early_iounmap().

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-31 09:55:56 +02:00
Yinghai Lu
a5481280b2 x86: extend e820 early_res support 32bit -fix #5
reserve early numa kva, so it will not clash with new RAMDISK

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-31 09:55:53 +02:00
Yinghai Lu
163872950d x86: extend e820 early_res support 32bit -fix #4
reserve_early pgdata for 32bit numa

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-31 09:55:50 +02:00
Yinghai Lu
f0d43100f1 x86: extend e820 early_res support 32bit -fix #3
introduce init_pg_table_start, so xen PV could specify the value.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-31 09:55:47 +02:00
Kristian Høgsberg
3b6b9293d0 x86: Honor 'quiet' command line option in real mode boot decompressor.
This patch lets the early real mode code look for the 'quiet' option
on the kernel command line and pass a loadflag to the decompressor.
When this flag is set, we suppress the "Decompressing Linux... Parsing
ELF... done." messages.

Signed-off-by: Kristian Høgsberg <krh@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-05-30 17:00:47 -07:00
Kristian Høgsberg
23968f71b2 x86: Use structs instead of hardcoded offsets in x86 boot decompressor.
Replace hardcoded offsets embedded in macros in
arch/x86/boot/compressed with proper structure references.

Signed-off-by: Kristian Høgsberg <krh@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-05-30 17:00:34 -07:00
H. Peter Anvin
1c47cd638e x86: fix overlong line in arch/x86/kernel/cpu/amd_64.c
Clean up an overlong line in arch/x86/kernel/cpu/amd_64.c.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-05-30 15:46:30 -07:00
Dave Jones
30a713180b x86: Move the 64-bit Centaur specific parts out of setup_64.c
Create a separate centaur_64.c file in the cpu/ dir for
the useful parts to live in.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-05-30 15:46:30 -07:00
Dave Jones
7e2191127e x86: Remove workaround for prescott (32bit P4) from 64-bit code.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-05-30 15:46:29 -07:00
Dave Jones
a82fbe31cb x86: Move the 64-bit Intel specific parts out of setup_64.c
Create a separate intel_64.c file in the cpu/ dir for
the useful parts to live in.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-05-30 15:46:29 -07:00
Dave Jones
4d28587856 x86: Move the AMD64 specific parts out of setup_64.c
Create a separate amd_64.c file in the cpu/ dir for
the useful parts to live in.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-05-30 15:46:29 -07:00
Thomas Gleixner
d364319b98 x86: move mmconfig declarations to header
arch/x86/kernel/mmconf-fam10h_64.c is missing the prototypes, which
are decalred in arch/x86/kernel/setup_64.c. Move the prototypes and
the inline stubs to the appropriate header file.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-30 15:45:17 -07:00
Rusty Russell
a16ffe93c4 lguest: fix ugly <NULL> in /proc/interrupts
Before:
	root@ubuntu:~# cat /proc/interrupts
	           CPU0
	  1:       1672    lguest-<NULL>    virtio0
	  2:          1    lguest-<NULL>    virtio1
	  ...
After:
	root@ubuntu:~# cat /proc/interrupts
	           CPU0
	  1:       2889    lguest-level     virtio0
	  2:          9    lguest-level     virtio1

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-05-30 15:09:43 +10:00
Cyrill Gorcunov
19ec673ced x86: nmi - fix incorrect NMI watchdog used by default
The commit

	commit 4b82b27770
	Author: Cyrill Gorcunov <gorcunov@gmail.com>
	Date:   Sat May 24 19:36:35 2008 +0400

set nmi_watchdog to NMI_IO_APIC as by default. This causes hangs on some
machines with buggy watchdogs. Fix it - i.e. restore old behaviour.

Thanks to Sitsofe Wheeler and Adrian Bunk for catching the problem
and Maciej W. Rozycki for explanation what is going on there.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
CC: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-28 21:04:53 +02:00
Dave Jones
c7d624d1ee x86: Fix up silly i1586 boot message.
Trying to boot a 64-bit kernel on a 32bit Pentium 4 gets
you an amusing message along the lines of.
"you need an x86-64, but you only have an i1586"
due to the P4 being family F.  Munge it to be 686.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-05-28 10:34:12 -07:00
Ingo Molnar
b20aeccd6a xen: fix early bootup crash on native hardware
-tip tree auto-testing found the following early bootup hang:

-------------->
get_memcfg_from_srat: assigning address to rsdp
RSD PTR  v0 [Nvidia]
BUG: Int 14: CR2 ffd00040
     EDI 8092fbfe  ESI ffd00040  EBP 80b0aee8  ESP 80b0aed0
     EBX 000f76f0  EDX 0000000e  ECX 00000003  EAX ffd00040
     err 00000000  EIP 802c055a   CS 00000060  flg 00010006
Stack: ffd00040 80bc78d0 80b0af6c 80b1dbfe 8093d8ba 00000008 80b42810 80b4ddb4
       80b42842 00000000 80b0af1c 801079c8 808e724e 00000000 80b42871 802c0531
       00000100 00000000 0003fff0 80b0af40 80129999 00040100 00040100 00000000
Pid: 0, comm: swapper Not tainted 2.6.26-rc4-sched-devel.git #570
 [<802c055a>] ? strncmp+0x11/0x25
 [<80b1dbfe>] ? get_memcfg_from_srat+0xb4/0x568
 [<801079c8>] ? mcount_call+0x5/0x9
 [<802c0531>] ? strcmp+0xa/0x22
 [<80129999>] ? printk+0x38/0x3a
 [<80129999>] ? printk+0x38/0x3a
 [<8011b122>] ? memory_present+0x66/0x6f
 [<80b216b4>] ? setup_memory+0x13/0x40c
 [<80b16b47>] ? propagate_e820_map+0x80/0x97
 [<80b1622a>] ? setup_arch+0x248/0x477
 [<80129999>] ? printk+0x38/0x3a
 [<80b11759>] ? start_kernel+0x6e/0x2eb
 [<80b110fc>] ? i386_start_kernel+0xeb/0xf2
 =======================
<------

with this config:

   http://redhat.com/~mingo/misc/config-Wed_May_28_01_33_33_CEST_2008.bad

The thing is, the crash makes little sense at first sight. We crash on a
benign-looking printk. The code around it got changed in -tip but
checking those topic branches individually did not reproduce the bug.

Bisection led to this commit:

|   d5edbc1f75 is first bad commit
|   commit d5edbc1f75
|   Author: Jeremy Fitzhardinge <jeremy@goop.org>
|   Date:   Mon May 26 23:31:22 2008 +0100
|
|   xen: add p2m mfn_list_list

Which is somewhat surprising, as on native hardware Xen client side
should have little to no side-effects.

After some head scratching, it turns out the following happened:
randconfig enabled the following Xen options:

  CONFIG_XEN=y
  CONFIG_XEN_MAX_DOMAIN_MEMORY=8
  # CONFIG_XEN_BLKDEV_FRONTEND is not set
  # CONFIG_XEN_NETDEV_FRONTEND is not set
  CONFIG_HVC_XEN=y
  # CONFIG_XEN_BALLOON is not set

which activated this piece of code in arch/x86/xen/mmu.c:

> @@ -69,6 +69,13 @@
>  	__attribute__((section(".data.page_aligned"))) =
>  		{ [ 0 ... TOP_ENTRIES - 1] = &p2m_missing[0] };
>
> +/* Arrays of p2m arrays expressed in mfns used for save/restore */
> +static unsigned long p2m_top_mfn[TOP_ENTRIES]
> +	__attribute__((section(".bss.page_aligned")));
> +
> +static unsigned long p2m_top_mfn_list[TOP_ENTRIES / P2M_ENTRIES_PER_PAGE]
> +	__attribute__((section(".bss.page_aligned")));

The problem is, you must only put variables into .bss.page_aligned that
have a _size_ that is _exactly_ page aligned. In this case the size of
p2m_top_mfn_list is not page aligned:

 80b8d000 b p2m_top_mfn
 80b8f000 b p2m_top_mfn_list
 80b8f008 b softirq_stack
 80b97008 b hardirq_stack
 80b9f008 b bm_pte

So all subsequent variables get unaligned which, depending on luck,
breaks the kernel in various funny ways. In this case what killed the
kernel first was the misaligned bootmap pte page, resulting in that
creative crash above.

Anyway, this was a fun bug to track down :-)

I think the moral is that .bss.page_aligned is a dangerous construct in
its current form, and the symptoms of breakage are very non-trivial, so
i think we need build-time checks to make sure all symbols in
.bss.page_aligned are truly page aligned.

The Xen fix below gets the kernel booting again.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-28 14:32:06 +02:00
Pavel Machek
dd564d0cf0 x86: aperture_64.c: cleanups
Some small cleanups for aperture_64.c; they should not really change
any code.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 18:03:56 +02:00
Yinghai Lu
3945e2c9ab x86: extend e820 ealy_res support 32bit - fix #2
remove extra -1 in reseve_early calling
    panic if can not find space for new RAMDISK

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:20:12 +02:00
Jeremy Fitzhardinge
359cdd3f86 xen: maintain clock offset over save/restore
Hook into the device model to make sure that timekeeping's resume handler
is called.  This deals with our clocksource's non-monotonicity over the
save/restore.  Explicitly call clock_has_changed() to make sure that
all the timers get retriggered properly.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:11:38 +02:00
Jeremy Fitzhardinge
0e91398f2a xen: implement save/restore
This patch implements Xen save/restore and migration.

Saving is triggered via xenbus, which is polled in
drivers/xen/manage.c.  When a suspend request comes in, the kernel
prepares itself for saving by:

1 - Freeze all processes.  This is primarily to prevent any
    partially-completed pagetable updates from confusing the suspend
    process.  If CONFIG_PREEMPT isn't defined, then this isn't necessary.

2 - Suspend xenbus and other devices

3 - Stop_machine, to make sure all the other vcpus are quiescent.  The
    Xen tools require the domain to run its save off vcpu0.

4 - Within the stop_machine state, it pins any unpinned pgds (under
    construction or destruction), performs canonicalizes various other
    pieces of state (mostly converting mfns to pfns), and finally

5 - Suspend the domain

Restore reverses the steps used to save the domain, ending when all
the frozen processes are thawed.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:11:38 +02:00
Jeremy Fitzhardinge
d5edbc1f75 xen: add p2m mfn_list_list
When saving a domain, the Xen tools need to remap all our mfns to
portable pfns.  In order to remap our p2m table, it needs to know
where all its pages are, so maintain the references to the p2m table
for it to use.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:11:37 +02:00
Jeremy Fitzhardinge
a0d695c821 xen: make dummy_shared_info non-static
Rename dummy_shared_info to xen_dummy_shared_info and make it
non-static, in anticipation of users outside of enlighten.c

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:11:37 +02:00
Jeremy Fitzhardinge
cf0923ea29 xen: efficiently support a holey p2m table
When using sparsemem and memory hotplug, the kernel's pseudo-physical
address space can be discontigious.  Previously this was dealt with by
having the upper parts of the radix tree stubbed off.  Unfortunately,
this is incompatible with save/restore, which requires a complete p2m
table.

The solution is to have a special distinguished all-invalid p2m leaf
page, which we can point all the hole areas at.  This allows the tools
to see a complete p2m table, but it only costs a page for all memory
holes.

It also simplifies the code since it removes a few special cases.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:11:37 +02:00
Jeremy Fitzhardinge
8006ec3e91 xen: add configurable max domain size
Add a config option to set the max size of a Xen domain.  This is used
to scale the size of the physical-to-machine array; it ends up using
around 1 page/GByte, so there's no reason to be very restrictive.

For a 32-bit guest, the default value of 8GB is probably sufficient;
there's not much point in giving a 32-bit machine much more memory
than that.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:11:37 +02:00
Jeremy Fitzhardinge
d451bb7aa8 xen: make phys_to_machine structure dynamic
We now support the use of memory hotplug, so the physical to machine
page mapping structure must be dynamic.  This is implemented as a
two-level radix tree structure, which allows us to efficiently
incrementally allocate memory for the p2m table as new pages are
added.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:11:37 +02:00
Jeremy Fitzhardinge
38bb5ab417 xen: count resched interrupts properly
Make sure resched interrupts appear in /proc/interrupts in the proper
place.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:11:37 +02:00
Isaku Yamahata
ec9b2065d4 xen: Move manage.c to drivers/xen for ia64/xen support
move arch/x86/xen/manage.c under drivers/xen/to share codes
with x86 and ia64.
ia64/xen also uses manage.c

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:11:36 +02:00
Jeremy Fitzhardinge
83abc70a4c xen: make earlyprintk=xen work again
For some perverse reason, if you call add_preferred_console() it prevents
setup_early_printk() from successfully enabling the boot console -
unless you make it a preferred console too...

Also, make xenboot console output distinct from normal console output,
since it gets repeated when the console handover happens, and the
duplicated output is confusing without disambiguation.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
2008-05-27 10:11:36 +02:00
Markus Armbruster
9e124fe16f xen: Enable console tty by default in domU if it's not a dummy
Without console= arguments on the kernel command line, the first
console to register becomes enabled and the preferred console (the one
behind /dev/console).  This is normally tty (assuming
CONFIG_VT_CONSOLE is enabled, which it commonly is).

This is okay as long tty is a useful console.  But unless we have the
PV framebuffer, and it is enabled for this domain, tty0 in domU is
merely a dummy.  In that case, we want the preferred console to be the
Xen console hvc0, and we want it without having to fiddle with the
kernel command line.  Commit b8c2d3dfbc
did that for us.

Since we now have the PV framebuffer, we want to enable and prefer tty
again, but only when PVFB is enabled.  But even then we still want to
enable the Xen console as well.

Problem: when tty registers, we can't yet know whether the PVFB is
enabled.  By the time we can know (xenstore is up), the console setup
game is over.

Solution: enable console tty by default, but keep hvc as the preferred
console.  Change the preferred console to tty when PVFB probes
successfully, unless we've been given console kernel parameters.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:11:36 +02:00
Jeremy Fitzhardinge
a15af1c9ea x86/paravirt: add pte_flags to just get pte flags
Add pte_flags() to extract the flags from a pte.  This is a special
case of pte_val() which is only guaranteed to return the pte's flags
correctly; the page number may be corrupted or missing.

The intent is to allow paravirt implementations to return pte flags
without having to do any translation of the page number (most notably,
Xen).

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:11:36 +02:00
Jeremy Fitzhardinge
239d1fc04e xen: don't worry about preempt during xen_irq_enable()
When enabling interrupts, we don't need to worry about preemption,
because we either enter with interrupts disabled - so no preemption -
or the caller is confused and is re-enabling interrupts on some
indeterminate processor.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:11:35 +02:00
Jeremy Fitzhardinge
2956a3511c xen: allow some cr4 updates
The guest can legitimately change things like cr4.OSFXSR and
OSXMMEXCPT, so let it.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:11:35 +02:00
Jeremy Fitzhardinge
349c709f42 xen: use new sched_op
Use the new sched_op hypercall, mainly because xenner doesn't support
the old one.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:11:35 +02:00
Jeremy Fitzhardinge
7b1333aa4c xen: use hypercall rather than clts
Xen will trap and emulate clts, but its better to use a hypercall.
Also, xenner doesn't handle clts.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:11:35 +02:00
Sam Ravnborg
73531905ed Kconfig: introduce ARCH_DEFCONFIG to DEFCONFIG_LIST
init/Kconfig contains a list of configs that are searched
for if 'make *config' are used with no .config present.
Extend this list to look at the config identified by
ARCH_DEFCONFIG.

With this change we now try the defconfig targets last.

This fixes a regression reported
by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2008-05-25 23:03:18 +02:00
Cyrill Gorcunov
1798bc22b2 x86: nmi_32/64.c - merge down nmi_32.c and nmi_64.c to nmi.c
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: hpa@zytor.com
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 22:32:51 +02:00
Cyrill Gorcunov
fd5cea02de x86: nmi_32/64.c - add helper functions to hide arch specific data
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: hpa@zytor.com
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 22:32:51 +02:00
Cyrill Gorcunov
7c2ba83f9a x86: nmi_32.c cleanup - use for_each_online_cpu helper
Since cpu_online_map is touched (by for_each_online_cpu)
at moment when cpu_callin_map is already filled up we can
get rid of its checking at all

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: hpa@zytor.com
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 22:32:51 +02:00
Cyrill Gorcunov
96f9dcb107 x86: nmi_64.c - use for_each_possible_cpu helper
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: hpa@zytor.com
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 22:32:51 +02:00
Cyrill Gorcunov
6c8decdf14 x86: nmi_32.c - unknown_nmi_panic_callback should always panic
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: hpa@zytor.com
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 22:32:51 +02:00
Cyrill Gorcunov
ad63ba169d x86: nmi_32/64.c - use apic_write_around instead of apic_write
apic_write_around will be expanded to apic_write in 64bit mode
anyway. Only a few CPUs (well, old CPUs to be precise) requires
such an action. In general it should not hurt and could be cleaned
up for apic_write (just in case)

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: hpa@zytor.com
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 22:32:51 +02:00
Cyrill Gorcunov
4b82b27770 x86: nmi_32.c - add nmi_watchdog_default helper
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: hpa@zytor.com
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 22:32:51 +02:00
Cyrill Gorcunov
d1b946b97d x86: nmi_32.c - add "panic" option
Allow to pass "panic" option in 32bit mode

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: hpa@zytor.com
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 22:32:50 +02:00
Cyrill Gorcunov
c6425b9f14 x86: move do_nmi(), stop_nmi() and restart_nmi() to traps_64.c
traps_32.c already holds these functions so do the same for traps_64.c

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: hpa@zytor.com
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 22:32:50 +02:00
Cyrill Gorcunov
e56b3a12c4 x86: nmi - die_nmi() output message unification
Make 64bit die_nmi() to produce the same message as 32bit mode has

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: hpa@zytor.com
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 22:32:50 +02:00
Cyrill Gorcunov
ddca03c98a x86: nmi - unify die_nmi() interface
By slightly changing 32bit mode die_nmi() we may unify the
interface and make it common for both (32/64bit) modes

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: hpa@zytor.com
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 22:32:50 +02:00
Thomas Gleixner
85cc35fa72 x86: fix mpparse fallout
UP builds with LOCAL_APIC=y and IO_APIC=n fail with a missing
reference to mp_bus_not_pci. Distangle the mpparse code some more and
move the ioapic specific bus check into a separate function.

This code needs sume urgent un#ifdef surgery all over the place.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 21:21:36 +02:00
Alan Cox
b764a15f67 x86: Switch apm to unlocked_kernel
This pushes the lock a fair way down and the final kill looks like it
should be an easy project for someone who wants to have a shot at it.

Signed-off-by: Alan Cox <alan@redhat.com>
Cc: mingo@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 12:03:27 +02:00
Alexey Starikovskiy
136ef671df x86: allow MPPARSE to be deselected in SMP configs 2008-05-25 12:01:26 +02:00
Alexey Starikovskiy
8732fc4b23 x86: move mp_bus_not_pci from mpparse.c 2008-05-25 12:01:26 +02:00
Alexey Starikovskiy
ce6444d39f x86: mp_bus_id_to_pci_bus is not needed 2008-05-25 12:01:25 +02:00
Alexey Starikovskiy
bab4b27c00 x86: move smp_found_config 2008-05-25 12:01:25 +02:00
Alexey Starikovskiy
f391835290 x86: move pic_mode to apic_32.c 2008-05-25 12:01:25 +02:00
Alexey Starikovskiy
b3e2416465 x86: Set pic_mode only if local apic code is present 2008-05-25 12:01:25 +02:00
Yinghai Lu
bf62f3981c x86: move e820_mark_nosave_regions to e820.c
and make e820_mark_nosave_regions to take limit_pfn to use max_low_pfn
for 32bit and end_pfn for 64bit

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 11:35:53 +02:00
Alexey Starikovskiy
aafbdf71f1 x86: fix mpparse/acpi interaction
Sitsofe Wheeler reported boot problems on linux-next.

It looks like the same issue as found by Soeren Sandman in 7575217f656a93,
"x86: initialize all fields of mp_irqs[mp_irq_entries]".

But his fix is also not complete, as dstapic is used before it assigned.

Reported-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Bisected-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 10:55:13 +02:00
Soeren Sandmann
59f4519ad7 x86: initialize all fields of mp_irqs[mp_irq_entries]
Commit "x86: make config_irqsrc not MPspec specific" introduced some uses
of uninitialized fields in mp_config_acpi_legacy_irqs(). I need the
following patch to get sched-devel/master to boot.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 10:55:13 +02:00
Alexey Starikovskiy
2fddb6e28e x86: make config_irqsrc not MPspec specific
Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:13 +02:00
Alexey Starikovskiy
ec2cd0a22e x86: make struct config_ioapic not MPspec specific
Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:13 +02:00
Alexey Starikovskiy
5f8951487d x86: make mp_ioapic_routing definition local
Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:12 +02:00
Alexey Starikovskiy
11113f84c7 x86: complete move ACPI from mpparse.c
Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:12 +02:00
Alexey Starikovskiy
32c5061265 x86: move es7000_plat out of mpparse.c
Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:12 +02:00
Yinghai Lu
11a62a0560 x86: cleanup print out for mptable
the new output is:

 MPTABLE: OEM ID: SUN
 MPTABLE: Product ID: 4600 M2
 MPTABLE: APIC at: 0x

instead of it all in one line with <6> and double Product ID...

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 10:55:12 +02:00
Thomas Gleixner
4a139a7fde x86: include pci.h in e820_64.c
global pci_mem_start needs a declaration. include pci.h

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:12 +02:00
Thomas Gleixner
a91eea6df3 x86: fix shadow variables of global end_pnf in e820_64.c
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:12 +02:00
Thomas Gleixner
7f028bc0fd x86: move mp_ioapic_routing to mpparse and make it static
mpparse is the only user of mp_ioapic_routing.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:12 +02:00
Yinghai Lu
ba5b14cc03 x86: extend e820 ealy_res support 32bit - fix
use find_e820_area to find addess for new RAMDISK, instead of using ram blindly

also print out low ram and bootmap info

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 10:55:12 +02:00
Yinghai Lu
a4c81cf684 x86: extend e820 ealy_res support 32bit
move early_res related from e820_64.c to e820.c
make edba detection to be done in head32.c
remove smp_alloc_memory, because we have fixed trampoline address now.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>

 arch/x86/kernel/e820.c              |  214 ++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/e820_64.c           |  196 --------------------------------
 arch/x86/kernel/head32.c            |   76 ++++++++++++
 arch/x86/kernel/setup_32.c          |  109 +++---------------
 arch/x86/kernel/smpboot.c           |   17 --
 arch/x86/kernel/trampoline.c        |    2
 arch/x86/mach-voyager/voyager_smp.c |    9 -
 include/asm-x86/e820.h              |    6 +
 include/asm-x86/e820_64.h           |    9 -
 include/asm-x86/smp.h               |    1
 arch/x86/kernel/e820.c              |  214 ++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/e820_64.c           |  196 --------------------------------
 arch/x86/kernel/head32.c            |   76 ++++++++++++
 arch/x86/kernel/setup_32.c          |  109 +++---------------
 arch/x86/kernel/smpboot.c           |   17 --
 arch/x86/kernel/trampoline.c        |    2
 arch/x86/mach-voyager/voyager_smp.c |    9 -
 include/asm-x86/e820.h              |    6 +
 include/asm-x86/e820_64.h           |    9 -
 include/asm-x86/smp.h               |    1
 arch/x86/kernel/e820.c              |  214 ++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/e820_64.c           |  196 --------------------------------
 arch/x86/kernel/head32.c            |   76 ++++++++++++
 arch/x86/kernel/setup_32.c          |  109 +++---------------
 arch/x86/kernel/smpboot.c           |   17 --
 arch/x86/kernel/trampoline.c        |    2
 arch/x86/mach-voyager/voyager_smp.c |    9 -
 include/asm-x86/e820.h              |    6 +
 include/asm-x86/e820_64.h           |    9 -
 include/asm-x86/smp.h               |    1
 10 files changed, 320 insertions(+), 319 deletions(-)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:11 +02:00
Paul Jackson
69c9189320 x86 boot: add code to add BIOS provided EFI memory entries to kernel
Add to the kernels boot memory map 'memmap' entries found in
the EFI memory descriptors passed in from the BIOS.

On EFI systems, up to E820MAX == 128 memory map entries can
be passed via the legacy E820 interface (limited by the size
of the 'zeropage').  These entries can be duplicated in the
EFI descriptors also passed from the BIOS, and possibly more
entries passed by the EFI interface, which does not have the
E820MAX limit on number of memory map entries.

This code doesn't worry about the likely duplicate, overlapping
or (unlikely) conflicting entries between the EFI map and the
E820 map.  It just dumps all the EFI entries into the memmap[]
array (which already has the E820 entries) and lets the existing
routine sanitize_e820_map() sort the mess out.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:11 +02:00
Paul Jackson
5b7eb2e9ef x86 boot: longer comment explaining sanitize_e820_map routine
Elaborate on the comment for sanitize_e820_map(), epxlaining more what
it does, what it inputs, and what it returns.  Rearrange the placement of
this comment to fit kernel conventions, before the routine's code rather
than buried inside it.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:11 +02:00
Paul Jackson
6e9bcc796b x86 boot: change sanitize_e820_map parameter from byte to int to allow bigger memory maps
The map size counter passed into, and back out of, sanitize_e820_map(),
was an eight bit type (char or u8), as derived from its origins in
legacy BIOS E820 structures.  This patch changes that type to an 'int',
to allow this sanitize routine to also be used on larger maps (larger
than the 256 count that fits in a char).  The legacy BIOS E820 interface
of course does not change; that remains at 8 bits for this count, holding
up to E820MAX == 128 entries.  But the kernel internals can handle more
when those additional memory map entries are passed from the BIOS via
EFI interfaces.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:11 +02:00
Paul Jackson
028b785888 x86 boot: extend some internal memory map arrays to handle larger EFI input
Extend internal boot time memory tables to allow for up to
three entries per node, which may be larger than the 128 E820MAX
entries handled by the legacy BIOS E820 interface.  The EFI
interface, if present, is capable of passing memory map
entries for these larger node counts.

This patch requires an earlier patch that rewrote code depending
on these array sizes from using E820MAX explicitly to size loops,
to instead using ARRAY_SIZE() of the applicable array.

Another patch following this one will provide the code to pick
up additional memory entries passed via the EFI interface from
the BIOS and insert them in the following, now enlarged, arrays.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:11 +02:00
Paul Jackson
c3965bd151 x86 boot: proper use of ARRAY_SIZE instead of repeated E820MAX constant
This patch is motivated by a subsequent patch which will allow for more
memory map entries on EFI supported systems than can be passed via the x86
legacy BIOS E820 interface.  The legacy interface is limited to E820MAX ==
128 memory entries, and that "E820MAX" manifest constant was used as the
size for several arrays and loops over those arrays.

The primary change in this patch is to change code loop sizes over those
arrays from using the constant E820MAX, to using the ARRAY_SIZE() macro
evaluated for the array being looped.  That way, a subsequent patch can
change the size of some of these arrays, without breaking this code.

This patch also adds a parameter to the sanitize_e820_map() routine,
which had an implicit size for the array passed it of E820MAX entries.
This new parameter explicitly passes the size of said array.  Once again,
this will allow a subsequent patch to change that array size for some
calls to sanitize_e820_map() without breaking the code.

As part of enhancing the sanitize_e820_map() interface this way, I further
combined the unnecessarily distinct x86_32 and x86_64 declarations for
this routine into a single, commonly used, declaration.

This patch in itself should make no difference to the resulting kernel
binary.

[ mingo@elte.hu: merged to -tip ]

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:11 +02:00
Paul Jackson
b25e31cec7 x86 boot: minor code format fixes in e820 and efi routines
Standardize a few pointer declarations to not have the
extra space after the '*' character.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:11 +02:00
Paul Jackson
e9197bf011 x86 boot: remove some unused extern function declarations
Remove three extern declarations for routines
that don't exist.  Fix a typo in a comment.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:10 +02:00
Yinghai Lu
3f03c54a34 x86: mtrr cleanup for converting continuous to discrete layout - fix #2
disable the noisy print out.
also use the one the less spare mtrr reg.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:10 +02:00
Yinghai Lu
8004dd965b x86: amd opteron TOM2 mask val fix
there is a typo in the mask value, need to remove that extra 0,
to avoid 4bit clearing.

Signed-off-by: Yinghal Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:10 +02:00
Yinghai Lu
b79cd8f126 x86: make e820.c to have common functions
remove the duplicated copy of these functions.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:10 +02:00
Yinghai Lu
833e78bfee x86: process fam 10h like k8 with fixed mtrr setting
otherwise fixed MTRR for family 10h may not be changed.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 10:55:10 +02:00
Yinghai Lu
12031a624a x86: mtrr cleanup for converting continuous to discrete - auto detect v4
Loop through mtrr chunk_size and gran_size from 1M to 2G to find out
the optimal value so user does not need to add mtrr_chunk_size and
mtrr_gran_size to the kernel command line.

If optimal value is not found, print out all list to help select less
optimal value.

Add mtrr_spare_reg_nr= so user could set 2 instead of 1, if the card
need more entries.

v2: find the one with more spare entries
v3: fix hole_basek offset
v4: tight the compare between range and range_new
    loop stop with 4g

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Gabriel C <nix.or.die@googlemail.com>
Cc: Mika Fischer <mika.fischer@zoopnet.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:10 +02:00
Yinghai Lu
f5098d62c1 x86: mtrr cleanup for converting continuous to discrete layout v8 - fix
v9: address format change requests by Ingo
    more case handling in range_to_var_with_hole

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 10:55:10 +02:00
Yinghai Lu
8a374026c2 x86: fix trimming e820 with MTRR holes. - fix
v2: process hole then end_pfn
    fix update_memory_range with whole cover comparing

Signed-off-by: Yinghai Lu <yinghai.lu@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 10:55:10 +02:00
Yinghai Lu
42651f1582 x86: fix trimming e820 with MTRR holes.
converting MTRR layout from continous to discrete, some time could run out of
MTRRs. So add gran_sizek to prevent that by dumpping small RAM piece less than
gran_sizek.

previous trimming only can handle highest_pfn from mtrr to end_pfn from e820.
when have more than 4g RAM installed, there will be holes below 4g. so need to
check ram below 4g is coverred well.

need to be applied after
	[PATCH] x86: mtrr cleanup for converting continuous to discrete layout v7

Signed-off-by: Yinghai Lu <yinghai.lu@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 10:55:09 +02:00
Yinghai Lu
95ffa2438d x86: mtrr cleanup for converting continuous to discrete layout, v8
some BIOS like to use continus MTRR layout, and X driver can not add
WB entries for graphical cards when 4g or more RAM installed.

the patch will change MTRR to discrete.

mtrr_chunk_size= could be used to have smaller continuous block to hold holes.
default is 256m, could be set according to size of graphics card memory.

mtrr_gran_size= could be used to send smallest mtrr block to avoid run out of MTRRs

v2: fix -1 for UC checking
v3: default to disable, and need use enable_mtrr_cleanup to enable this feature
    skip the var state change warning.
    remove next_basek in range_to_mtrr()
v4: correct warning mask.
v5: CONFIG_MTRR_SANITIZER
v6: fix 1g, 2g, 512 aligment with extra hole
v7: gran_sizek to prevent running out of MTRRs.
v8: fix hole_basek caculation caused when removing next_basek
    gran_sizek using when basek is 0.

need to apply
	[PATCH] x86: fix trimming e820 with MTRR holes.
right after this one.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 10:55:09 +02:00
Alexander van Heukelum
0dbfafa5fc x86: move i386 memory setup code to e820_32.c
The x86_64 code has centralized the memory setup code in
e820_64.c. This patch copies that approach to i386:

- early_param("mem", ...) parsing is moved from
setup_32.c to e820_32.c.

- setup_memory_map() and finish_e820_parsing() are
factored out from setup_arch(), and declarations
are added to e820_32.h.

- print_memory_map() is made static and removed from
e820_32.h.

- user_defined_memmap is marked as __initdata.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 10:55:09 +02:00
Thomas Gleixner
0da72a4aeb x86: fix sparse warning in mtrr/generic.c
arch/x86/kernel/cpu/mtrr/generic.c:216:12: warning: symbol 'lo' shadows an earlier one

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 10:55:09 +02:00
Ingo Molnar
1ac9701816 x86: untangle pci dependencies
make PCI-less subarches not build with PCI - instead of complicating
the PCI dependencies.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 10:16:00 +02:00
Ingo Molnar
c9fea78dc9 x86: make NUMAQ depend on PCI
NUMAQ depends on the existence of PCI hardware, and requiring PCI
makes this subarch simpler and avoids build failures if PCI is
disabled.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 10:15:39 +02:00
Thomas Gleixner
48e2395722 x86: fixup the fallout of the bitops changes
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 08:58:36 +02:00
Jan Beulich
83cd1daa1d x86: eliminate dead code in x86_64 entry.S
Remove the not longer used handlers for reserved vectors.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 08:58:36 +02:00
Miklos Vajna
2141261a11 x86: janitor work in video-vga.c
Just moved an open brace to the previous line.

Signed-off-by: Miklos Vajna <vmiklos@frugalware.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 08:58:35 +02:00
Miklos Vajna
bfe4bb1526 x86: janitor work in bugs.c
Just moved trailing statements to the next line, removed space before
open/close parenthesis, wrapped long lines.

Signed-off-by: Miklos Vajna <vmiklos@frugalware.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 08:58:34 +02:00
Cyrill Gorcunov
0e192b99d7 x86: head_64.S cleanup - use PMD_SHIFT instead of numeric constant
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 08:58:33 +02:00
Cyrill Gorcunov
05139d8fb4 x86: head_64.S cleanup - use straight move to CR4 register
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 08:58:33 +02:00
Huang Weiyi
883b7af932 x86: smpboot.c: removed duplicated include
Removed duplicated include <asm/nmi.h> in
arch/x86/kernel/smpboot.c.

Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Cc: mingo@redhat.com
Cc: hpa@zytor.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 08:58:32 +02:00
Huang Weiyi
2cc74111c7 x86: ipi.c: removed duplicated include
Removed duplicated include <linux/interrupt.h> in
arch/x86/kernel/ipi.c.

Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Cc: mingo@redhat.com
Cc: hpa@zytor.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 08:58:32 +02:00
Cyrill Gorcunov
e83e31f41a x86: compressed/head_64.S cleanup - use predefined flags from processor-flags.h
We should better use already defined flags from processor-flags.h instead
of defining own ones

[>>> object code check >>>]

original
md5sum: 129f24be6df396fb7d8bf998c01fc716  arch/x86/boot/compressed/head_64.o
   text    data     bss     dec     hex filename
    705      48   45056   45809    b2f1 arch/x86/boot/compressed/head_64.o

patched
md5sum: 129f24be6df396fb7d8bf998c01fc716  arch/x86/boot/compressed/head_64-new.o
   text    data     bss     dec     hex filename
    705      48   45056   45809    b2f1 arch/x86/boot/compressed/head_64-new.o

[<<< object code check <<<]

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 08:58:32 +02:00
Cyrill Gorcunov
369101da7e x86: head_64.S cleanup - use predefined flags from processor-flags.h
We should better use already defined flags from processor-flags.h instead
of defining own ones

[>>> object code check >>>]

original
md5sum: 9cfa6dbf045a046bb5dfb85f8bcfe8c4  arch/x86/kernel/head_64.o
   text    data     bss     dec     hex filename
  37361    4432    8192   49985    c341 arch/x86/kernel/head_64.o

patched
md5sum: 9cfa6dbf045a046bb5dfb85f8bcfe8c4  arch/x86/kernel/head_64.o
   text    data     bss     dec     hex filename
  37361    4432    8192   49985    c341 arch/x86/kernel/head_64.o

[<<< object code check <<<]

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 08:58:31 +02:00
Jan Beulich
ebdd561a19 x86: constify data in reboot.c
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 08:58:30 +02:00
Jan Beulich
4e50e62ce5 x86: eliminate duplicate consistency checks in init_32.c
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 08:58:30 +02:00
Thomas Gleixner
6a1673ae22 x86: make memory_add_physaddr_to_nid depend on MEMORY_HOTPLUG
memory_add_physaddr_to_nid() is only used in the
CONFIG_MEMORY_HOTPLUG_SPARSE || CONFIG_ACPI_HOTPLUG_MEMORY case.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 08:58:29 +02:00
Adrian Bunk
311f834948 x86: kernel/pci-dma.c cleanups
This patch contains the following cleanups:
- make the following needlessly global code static:
  - dma_alloc_pages()

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 08:58:28 +02:00
Thomas Gleixner
968cbfad1a x86: make __pci_mmcfg_init static in mmconfig-shared.c
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 08:58:28 +02:00
Thomas Gleixner
e0b32d768c x86: make command_line static in setup_64.c
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 08:58:27 +02:00
Thomas Gleixner
11034d5597 x86: init64.c include initrd.h
free_initrd_mem needs a prototype, which is in linux/initrd.h

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 08:58:27 +02:00
Thomas Gleixner
d1097635de x86: move mmconfig declarations to header
arch/x86/kernel/mmconf-fam10h_64.c is missing the prototypes, which
are decalred in arch/x86/kernel/setup_64.c. Move the prototypes and
the inline stubs to the appropriate header file.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 08:58:27 +02:00
Thomas Gleixner
55d4f22abc x86: k8topology cleanup variable declarations
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 08:58:26 +02:00
Thomas Gleixner
d34c08958f x86: k8topology fix shadow variable
sparse mutters:
arch/x86/mm/k8topology_64.c:108:7: warning: symbol 'nodeid' shadows an earlier one

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 08:58:26 +02:00
Thomas Gleixner
0eafe234a2 x86: k8topology add missing header
k8_scan_nodes is global and needs a prototype. Add the header file
which contains it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 08:58:26 +02:00
Thomas Gleixner
535694f361 x86: boot/printfc use NULL instead 0
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 08:58:25 +02:00
Thomas Gleixner
eef8f871d8 x86: vsmp_64 add missing includes
sparse mutters:
arch/x86/kernel/vsmp_64.c:126:5: warning: symbol 'is_vsmp_box' was not declared. Should it be static?
arch/x86/kernel/vsmp_64.c:145:13: warning: symbol 'vsmp_init' was not declared. Should it be static?

Include the appropriate headers.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 08:58:24 +02:00
Andrew Morton
46dd98a5c0 arch/x86/mm/pat.c: use boot_cpu_has()
arch/x86/mm/pat.c: In function 'phys_mem_access_prot_allowed':
arch/x86/mm/pat.c:526: warning: passing argument 2 of 'constant_test_bit' from incompatible pointer type
arch/x86/mm/pat.c:526: warning: passing argument 2 of 'variable_test_bit' from incompatible pointer type
arch/x86/mm/pat.c:527: warning: passing argument 2 of 'constant_test_bit' from incompatible pointer type
arch/x86/mm/pat.c:527: warning: passing argument 2 of 'variable_test_bit' from incompatible pointer type
arch/x86/mm/pat.c:528: warning: passing argument 2 of 'constant_test_bit' from incompatible pointer type
arch/x86/mm/pat.c:528: warning: passing argument 2 of 'variable_test_bit' from incompatible pointer type
arch/x86/mm/pat.c:529: warning: passing argument 2 of 'constant_test_bit' from incompatible pointer type
arch/x86/mm/pat.c:529: warning: passing argument 2 of 'variable_test_bit' from incompatible pointer type

Don't open-code test_bit() on a __u32

Cc: Andrea Arcangeli <andrea@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 08:50:25 +02:00
OGAWA Hirofumi
e6b0edef34 x86: clean up vdso_enabled type on x86_64
This fixes type of "vdso_enabled" on X86_64 to match extern in asm/elf.h.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 08:45:18 +02:00
Thomas Gleixner
c2c14fb7af x86: tsc_64.c make constant UL
arch/x86/kernel/tsc_64.c:245:13: warning: constant 0x100000000 is so big it is long

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-25 08:39:27 +02:00
Jan Beulich
a2eddfa959 x86: make /proc/stat account for all interrupts
LAPIC interrupts, which don't go through the generic interrupt handling
code, aren't accounted for in /proc/stat. Hence this patch adds a
mechanism architectures can use to accordingly adjust the statistics.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 07:11:49 +02:00
Jan Beulich
63687a528c x86: move tracedata to RODATA
.. allowing it to be write-protected just as other read-only data
under CONFIG_DEBUG_RODATA.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-25 07:09:47 +02:00
Linus Torvalds
eb90d81d03 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip:
  x86: prevent PGE flush from interruption/preemption
  x86: use explicit copy in vdso_gettimeofday()
  namespacecheck: automated fixes
  x86/xen: fix arbitrary_virt_to_machine()
  x86: don't read maxlvt before checking if APIC is mapped
  x86: disable TSC for sched_clock() when calibration failed
  x86: distangle user disabled TSC from unstable
  x86: fix setup of cyc2ns in tsc_64.c
2008-05-24 10:20:00 -07:00
Thomas Gleixner
ec42418f19 x86: rename the i8259_32/64.c leftovers to irqinit_32/64.c
The leftovers of the i8259 unification have nothing to do with i8259
at all. They contain interrupt init code and the i8259_xx name is just
misleading now.

Rename them to irqinit_32/64.c

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-24 16:47:39 +02:00
Thomas Gleixner
d23b200a75 x86: make init_ISA_irqs() static
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-24 16:45:36 +02:00
Pavel Machek
680afbf989 x86: i8259: cleanup codingstyle
Signed-off-by: Pavel Machek <pavel@suse.cz>
Cc: macro@ds2.pg.gda.pl
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-24 16:44:26 +02:00
Pavel Machek
3e8631d270 x86: i8259.c: remove trivial ifdefs
Remove #ifdefs where the only difference is formatting of comments.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Cc: macro@ds2.pg.gda.pl
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-24 16:44:26 +02:00
Pavel Machek
15d613cb25 x86: i8259.c: remove #ifdefs around includes
Remove #ifdefs around includes; including too much should be always
safe.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Cc: macro@ds2.pg.gda.pl
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-24 16:44:26 +02:00
Pavel Machek
21fd5132b2 x86: automatical unification of i8259.c
Make conversion of i8259 very mechanical -- i8259 was generated by
 diff -D, with too different parts left in i8259_32 and
i8259_64.c. Only "by hand" changes were removal of #ifdef from middle
of the comment (prevented compilation) and removal of one static to
allow splitting into files.

Of course, it will need some cleanups now, and those will follow.

Signed-of-by: Pavel Machek <pavel@suse.cz>
2008-05-24 16:44:26 +02:00
Thomas Gleixner
fce39665ab x86: make init_ISA_irqs() static
Moved to i8259 branch to avoid conflicts.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-24 16:21:47 +02:00
Thomas Gleixner
f20b11e716 x86: rename the i8259_32/64.c leftovers to initirq_32/64.c
The leftovers of the i8259 unification have nothing to do with i8259
at all. They contain interrupt init code and the i8259_xx name is just
misleading now.

Rename them to initirq_32/64.c

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-24 15:59:58 +02:00
Pavel Machek
4d9a6e6128 x86: i8259: cleanup codingstyle
Signed-off-by: Pavel Machek <pavel@suse.cz>
Cc: macro@ds2.pg.gda.pl
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-24 15:57:10 +02:00
Pavel Machek
4f6f3bac18 x86: i8259.c: remove trivial ifdefs
Remove #ifdefs where the only difference is formatting of comments.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Cc: macro@ds2.pg.gda.pl
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-24 15:57:10 +02:00
Pavel Machek
6b4d3afbe7 x86: i8259.c: remove #ifdefs around includes
Remove #ifdefs around includes; including too much should be always
safe.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Cc: macro@ds2.pg.gda.pl
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-24 15:57:09 +02:00
Pavel Machek
40bd217400 x86: automatical unification of i8259.c
Make conversion of i8259 very mechanical -- i8259 was generated by
2008-05-24 15:57:09 +02:00
Linus Torvalds
e6b027a398 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] clarify license of freq_table.c
  [CPUFREQ] Remove documentation of removed ondemand tunable.
  [CPUFREQ] Crusoe: longrun cpufreq module reports false min freq
  [CPUFREQ] powernow-k8: improve error messages
2008-05-23 09:24:52 -07:00
Harvey Harrison
7fafd91d85 x86: fix integer as NULL pointer warning
arch/x86/boot/printf.c:59:10: warning: Using plain integer as NULL pointer

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-23 08:11:06 -07:00
Andi Kleen
a1289643ad x86: use explicit copy in vdso_gettimeofday()
Jeremy's gcc 3.4 seems to be unable to inline a 8 byte memcpy.  But the
vdso doesn't support external references.  Copy the structure members
of struct timezone explicitely instead.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23 14:08:06 +02:00
Ingo Molnar
2ddfd20e7c namespacecheck: automated fixes
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-23 14:08:06 +02:00
Jan Beulich
de067814d6 x86/xen: fix arbitrary_virt_to_machine()
While I realize that the function isn't currently being used, I still
think an obvious mistake like this should be corrected.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23 14:08:06 +02:00
Chuck Ebbert
2584a82dee x86: don't read maxlvt before checking if APIC is mapped
A check for unmapped apic was added before reading maxlvt but the early
read of maxlvt wasn't removed.

Signed-off-by: Chuck Ebbert <cebbert@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org
2008-05-23 14:08:06 +02:00
Thomas Gleixner
74dc51a3de x86: disable TSC for sched_clock() when calibration failed
When the TSC calibration fails then TSC is still used in
sched_clock(). Disable it completely in that case.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org
2008-05-23 14:08:06 +02:00
Thomas Gleixner
9ccc906c97 x86: distangle user disabled TSC from unstable
tsc_enabled is set to 0 from the command line switch "notsc" and from
the mark_tsc_unstable code. Seperate those functionalities and replace
tsc_enable with tsc_disable. This makes also the native_sched_clock()
decision when to use TSC understandable.

Preparatory patch to solve the sched_clock() issue on 32 bit.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23 14:08:06 +02:00
Thomas Gleixner
b6db80ee13 x86: fix setup of cyc2ns in tsc_64.c
When the TSC is calibrated against the PIT due to the nonavailability
of PMTIMER/HPET or due to SMI interference then the setup of the per
CPU cyc2ns variables is skipped. This is unlikely to happen but it
would definitely render sched_clock() unusable.

This was introduced with commit 53d517cdba

    x86: scale cyc_2_nsec according to CPU frequency

Update the per CPU cyc2ns variables in all exit pathes of tsc_calibrate.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org
2008-05-23 14:08:06 +02:00
Mike Travis
c3ed64295f x86: change maximum NR_CPUS to 4096 and MAX_NUMNODES to 512
* Change the range of NR_CPUS from 2-255 to 2-4096 and change the
    range of MAX_NUMNODES (NODES_SHIFT) from 1-32768 to 1-512.

  * Alter comment about how much each increment of NR_CPUS consumes.
    (This was found by configuring for 256 cpus and then 512 cpus
     and dividing the difference by 256.)

Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23 00:12:28 +02:00
Dave Jones
873b274a41 x86: Add Centaur and Transmeta CPUs to PAT whitelist
Unconditionally enable PAT support on Centaur and Transmeta CPUs.
All known models that advertise PAT have no known errata.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-05-22 13:25:22 -07:00
Jan Beulich
b478458aee x86: avoid re-loading LDT in unrelated address spaces
Performance optimization.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-22 19:11:20 +02:00
Jeremy Fitzhardinge
3843fc2575 xen: remove support for non-PAE 32-bit
Non-PAE operation has been deprecated in Xen for a while, and is
rarely tested or used.  xen-unstable has now officially dropped
non-PAE support.  Since Xen/pvops' non-PAE support has also been
broken for a while, we may as well completely drop it altogether.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-22 18:42:49 +02:00
Thomas Gleixner
0748aca6f0 x86: remove useless static current_tsc_khz variable
current_tsc_khz is just written by the init code and never used
again. Remove it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-22 18:34:40 +02:00
Tony Camuso
a167607255 PCI: Correct last two HP entries in the bfsort whitelist
Greetings.

There is a code flaw in the bfsort whitelist, where there are redundant
entries for the same two HP systems, DL385 G2 and DL585 G2. This patch
replaces those redundant entries with the correct ones. The correct
entries are for large-volume systems, the DL360 and DL380.

-----------------------------------------------------------------------

commit ec69f0374c3b0ad7ea991b0e9ac00377acfe5b1a
Author: Tony Camuso <tony.camuso@hp.com>
Date:   Wed May 14 07:09:28 2008 -0400

     Replace Redundant Whitelist Entries with the Correct Ones

     The ProLiant DL585 G2 and the DL585 G2 are entered reundantly
     in the dmi_system_id table. What should have been there are the
     DL360 and DL380. This patch simply replaces the redundant
     entries with the correct entries.

 arch/x86/pci/common.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

     Signed-off-by: Tony Camuso <tony.camuso@hp.com>
     Signed-off-by: Pat Schoeller <patrick.schoeller@hp.com>

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-22 18:16:24 +02:00
Pavel Machek
0abbc78a01 x86, aperture_64: use symbolic constants
Factor-out common aperture_valid code.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-22 11:35:14 +02:00
Linus Torvalds
737b0fbf44 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI: correct mailing list address
  PCI: Correct last two HP entries in the bfsort whitelist
2008-05-20 10:55:04 -07:00
Andy Whitcroft
84d6bd0e27 x86: cope with no remap space being allocated for a numa node
When allocating the pgdat's for numa nodes on x86_32 we attempt to place
them in the numa remap space for that node.  However should the node not
have any remap space allocated (such as due to having non-ram pages in
the remap location in the node) then we will incorrectly place the pgdat
at zero.  Check we have remap available, falling back to node 0 memory
where we do not.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-20 14:10:50 +02:00
Andy Whitcroft
b9ada4281c x86: reinstate numa remap for SPARSEMEM on x86 NUMA systems
Recent kernels have been panic'ing trying to allocate memory early in boot,
in __alloc_pages:

  BUG: unable to handle kernel paging request at 00001568
  IP: [<c10407b6>] __alloc_pages+0x33/0x2cc
  *pdpt = 00000000013a5001 *pde = 0000000000000000
  Oops: 0000 [#1] SMP
  Modules linked in:

  Pid: 1, comm: swapper Not tainted (2.6.25 #78)
  EIP: 0060:[<c10407b6>] EFLAGS: 00010246 CPU: 0
  EIP is at __alloc_pages+0x33/0x2cc
  EAX: 00001564 EBX: 000412d0 ECX: 00001564 EDX: 000005c3
  ESI: f78012a0 EDI: 00000001 EBP: 00001564 ESP: f7871e50
  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
  Process swapper (pid: 1, ti=f7870000 task=f786f670 task.ti=f7870000)
  Stack: 00000000 f786f670 00000010 00000000 0000b700 000412d0 f78012a0 00000001
         00000000 c105b64d 00000000 000412d0 f78012a0 f7803120 00000000 c105c1c5
         00000010 f7803144 000412d0 00000001 f7803130 f7803120 f78012a0 00000001
  Call Trace:
   [<c105b64d>] kmem_getpages+0x94/0x129
   [<c105c1c5>] cache_grow+0x8f/0x123
   [<c105c689>] ____cache_alloc_node+0xb9/0xe4
   [<c105c999>] kmem_cache_alloc_node+0x92/0xd2
   [<c1018929>] build_sched_domains+0x536/0x70d
   [<c100b63c>] do_flush_tlb_all+0x0/0x3f
   [<c100b63c>] do_flush_tlb_all+0x0/0x3f
   [<c10572d6>] interleave_nodes+0x23/0x5a
   [<c105c44f>] alternate_node_alloc+0x43/0x5b
   [<c1018b47>] arch_init_sched_domains+0x46/0x51
   [<c136e85e>] kernel_init+0x0/0x82
   [<c137ac19>] sched_init_smp+0x10/0xbb
   [<c136e8a1>] kernel_init+0x43/0x82
   [<c10035cf>] kernel_thread_helper+0x7/0x10

Debugging this showed that the NODE_DATA() for nodes other than node 0
were all NULL.  Tracing this back showed that the NODE_DATA() pointers
were being initialised to each nodes remap space.  However under
SPARSEMEM remap is disabled which leads to the pgdat's being placed
incorrectly at kernel virtual address 0.  Leading to the panic when
attempting to allocate memory from these nodes.

Numa remap was disabled in the commit below.  This occured while fixing
problems triggered when attempting to boot x86_32 NUMA SPARSEMEM kernels
on non-numa hardware.

	x86: make NUMA work on 32-bit
	commit 1b000a5dbe

The real problem is believed to be related to other alignment issues in
the regions blocked out from the bootmem allocator for small memory
systems, and has been fixed separately.  Therefore re-enable remap for
SPARSMEM, which fixes pgdat allocation issues.  Testing confirms that
SPARSMEM NUMA kernels will boot correctly with this part of the change
reverted.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-20 14:10:48 +02:00
Linus Torvalds
e23a5f6687 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  [PATCH] return to old errno choice in mkdir() et.al.
  [Patch] fs/binfmt_elf.c: fix wrong return values
  [PATCH] get rid of leak in compat_execve()
  [Patch] fs/binfmt_elf.c: fix a wrong free
  [PATCH] avoid multiplication overflows and signedness issues for max_fds
  [PATCH] dup_fd() part 4 - race fix
  [PATCH] dup_fd() - part 3
  [PATCH] dup_fd() part 2
  [PATCH] dup_fd() fixes, part 1
  [PATCH] take init_files to fs/file.c
2008-05-19 16:37:45 -07:00
maximilian attems
667ad4f701 [CPUFREQ] Crusoe: longrun cpufreq module reports false min freq
The longrun cpufreq module reports a false minimum frequency 3MHz on
300-600MHz Crusoe processor.  This may be due to a calculation bug
in the module.

Original patch from Kaz Sasayama <kazssym@hypercore.co.jp>
submitted as http://bugs.debian.org/468149 patch ported to x86

Cc: Kaz Sasayama <kazssym@hypercore.co.jp>
Signed-off-by: maximilian attems <max@stro.at>
Signed-off-by: Dave Jones <davej@redhat.com>
2008-05-19 18:17:28 -04:00
Mark Langsdorf
eba9fe93a2 [CPUFREQ] powernow-k8: improve error messages
The most common error with powernow-k8 is an ACPI _PSS error
caused either by failure to load the ACPI processor module
or a bad parse of the _PSS object.  Make the error message
returned to the user in these situations more straightforward
and easier to understand.

-Mark Langsdorf
Operating System Research Center
AMD

Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2008-05-19 18:17:27 -04:00
Linus Torvalds
88d53766bd Merge branch 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm
* 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm:
  KVM: LAPIC: ignore pending timers if LVTT is disabled
  KVM: Update MAINTAINERS for new mailing lists
  KVM: Fix kvm_vcpu_block() task state race
  KVM: ia64: Set KVM_IOAPIC_NUM_PINS to 48
  KVM: ia64: fix GVMM module including position-dependent objects
  KVM: ia64: Define new kvm_fpreg struture to replace ia64_fpreg
  KVM: PIT: take inject_pending into account when emulating hlt
  s390: KVM guest: fix compile error
  KVM: x86 emulator: fix writes to registers with modrm encodings
2008-05-19 13:53:21 -07:00
Tony Camuso
8d64c781f0 PCI: Correct last two HP entries in the bfsort whitelist
Replace Redundant Whitelist Entries with the Correct Ones

The ProLiant DL585 G2 and the DL585 G2 are entered reundantly in the
dmi_system_id table. What should have been there are the DL360 and DL380. This
patch simply replaces the redundant entries with the correct entries.

Signed-off-by: Tony Camuso <tony.camuso@hp.com>
Signed-off-by: Pat Schoeller <patrick.schoeller@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-05-19 12:21:36 -07:00
Ingo Molnar
a8ac1ae3a2 Merge branch 'linus' into x86/pat 2008-05-19 09:23:07 +02:00
Marcelo Tosatti
54aaacee35 KVM: LAPIC: ignore pending timers if LVTT is disabled
Only use the APIC pending timers count to break out of HLT emulation if
the timer vector is enabled.

Certain configurations of Windows simply mask out the vector without
disabling the timer.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-18 14:39:39 +03:00
Marcelo Tosatti
eedaa4e2af KVM: PIT: take inject_pending into account when emulating hlt
Otherwise hlt emulation fails if PIT is not injecting IRQ's.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-18 14:34:15 +03:00
Avi Kivity
107d6d2efa KVM: x86 emulator: fix writes to registers with modrm encodings
A register destination encoded with a mod=3 encoding left dst.ptr NULL.
Normally we don't trap writes to registers, but in the case of smsw, we do.

Fix by pointing dst.ptr at the destination register.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-18 14:34:14 +03:00
Thomas Gleixner
b4ef290d7c Merge branch 'linus' into x86/pat 2008-05-18 11:54:43 +02:00
Thomas Gleixner
e9623b3559 x86: disable mwait for AMD family 10H/11H CPUs
The previous revert of 0c07ee38c9 left
out the mwait disable condition for AMD family 10H/11H CPUs.

Andreas Herrman said:

It depends on the CPU. For AMD CPUs that support MWAIT this is wrong.
Family 0x10 and 0x11 CPUs will enter C1 on HLT. Powersavings then
depend on a clock divisor and current Pstate of the core.

If all cores of a processor are in halt state (C1) the processor can
enter the C1E (C1 enhanced) state. If mwait is used this will never
happen.

Thus HLT saves more power than MWAIT here.

It might be best to switch off the mwait flag for these AMD CPU
families like it was introduced with commit
f039b75471 (x86: Don't use MWAIT on AMD
Family 10)

Re-add the AMD families 10H/11H check and disable the mwait usage for
those.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-17 22:57:20 +02:00
Avi Kivity
31f4d870b0 x86: fix crash on cpu hotplug on pat-incapable machines
pat_disable() is __init, which means it goes away after booting is complete.
Unfortunately it is used by the hotplug code if the machine is not
pat-capable, causing a crash.

Fix by marking pat_disable() as __cpuinit.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-17 22:57:20 +02:00
Ingo Molnar
a738d897b7 x86: remove mwait capability C-state check
Vegard Nossum reports:

| powertop shows between 200-400 wakeups/second with the description
| "<kernel IPI>: Rescheduling interrupts" when all processors have load (e.g.
| I need to run two busy-loops on my 2-CPU system for this to show up).
|
| The bisect resulted in this commit:
|
| commit 0c07ee38c9
| Date:   Wed Jan 30 13:33:16 2008 +0100
|
|     x86: use the correct cpuid method to detect MWAIT support for C states

remove the functional effects of this patch and make mwait unconditional.

A future patch will turn off mwait on specific CPUs where that causes
power to be wasted.

Bisected-by: Vegard Nossum <vegard.nossum@gmail.com>
Tested-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-17 22:57:20 +02:00
Thomas Gleixner
538f0fd0f2 Merge branch 'linus' into x86/gart 2008-05-17 17:12:24 +02:00
Al Viro
f52111b154 [PATCH] take init_files to fs/file.c
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-05-16 17:22:20 -04:00
Linus Torvalds
4ef7e3e90f Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: user_regset_view table fix for ia32 on 64-bit
  x86: arch/x86/mm/pat.c - fix warning
  x86: fix csum_partial() export
  x86: early_init_centaur(): use set_cpu_cap()
  x86: fix app crashes after SMP resume
  x86: wakeup.lds.S - section ordering fix
  x86: [VOYAGER] fix duplicate phys_cpu_present_map symbol
  x86/pci: fix broken ISA DMA
2008-05-13 12:33:56 -07:00
Roland McGrath
1f465f4e47 x86: user_regset_view table fix for ia32 on 64-bit
The user_regset_view table for the 32-bit regsets on the 64-bit build had
the wrong sizes for the FP regsets.  This bug had no user-visible effect
(just on kernel modules using the user_regset interfaces and the like).
But the fix is trivial and risk-free.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-13 19:40:20 +02:00
Pranith Kumar
afc8534380 x86: arch/x86/mm/pat.c - fix warning
fix this warning:

 arch/x86/mm/pat.c: In function `phys_mem_access_prot_allowed':
 arch/x86/mm/pat.c:558: warning: long long unsigned int format, long
 unsigned int arg (arg 6)
 arch/x86/mm/pat.c: In function `map_devmem':
 arch/x86/mm/pat.c:580: warning: long long unsigned int format, long
 unsigned int arg (arg 6)

Signed-off-by: D Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-13 19:39:30 +02:00
Ingo Molnar
89804c022f x86: fix csum_partial() export
Fix this symbol export problem:

    Building modules, stage 2.
    MODPOST 193 modules
    ERROR: "csum_partial" [fs/reiserfs/reiserfs.ko] undefined!
    make[1]: *** [__modpost] Error 1
    make: *** [modules] Error 2

This is due to a known weakness of symbol exports: if a symbol's
only in-core user is an EXPORT_SYMBOL from a lib-y section, the
symbol is not linked in.

The solution is to move the export to x8664_ksyms_64.c - but the real
solution would be to fix kbuild.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-13 19:38:47 +02:00
Andrew Morton
8c45a4e4f2 x86: early_init_centaur(): use set_cpu_cap()
arch/x86/kernel/setup_64.c:954: warning: passing argument 2 of 'set_bit' from incompatible pointer type

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-13 19:37:38 +02:00
Hugh Dickins
61165d7a03 x86: fix app crashes after SMP resume
After resume on a 2cpu laptop, kernel builds collapse with a sed hang,
sh or make segfault (often on 20295564), real-time signal to cc1 etc.

Several hurdles to jump, but a manually-assisted bisect led to -rc1's
d2bcbad5f3 x86: do not zap_low_mappings
in __smp_prepare_cpus.  Though the low mappings were removed at bootup,
they were left behind (with Global flags helping to keep them in TLB)
after resume or cpu online, causing the crashes seen.

Reinstate zap_low_mappings (with local __flush_tlb_all) for each cpu_up
on x86_32.  This used to be serialized by smp_commenced_mask: that's now
gone, but a low_mappings flag will do.  No need for native_smp_cpus_done
to repeat the zap: let mem_init zap BSP's low mappings just like on UP.

(In passing, fix error code from native_cpu_up: do_boot_cpu returns a
variety of diagnostic values, Dprintk what it says but convert to -EIO.
And save_pg_dir separately before zap_low_mappings: doesn't matter now,
but zapping twice in succession wiped out resume's swsusp_pg_dir.)

That worked well on the duo and one quad, but wouldn't boot 3rd or 4th
cpu on P4 Xeon, oopsing just after unlock_ipi_call_lock.  The TLB flush
IPI now being sent reveals a long-standing bug: the booting cpu has its
APIC readied in smp_callin at the top of start_secondary, but isn't put
into the cpu_online_map until just before that unlock_ipi_call_lock.

So native_smp_call_function_mask to online cpus would send_IPI_allbutself,
including the cpu just coming up, though it has been excluded from the
count to wait for: by the time it handles the IPI, the call data on
native_smp_call_function_mask's stack may well have been overwritten.

So fall back to send_IPI_mask while cpu_online_map does not match
cpu_callout_map: perhaps there's a better APICological fix to be
made at the start_secondary end, but I wouldn't know that.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-13 19:36:12 +02:00
Venki Pallipadi
77db988564 x86/PCI: X86_PAT & mprotect
Some versions of X used the mprotect workaround to change caching type from UC
to WB, so that it can then use mtrr to program WC for that region [1].  Change
the mmap of pci space through /sys or /proc interfaces from UC to UC_MINUS.
With this change, X will not need to use mprotect workaround to get WC type
since the MTRR mapping type will be honored.

The bug in mprotect that clobbers PAT bits is fixed in a follow on patch. So,
this X workaround will stop working as well.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-05-13 09:51:54 -07:00
Takashi Iwai
4a367f3a9d x86/PCI: fix broken ISA DMA
Rene Herman reported:

> commit 8779f2fc3b
>
> "x86: don't try to allocate from DMA zone at first"
>
> breaks all of ISA DMA. Or all of ALSA ISA DMA at least. All
> ISA soundcards are silent following that commit -- no error
> messages, everything appears fine, just silence.

That patch is buggy. We had an implicit assumption that
dev = NULL for ISA devices that require 24bit DMA.

The recent work on x86 dma_alloc_coherent() breaks the ISA DMA buffer
allocation, which is represented by "dev = NULL" and requires 24bit
DMA implicitly.

Bisected-by: Rene Herman <rene.herman@keyaccess.nl>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-05-13 09:51:53 -07:00
Yinghai Lu
0327318445 x86_64: simplify the memtest parameter setting
use CONFIG_MEMTEST only. if it is set, will have memtest=0 (disabled)

need to have memtest=4 in command line to test more patterns.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:28:15 +02:00
Dave Jones
205f932880 x86: add new cache descriptor
The latest rev of Intel doc AP-485 details a new cache
descriptor that we don't yet support.
A 6MB 24-way assoc L2 cache.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:28:13 +02:00
Ingo Molnar
5cb04df8d3 x86: defconfig updates
refresh 32-bit defconfig too, and update the 64-bit configs as well,
the defconfig should be much more useful by default, so most of the
updates are the enabling of various options.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:28:12 +02:00
Auke Kok
b5d958ea66 e1000e: set CONFIG_E1000E=y in x86 defconfigs
This adds to the already default CONFIG_E1000=y in these files.

Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:28:12 +02:00
Cyrill Gorcunov
622e3f28e5 x86: 64-bit defconfig remake
The current x86_64_defconfig contains a number of nonexistent
symbols. Lets fix it.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:28:12 +02:00
Hiroshi Shimamoto
f59a9310b9 x86: nmi and irq counters are unsigned in nmi_64.c
__nmi_count, apic_timer_irqs and irq0_irqs are unsigned.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-12 21:28:11 +02:00
Hiroshi Shimamoto
f784946ded x86: use per_cpu data in nmi_32
use per_cpu for per CPU data.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-12 21:28:11 +02:00
Pavel Machek
3bb6fbf996 x86 gart: factor out common code
Cleanup gart handling on amd64 a bit: move common code into
enable_gart_translation , and use symbolic register names where
appropriate.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:28:10 +02:00
Yinghai Lu
330fce23da x86: reserve dma32 early for gart fix
we can use free_bootmem() directly.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-12 21:28:10 +02:00
Yinghai Lu
55c0d721df x86: clean up aperture_64.c
1. use symbolic register names where appropriate.
2. num to bus or slot changing
3. handle for new opteron for bus other than 0

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-12 21:28:10 +02:00
Yinghai Lu
7677b2ef6c x86_64: allocate gart aperture from 512M
because we try to reserve dma32 early, so we have chance to get aperture
from 64M.

with some sequence aperture allocated from RAM, could become E820_RESERVED.

and then if doing a kexec with a big kernel that uncompressed size is above
64M we could have a range conflict with still using gart.

So allocate gart aperture from 512M instead.

Also change the fallback_aper_order to 5, because we don't have chance to get
2G or 4G aperture.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:28:10 +02:00
Yinghai Lu
8c9fd91a0d x86: checking aperture size order
some systems are using 32M for gart and agp when memory is less than 4G.
Kernel will reject and try to allcate another 64M that is not needed,
and we will waste 64M of perfectly good RAM.

this patch adds a workaround by checking aper_base/order between NB and
agp bridge. If they are the same, and memory size is less than 4G, it
will allow it.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:28:10 +02:00
Yinghai Lu
1edc1ab3f6 x86: agp_gart size checking for buggy device
while looking at Rafael J. Wysocki's system boot log,

I found a funny printout:

	Node 0: aperture @ de000000 size 32 MB
	Aperture too small (32 MB)
	AGP bridge at 00:04:00
	Aperture from AGP @ de000000 size 4096 MB (APSIZE 0)
	Aperture too small (0 MB)
	Your BIOS doesn't leave a aperture memory hole
	Please enable the IOMMU option in the BIOS setup
	This costs you 64 MB of RAM
	Mapping aperture over 65536 KB of RAM @ 4000000

	...

	agpgart: Detected AGP bridge 20
	agpgart: Aperture pointing to RAM
	agpgart: Aperture from AGP @ de000000 size 4096 MB
	agpgart: Aperture too small (0 MB)
	agpgart: No usable aperture found.
	agpgart: Consider rebooting with iommu=memaper=2 to get a good aperture.

it means BIOS allocated the correct gart on the NB and AGP bridge, but
because a bug in the silicon (the agp bridge reports the wrong order,
it wants 4G instead) the kernel will reject that allocation.

Also, because the size is only 32MB, and we try to get another 64M for gart,
late fix_northbridge can not revert that change because it still reads
the wrong size from agp bridge.

So try to double check the order value from the agp bridge, before calling
aperture_valid().

[ mingo@elte.hu: 32-bit fix. ]

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:28:10 +02:00
Pavel Machek
aa134f1b09 x86: iommu: use symbolic constants, not hardcoded numbers
Move symbolic constants into gart.h, and use them instead of hardcoded
constant.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:28:10 +02:00
Jan Beulich
f8096f92b8 x86: separate cmpxchg8b checking from PAE checking
.. allowing the former to be use in non-PAE kernels, too.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:28:09 +02:00
Venki Pallipadi
77b52b4c5c x86: add "debugpat" boot option
enable debug messages by a boot option "debugpat".

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:28:07 +02:00
Thomas Gleixner
403d8efc94 x86: irq_32 move 4kstacks code to one place
Move the 4KSTACKS related code to one place. This allows to un#ifdef
do_IRQ() and share the executed on stack for the stack overflow printk
and the softirq call.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-12 21:28:06 +02:00
Thomas Gleixner
de9b10af12 x86: janitor stack overflow warning patch
Add KERN_WARNING to the printk as this could not be done in the
original patch, which allegedly only moves code around.

Un#ifdef do_IRQ.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-12 21:28:06 +02:00
Andi Kleen
04b361abfd i386: Execute stack overflow warning on interrupt stack v2
Previously the reporting printk would run on the process stack, which risks
overflow an already low stack. Instead execute it on the interrupt stack.
This makes it more likely for the printk to make it actually out.

It adds one not taken test/branch more to the interrupt path when
stack overflow checking is enabled. We could avoid that by duplicating
more code, but that seemed not worth it.

Based on an observation by Eric Sandeen.

v2: Fix warnings in some configs

Signed-off-by: Andi Kleen <andi@firstfloor.org>
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-12 21:28:06 +02:00
Alan Mayer
ce17833183 x86: change FIRST_SYSTEM_VECTOR to a variable, fix
Fixes the build error introduced by my FIRST_SYSTEM_VECTOR patch

Signed-off-by: Alan Mayer <ajm@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:28:06 +02:00
Alan Mayer
305b92a232 x86: change FIRST_SYSTEM_VECTOR to a variable
The SGI UV system needs several more system vectors than a vanilla
x86_64 system.  Rather than burden the other archs with extra system
vectors that they don't use, change FIRST_SYSTEM_VECTOR to a variable,
so that it can be dynamic.

Signed-off-by: Alan Mayer <ajm@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:28:06 +02:00
Thomas Gleixner
1a331957ef x86: move eisa_set_level_irq declaration to header
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-12 21:28:06 +02:00
Thomas Gleixner
0bc471d930 x86: move BUILD_IRQ macro magic to i8259_64.c
i8259_64.c is the only place which uses those macros.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-12 21:28:05 +02:00
Thomas Gleixner
9b7dc567d0 x86: unify interrupt vector defines
The interrupt vector defines are copied 4 times around with minimal
differences. Move them all into asm-x86/irq_vectors.h

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-12 21:28:05 +02:00
Jan Beulich
41b3eae669 x86: minor polishing to top-level arch Makefile
Use build target when creating compatibility link, and use $(boot)
where possible.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-12 21:28:04 +02:00
Venki Pallipadi
8edc5cc5ec x86: remove 6 bank limitation in 64 bit MCE reporting code
Eliminate the 6 bank restriction in 64 bit mce reporting code. This
restriction is artificial (due to static creation of sysfs files) and 32
bit code does not have any such restriction.

This change helps in reporting the details of machine checks on a
machine check exception with errors in bank 6 and above on CPUs that
support those banks. Without the patch, machine check errors in those
banks are not reported.

We still have 128 (MCE_EXTENDED_BANK) bank restriction instead of max
256 supported in hardware. That is not changed in the patch below as it
will have some user level mcelog utility dependency, with bank 128 being
used for thermal reporting currently.

The patch below does not create sysfs control (bankNctl) for banks
higher than 6 as well. That needs some pre-cleanup in /sysfs mce layout,
removal of per cpu /sysfs entries for bankctl as they are really global
system level control today. That change will follow. This basic change
is critical to report the detailed errors on banks higher than 6.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-12 21:27:55 +02:00
Thomas Gleixner
7c4728f4a8 x86: print info about available HPET quirk
We have a lot of HPET quirks available which might force enable HPET
even when the BIOS does not enable it. Some of those quirks depend on
the command line option "hpet=force".

Andrew pointed out that hoping that the user will find out about this
boot option is not really helpful.

Emit a kernel info which informs the user about the "hpet=force" boot
option when we enter a quirk which depends on this option and the user
did not provide it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:27:54 +02:00
Andreas Herrmann
e8aa4667ba x86: enable hpet=force for AMD SB400
Add quirk to allow forced usage of HPET on ATI SB400.
I stumbled over machines where HPET is enabled but not reported
by BIOS. This patch configures the HPET base address and makes
it known to the OS.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:27:54 +02:00
Carlos R. Mafra
6fd592daae x86: clean up computation of HPET .mult variables
While reading through the HPET code I realized that the
computation of .mult variables could be done with less
lines of code, resulting in a 1.6% text size saving
for hpet.o

So I propose the following patch, which applies against
today's Linus -git tree.

>From 0c6507e400e9ca5f7f14331e18f8c12baf75a9d3 Mon Sep 17 00:00:00 2001
From: Carlos R. Mafra <crmafra@ift.unesp.br>
Date: Mon, 5 May 2008 19:38:53 -0300

The computation of clocksource_hpet.mult

       tmp = (u64)hpet_period << HPET_SHIFT;
       do_div(tmp, FSEC_PER_NSEC);
       clocksource_hpet.mult = (u32)tmp;

can be streamlined if we note that it is equal to

       clocksource_hpet.mult = div_sc(hpet_period, FSEC_PER_NSEC, HPET_SHIFT);

Furthermore, the computation of hpet_clockevent.mult

       uint64_t hpet_freq;

       hpet_freq = 1000000000000000ULL;
       do_div(hpet_freq, hpet_period);
       hpet_clockevent.mult = div_sc((unsigned long) hpet_freq,
                                     NSEC_PER_SEC, hpet_clockevent.shift);

can also be streamlined with the observation that hpet_period and hpet_freq are
inverse to each other (in proper units).

So instead of computing hpet_freq and using (schematically)
div_sc(hpet_freq, 10^9, shift) we use the trick of calling with the
arguments in reverse order, div_sc(10^6, hpet_period, shift).

The different power of ten is due to frequency being in Hertz (1/sec)
and the period being in units of femtosecond. Explicitly,

mult = (hpet_freq * 2^shift)/10^9    (before)
mult = (10^6 * 2^shift)/hpet_period  (after)

because hpet_freq = 10^15/hpet_period.

The comments in the code are also updated to reflect the changes.

As a result,

   text    data     bss     dec     hex filename
   2957     425      92    3474     d92 arch/x86/kernel/hpet.o
   3006     425      92    3523     dc3 arch/x86/kernel/hpet.o.old

a 1.6% reduction in text size.

Signed-off-by: Carlos R. Mafra <crmafra@ift.unesp.br>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:27:54 +02:00
Cyrill Gorcunov
8c6b0ef2ea x86: wakeup.lds.S - section ordering fix
To allow linker to catch sections overlapping we have to declare
them in appropriate order.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:27:51 +02:00
James Bottomley
f8955ebe3e x86: [VOYAGER] fix duplicate phys_cpu_present_map symbol
The phys_cpu_present_map is an expected symbol in the SMP harness.
Unfortunately, x86 recently moved this and a few others to
kernel/setup.c where it doesn't quite work because voyager has to
define its own.  Use CONFIG_X86_LOCAL_APIC to isolate these
definitions and fix up another area in setup.c where CONFIG_X86_SMP
should be used instead of CONFIG_SMP.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Cc: toralf.foerster@gmx.de
Cc: Mike Travis <travis@sgi.com>
Cc: Alexey Starikovskiy <astarikovskiy@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:27:51 +02:00
Takashi Iwai
8965eb1938 x86/pci: fix broken ISA DMA
Rene Herman reported:

> commit 8779f2fc3b
>
> "x86: don't try to allocate from DMA zone at first"
>
> breaks all of ISA DMA. Or all of ALSA ISA DMA at least. All
> ISA soundcards are silent following that commit -- no error
> messages, everything appears fine, just silence.

That patch is buggy. We had an implicit assumption that
dev = NULL for ISA devices that require 24bit DMA.

The recent work on x86 dma_alloc_coherent() breaks the ISA DMA buffer
allocation, which is represented by "dev = NULL" and requires 24bit
DMA implicitly.

Bisected-by: Rene Herman <rene.herman@keyaccess.nl>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Rene Herman <rene.herman@keyaccess.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-12 21:27:50 +02:00
Linus Torvalds
3e1b83ab39 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86:
  x86: rdc: leds build/config fix
  x86: sysfs cpu?/topology is empty in 2.6.25 (32-bit Intel system)
  x86: revert commit 709f744 ("x86: bitops asm constraint fixes")
  x86: restrict keyboard io ports reservation to make ipmi driver work
  x86: fix fpu restore from sig return
  x86: remove spew print out about bus to node mapping
  x86: revert printk format warning change which is for linux-next
  x86: cleanup PAT cpu validation
  x86: geode: define geode_has_vsa2() even if CONFIG_MGEODE_LX is not set
  x86: GEODE: cache results from geode_has_vsa2() and uninline
  x86: revert geode config dependency
2008-05-10 21:10:48 -07:00
Ingo Molnar
82fd866701 x86: rdc: leds build/config fix
select NEW_LEDS for now until the Kconfig dependencies have been
fixed.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-10 19:31:45 +02:00
Helge Wagner
9096bd7a66 x86: restrict keyboard io ports reservation to make ipmi driver work
On some of our (single board computer) boards (x86) we are using an
IPMI controller that uses I/O ports 0x62 and 0x66 for a KCS (keyboard
controller style) IPMI system interface.

Trying to load the openipmi driver fails, because the ports
(0x62/0x66) are reserved for keyboard. keyboard reserves the full
range 0x60-0x6F while it doesn't need to.

Reserve only ports 0x60 and 0x64 for the legacy PS/2 i8042 keyboad
controller instead of 0x60-0x6F to allow the openipmi driver to work.

[ tglx: added 64bit fixup ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-10 19:31:45 +02:00
Suresh Siddha
fd3c3ed5d1 x86: fix fpu restore from sig return
If the task never used fpu, initialize the fpu before restoring the FP
state from the signal handler context. This will allocate the fpu
state, if the task never needed it before.

Reported-and-bisected-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Tested-by: Eric Sesterhenn <snakebyte@gmx.de>
Cc: Frederik Deweerdt <deweerdt@free.fr>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-10 19:31:45 +02:00
Yinghai Lu
0646153921 x86: remove spew print out about bus to node mapping
Jeff Garzik pointed out that this printout is not needed.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-10 19:31:45 +02:00
Thomas Gleixner
5ecddcebfb x86: revert printk format warning change which is for linux-next
commit 62179849b4
    x86: fix setup printk format warning

is for linux-next and not for .26

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-10 19:31:44 +02:00
Linus Torvalds
8d53910856 Revert "PCI: remove default PCI expansion ROM memory allocation"
This reverts commit 9f8daccaa0, which was
reported to break X startup (xf86-video-ati-6.8.0). See

	http://bugs.freedesktop.org/show_bug.cgi?id=15523

for details.

Reported-by: Laurence Withers <l@lwithers.me.uk>
Cc: Gary Hade <garyhade@us.ibm.com>
Cc: Greg KH <greg@kroah.com>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: "Jun'ichi Nomura" <j-nomura@ce.jp.nec.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-08 19:02:55 -07:00
Linus Torvalds
f589274533 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  [ALSA] soc at91 minor bug fixes
  [ALSA] soc - at91-pcm - Fix line wrapping
  pcspkr: fix dependancies
2008-05-08 10:58:45 -07:00
Thomas Gleixner
8d4a430085 x86: cleanup PAT cpu validation
Move the scattered checks for PAT support to a single function. Its
moved to addon_cpuid_features.c as this file is shared between 32 and
64 bit.

Remove the manipulation of the PAT feature bit and just disable PAT in
the PAT layer, based on the PAT bit provided by the CPU and the
current CPU version/model white list.

Change the boot CPU check so it works on Voyager somewhere in the
future as well :) Also panic, when a secondary has PAT disabled but
the primary one has alrady switched to PAT. We have no way to undo
that.

The white list is kept for now to ensure that we can rely on known to
work CPU types and concentrate on the software induced problems
instead of fighthing CPU erratas and subtle wreckage caused by not yet
verified CPUs. Once the PAT code has stabilized enough, we can remove
the white list and open the can of worms.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-08 15:43:51 +02:00
Andres Salomon
547acec7ec x86: GEODE: cache results from geode_has_vsa2() and uninline
This moves geode_has_vsa2 into a .c file, caches the result we get from
the VSA virtual registers, and causes the function to no longer be inline.

[akpm@linux-foundation.org: cleanup]

Signed-off-by: Andres Salomon <dilinger@debian.org>
Cc: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-08 15:43:50 +02:00
Thomas Gleixner
ac44cc96fb x86: revert geode config dependency
commit e26a28d190
    x86: olpc build fix

was a fix to a patch that was withdrawn/delayed and then erroneously
commited to x86.git. Revert it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-08 15:43:50 +02:00
Stas Sergeev
e5e1d3cb20 pcspkr: fix dependancies
fix pcspkr dependancies: make the pcspkr platform
drivers to depend on a platform device, and
not the other way around.

Signed-off-by: Stas Sergeev <stsp@aknet.ru>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
CC: Vojtech Pavlik <vojtech@suse.cz>
CC: Michael Opdenacker <michael-lists@free-electrons.com>
[fixed for 2.6.26-rc1 by tiwai]
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2008-05-07 12:42:03 +02:00
Hugh Dickins
aeed5fce37 x86: fix PAE pmd_bad bootup warning
Fix warning from pmd_bad() at bootup on a HIGHMEM64G HIGHPTE x86_32.

That came from 9fc34113f6 x86: debug pmd_bad();
but we understand now that the typecasting was wrong for PAE in the previous
version: pagetable pages above 4GB looked bad and stopped Arjan from booting.

And revert that cded932b75 x86: fix pmd_bad
and pud_bad to support huge pages.  It was the wrong way round: we shouldn't
weaken every pmd_bad and pud_bad check to let huge pages slip through - in
part they check that we _don't_ have a huge page where it's not expected.

Put the x86 pmd_bad() and pud_bad() definitions back to what they have long
been: they can be improved (x86_32 should use PTE_MASK, to stop PAE thinking
junk in the upper word is good; and x86_64 should follow x86_32's stricter
comparison, to stop thinking any subset of required bits is good); but that
should be a later patch.

Fix Hans' good observation that follow_page() will never find pmd_huge()
because that would have already failed the pmd_bad test: test pmd_huge in
between the pmd_none and pmd_bad tests.  Tighten x86's pmd_huge() check?
No, once it's a hugepage entry, it can get quite far from a good pmd: for
example, PROT_NONE leaves it with only ACCESSED of the KERN_PGTABLE bits.

However... though follow_page() contains this and another test for huge
pages, so it's nice to keep it working on them, where does it actually get
called on a huge page?  get_user_pages() checks is_vm_hugetlb_page(vma) to
to call alternative hugetlb processing, as does unmap_vmas() and others.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Earlier-version-tested-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jeff Chua <jeff.chua.linux@gmail.com>
Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-06 13:08:58 -07:00
Linus Torvalds
bb896afe20 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-fixes:
  sched: default to n for GROUP_SCHED and FAIR_GROUP_SCHED
  sched: add optional support for CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
  sched, x86: add HAVE_UNSTABLE_SCHED_CLOCK
  sched: fix cpu clock
  sched: fair-group: fix a Div0 error of the fair group scheduler
  sched: fix missing locking in sched_domains code
  sched: make clock sync tunable by architecture code
  sched: fix debugging
  sched: fix sched_info_switch not being called according to documentation
  sched: fix hrtick_start_fair and CPU-Hotplug
  sched: fix SCHED_FAIR wake-idle logic error
  sched: fix RT task-wakeup logic
  sched: add statics, don't return void expressions
  sched: add debug checks to idle functions
  sched: remove old sched doc
  sched: make rt_sched_class, idle_sched_class static
  sched: optimize calc_delta_mine()
  sched: fix normalized sleeper
2008-05-05 17:31:14 -07:00
Ingo Molnar
a5574cf65b sched, x86: add HAVE_UNSTABLE_SCHED_CLOCK
add the HAVE_UNSTABLE_SCHED_CLOCK, for architectures to select.

the next change utilizes it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-05 23:56:18 +02:00
Linus Torvalds
108c196184 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  x86 PCI: call dmi_check_pciprobe()
  x86/pci: add pci=skip_isa_align command lines.
  x86/pci: remove flag in pci_cfg_space_size_ext
  x86: fix section mismatch in pci_scan_bus
2008-05-05 12:39:10 -07:00
Yinghai Lu
0df18ff366 x86 PCI: call dmi_check_pciprobe()
this change:

| commit 08f1c192c3
| Author: Muli Ben-Yehuda <muli@il.ibm.com>
| Date:   Sun Jul 22 00:23:39 2007 +0300
|
|    x86-64: introduce struct pci_sysdata to facilitate sharing of ->sysdata
|
|    This patch introduces struct pci_sysdata to x86 and x86-64, and
|    converts the existing two users (NUMA, Calgary) to use it.
|
|    This lays the groundwork for having other users of sysdata, such as
|    the PCI domains work.
|
|    The Calgary bits are tested, the NUMA bits just look ok.

replaces pcibios_scan_root with pci_scan_bus_parented...

but in pcibios_scan_root we have a DMI check:

    dmi_check_system(pciprobe_dmi_table);

when when have several peer root buses this could be called multiple
times (which is bad), so move that call to pci_access_init().

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-05-05 09:24:00 -07:00
Yinghai Lu
13a6ddb08e x86/pci: add pci=skip_isa_align command lines.
so we don't align the io port start address for pci cards.

also move out dmi check out acpi.c, because it has nothing to do with acpi.
it could spare some calling when we have several peer root buses.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-05-05 09:22:08 -07:00
Linus Torvalds
45ea2103d8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-fixes:
  x86: fix setup printk format warning
  x86: olpc build fix
  x86: video/fbdev.c: add MODULE_LICENSE
  x86: fix up bootparam.h for userspace inclusion
  x86: relocs ELF handling - use SELFMAG instead of numeric constant
  x86: vdso ELF handling - use SELFMAG instead of numeric constant
  x86: remove dell reboot dmi quirk board name match
  x86: es7000 build fix
  x86: make additional_cpus static
  x86: make start_secondary() static
  kbuild, suspend, x86: fix rebuild of wakeup.bin
  uml: fix gcc problem
  x86: undo visws/numaq build changes
2008-05-04 17:11:43 -07:00
Randy Dunlap
62179849b4 x86: fix setup printk format warning
Fix x86 setup printk format warming:

next-20080430/arch/x86/kernel/setup.c:172: warning: format '%lu' expects type 'long unsigned int', but argument 2 has type 'ssize_t'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-04 20:04:46 +02:00
Thomas Gleixner
e26a28d190 x86: olpc build fix
CONFIG_OLPC needs to depend on MGEODE_LX

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-04 20:04:46 +02:00
Adrian Bunk
7b04fa014c x86: video/fbdev.c: add MODULE_LICENSE
Add the missing MODULE_LICENSE("GPL").

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-04 20:04:46 +02:00
Cyrill Gorcunov
8bd1796ded x86: relocs ELF handling - use SELFMAG instead of numeric constant
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: akpm@linux-foundation.org
Cc: hpa@zytor.com
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-04 20:04:45 +02:00
Cyrill Gorcunov
ecb783eae1 x86: vdso ELF handling - use SELFMAG instead of numeric constant
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: akpm@linux-foundation.org
Cc: hpa@zytor.com
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-04 20:04:45 +02:00
Ben
163ea310b6 x86: remove dell reboot dmi quirk board name match
http://bugzilla.kernel.org/show_bug.cgi?id=10547

Newer Dell OptiPlex 745s hang before rebooting after 'sudo reboot'.

A patch for some versions of the OptiPlex was proposed here --
http://lkml.org/lkml/2007/6/5/59 -- and is included in 2.6.23 and
later kernels, according to
http://lxr.linux.no/linux+v2.6.23/arch/i386/kernel/reboot.c . However,
the DMI_BOARD_NAME ("0WF810") is too restrictive. Newer OptiPlex
machines have a DMI_BOARD_NAME of "0RF703".  I therefore suggest
adding another clause to reboot.c, similar to the one in the original
patch, but matching a DMI_BOARD_NAME of "0RF703".

On further inspection, it seems that there are other DMI_BOARD_NAMEs
for this same machine. They seem to change from time to time, which
means that the current code is fragile. Moreover, using bios reboot
should not break non-SFF OptiPlex 745s, and so a reasonable fix is to
simply drop the match on DMI_BOARD_NAME.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-04 20:04:45 +02:00
Ingo Molnar
e37ee42caa x86: es7000 build fix
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-04 20:04:45 +02:00
Adrian Bunk
c5562faeaa x86: make additional_cpus static
This patch makes the needlessly global additional_cpus static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-04 20:04:45 +02:00
Adrian Bunk
dbe55f4797 x86: make start_secondary() static
start_secondary() needlessly became global.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-04 20:04:45 +02:00
Sam Ravnborg
4c6214c75a kbuild, suspend, x86: fix rebuild of wakeup.bin
In kernel/acpi/realmode/Makefile use the 'always'
variable to say that wakeup.bin should always
be made.

In acpi/Makefile we then do not need to specify the
requested target and we avoid the message from make:

   `arch/x86/kernel/acpi/realmode/wakeup.bin' is up to date.

Add wakeup.lds to list af targets to avoid rebuilding
wakeup.bin - from Roland McGrath.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@suse.cz>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-04 20:04:45 +02:00
Thomas Gleixner
48b83d2425 x86: undo visws/numaq build changes
arch/x86/pci/Makefile_32 has a nasty detail. VISWS and NUMAQ build
override the generic pci-y rules. This needs a proper cleanup, but
that needs more thoughts. Undo

commit 895d30935e
    x86: numaq fix
    do not override the existing pci-y rule when adding visws or
    numaq rules.

There is also a stupid init function ordering problem vs. acpi.o

Add comments to the Makefile to avoid tripping over this again.

Remove the srat stub code in discontig_32.c to allow a proper NUMAQ
build.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-04 20:04:45 +02:00
Glauber Costa
b8ba5f10c5 x86: KVM geust: make setup_secondary_clock definition dependent on local apic
Since the pv_apic_ops are only present if CONFIG_X86_LOCAL_APIC is compiled
in, kvmclock failed to build without this option.  This patch fixes this.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-04 14:45:12 +03:00
Avi Kivity
93df766322 KVM: MMU: Allow more than PAGES_PER_HPAGE write protections per large page
nonpae guests can call rmap_write_protect twice per page (for page tables)
or four times per page (for page directories), triggering a bogus warning.

Remove the warning.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-04 14:44:49 +03:00
Andrea Arcangeli
bc1a34f1bf KVM: avoid fx_init() schedule in atomic
This make sure not to schedule in atomic during fx_init. I also
changed the name of fpu_init to fx_finit to avoid duplicating the name
with fpu_init that is already used in the kernel, this makes grep
simpler if nothing else.

Signed-off-by: Andrea Arcangeli <andrea@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-04 14:44:48 +03:00
Jan Kiszka
b4f14abd95 KVM: Avoid spurious execeptions after setting registers
Clear pending exceptions when setting new register values. This avoids
spurious exceptions after restoring a vcpu state or after
reset-on-triple-fault.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-04 14:44:47 +03:00
Marcelo Tosatti
ece15babfa KVM: PIT: support mode 4
The in-kernel PIT emulation ignores pending timers if operating under
mode 4, which for example DragonFlyBSD uses (and Plan9 too, apparently).

Mode 4 seems to be similar to one-shot mode, other than the fact that it
starts counting after the next CLK pulse once programmed, while mode 1
starts counting immediately, so add a FIXME to enhance precision.

Fixes sourceforge bug 1952988.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-04 14:44:46 +03:00
Avi Kivity
dc7457ea52 KVM: x86 emulator: disable writeback on lmsw
The recent changes allowing memory operands with lmsw and smsw left
lmsw with writeback enabled.  Since lmsw has no oridinary destination
operand, the dst pointer was not initialized, resulting in an oops.

Close the hole by disabling writeback for lmsw.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-04 14:44:45 +03:00
Izik Eidus
3fe913e7c5 KVM: x86: task switch: fix wrong bit setting for the busy flag
The busy bit is bit 1 of the type field, not bit 8.

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-04 14:44:43 +03:00
Sheng Yang
1439442c7b KVM: VMX: Enable EPT feature for KVM
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-04 14:44:42 +03:00
Sheng Yang
b7ebfb0509 KVM: VMX: Prepare an identity page table for EPT in real mode
[aliguory: plug leak]

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-04 14:44:41 +03:00
Sheng Yang
1ac593c97e KVM: MMU: Remove #ifdef CONFIG_X86_64 to support 4 level EPT
Currently EPT level is 4 for both pae and x86_64. The patch remove the #ifdef
for alloc root_hpa and free root_hpa to support EPT.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-04 14:44:39 +03:00
Sheng Yang
7b52345e2c KVM: MMU: Add EPT support
Enable kvm_set_spte() to generate EPT entries.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-04 14:44:38 +03:00
Sheng Yang
67253af52e KVM: Add kvm_x86_ops get_tdp_level()
The function get_tdp_level() provided the number of tdp level for EPT and
NPT rather than the NPT specific macro.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-04 14:44:34 +03:00
Sheng Yang
8c6d6adc6b KVM: MMU: Move some definitions to a header file
Move some definitions to mmu.h in order to allow building common table
entries between EPT and non-EPT.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-04 12:26:38 +03:00
Sheng Yang
d56f546db9 KVM: VMX: EPT Feature Detection
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-04 12:26:38 +03:00
Ulrich Drepper
d35c7b0e54 unified (weak) sys_pipe implementation
This replaces the duplicated arch-specific versions of "sys_pipe()" with
one unified implementation.  This removes almost 250 lines of duplicated
code.

It's marked __weak, so that *if* an architecture wants to override the
default implementation it can do so by simply having its own replacement
version, since many architectures use alternate calling conventions for
the 'pipe()' system call for legacy reasons (ie traditional UNIX
implementations often return the two file descriptors in registers)

I still haven't changed the cris version even though Linus says the BKL
isn't needed.  The arch maintainer can easily do it if there are really
no obstacles.

Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-03 13:50:33 -07:00
Roman Zippel
6f6d6a1a6a rename div64_64 to div64_u64
Rename div64_64 to div64_u64 to make it consistent with the other divide
functions, so it clearly includes the type of the divide.  Move its definition
to math64.h as currently no architecture overrides the generic implementation.
 They can still override it of course, but the duplicated declarations are
avoided.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: Avi Kivity <avi@qumranet.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-01 08:03:58 -07:00
Linus Torvalds
6d98ca7364 x86: Mark OPTIMIZE_INLINING broken
So Ingo finally did figure out why UML broke with this option: UML
passes gcc the -fno-unit-at-a-time flag, and apparently that wreaks
havoc with gcc's inlining.

We could turn off -fno-unit-at-a-time for UML for gcc4+ (which is what
x86 does), but there's bad blood about this whole option, and it does
show that the thing is just fragile as heck.

So let tempers cool, and disable the thing, and we can revisit the
decision later.

Cc: Adrian Bunk <bunk@kernel.org>
Cc: David Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 20:07:22 -07:00
Ingo Molnar
895d30935e x86: numaq fix
do not override the existing pci-y rule when adding visws or
numaq rules.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-30 23:15:35 +02:00
Ingo Molnar
6b8e1c7ec4 x86: 8K stacks by default
Switch back to 8K stacks as the safer default. Out-of-memory
situations are less problematic than silent and hard to debug
stack corruption.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:35 +02:00
Andres Salomon
cb8ab687c3 x86: ioremap ram check fix
bdd3cee2e4 (x86: ioremap(), extend check
to all RAM pages) breaks OLPC's ioremap call.  The ioremap that OLPC uses is:

        romsig = ioremap(0xffffffc0, 16);

The commit that breaks it is basically:

-       for (pfn = phys_addr >> PAGE_SHIFT; pfn < max_pfn_mapped &&
-            (pfn << PAGE_SHIFT) < last_addr; pfn++) {
+       for (pfn = phys_addr >> PAGE_SHIFT;
+                               (pfn << PAGE_SHIFT) < last_addr; pfn++) {
+

Previously, the 'pfn < max_pfn_mapped' check would've caused us to not
enter the loop.  Removing that check means we loop infinitely.  The
reason for that is because pfn is 0xfffff, and last_addr is 0xffffffcf.
The remaining check that is used to exit the loop is not sufficient;
when pfn<<PAGE_SHIFT is 0xfffff000, that is less than 0xffffffcf; when
we increment pfn and it overflows (pfn == 0x100000), pfn<<PAGE_SHIFT
ends up being 0.  That, of course, is less than last_addr.  In effect,
pfn<<PAGE_SHIFT is never lower than last_addr.

The simple fix for this is to limit the last_addr check to the PAGE_MASK;
a patch is below.

Signed-off-by: Andres Salomon <dilinger@debian.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:35 +02:00
Ingo Molnar
5de8f68b43 x86: optimize inlining off
default to inline optimizing off.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:35 +02:00
Ingo Molnar
acbaa93e3d x86: CONFIG_X86_ELAN fix
move the X86_CPU section out of the !X86_ELAN branch.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:35 +02:00
Ingo Molnar
c9af1e3323 x86: Kconfig fix
Andrew noticed that OPTIMIZE_INLINING appeared in the toplevel
menu - fix it.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:35 +02:00
Suresh Siddha
de33c442ed x86 PAT: fix performance drop for glx, use UC minus for ioremap(), ioremap_nocache() and pci_mmap_page_range()
Use UC_MINUS for ioremap(), ioremap_nocache() instead of strong UC.
Once all the X drivers move to ioremap_wc(), we can go back to strong
UC semantics for ioremap() and ioremap_nocache().

To avoid attribute aliasing issues, pci_mmap_page_range() will also
use UC_MINUS for default non write-combining mapping request.

Next steps:
	a) change all the video drivers using ioremap() or ioremap_nocache()
	   and adding WC MTTR using mttr_add() to ioremap_wc()

	b) for strict usage, we can go back to strong uc semantics
	   for ioremap() and ioremap_nocache() after some grace period for
	   completing step-a.

	c) user level X server needs to use the appropriate method for setting
	   up WC mapping (like using resourceX_wc sysfs file instead of
	   adding MTRR for WC and using /dev/mem or resourceX under /sys)

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:35 +02:00
Sam Ravnborg
b9b39bfba5 x86: use defconfigs from x86/configs/*
Daniel Drake <dsd@gentoo.org> reported:

In 2.6.23, if you unpacked a kernel source tarball and then
ran "make menuconfig" you'd be presented with this message:
    # using defaults found in arch/i386/defconfig

and the default options would be set.

The same thing in 2.6.24 does not give you any "using defaults" message, and
the default config options within menuconfig are rather blank (e.g. no PCI
support). You can work around this by explicitly running "make defconfig"
before menuconfig, but it would be nice to have the behaviour the way it was
for 2.6.23 (and the way it still is for other archs).

Fixed by adding a x86 specific defconfig list to Kconfig.

Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=10470
Tested-by: dsd@gentoo.org
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:34 +02:00
Ingo Molnar
2544a873ab revert: "x86: ioremap(), extend check to all RAM pages"
Vegard Nossum reported a large (150 seconds) boot delay during bootup,
and bisected it to "x86: ioremap(), extend check to all RAM pages"
(commit bdd3cee2e4). Revert this commit for now.

Bisected-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:34 +02:00
Jeremy Fitzhardinge
a4c863f497 x86: don't bother printing compat vdso address
The kernel prints the compat vdso address regardless of whether compat
vdso mode is enabled or not, which is confusing.  Given that this
isn't very interesting information anyway, just remove the printk.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Gerhard Mack <gmack@innerfire.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:34 +02:00
Andi Kleen
f6c133f7d5 fix: x86: support for new UV apic
Don't warn in read_apic_id() when preemptible but only one CPU online.

Signed-off-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:34 +02:00
Vegard Nossum
575ca7351b x86: fix early-BUG message
The .asciz directive takes any number of strings, but each one is zero-
terminated, and string pasting is not done as in C. That results in only the
first line being output.

Replace .asciz with multiple .ascii directives and terminate with .asciz.

Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:34 +02:00
Dmitri Vorobiev
b4cdc4300d x86: iommu_sac_force can become static
The iommu_sac_force variable is needlessly defined global,
and this patch makes it static. Additionally, this variable
needs not be explicitly initialized.

Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:34 +02:00
Dmitri Vorobiev
4412620fc2 x86: add proper header for reboot_force
This patch fixes one sparse warning by including the appropriate
header for the reboot_force symbol.

Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:34 +02:00
Ingo Molnar
3e8f7e35f3 x86 VISWS: build fix
the 'reboot_force' flag is a notion that non-PC subarchitectures do
not have.

also, unify the X86_BIOS_REBOOT option between 32-bit and 64-bit
and get rid of a few unnecessary Kconfig and Makefile complications
that way.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:34 +02:00
Ingo Molnar
ed5e233284 x86, voyager: fix ioremap_nocache()
James Bottomley reported that the following commit:

| commit 6371b49599
| Author: Ingo Molnar <mingo@elte.hu>
| Date:   Wed Jan 30 13:33:40 2008 +0100
|
|     x86: change ioremap() to default to uncached

broke Voyager.

James says:

" it broke a class of voyager machines: those which
  rely on the quad interrupt controller (QIC).  The precis of why they
  broke is because the QIC does IPIs (or CPIs in its terminology) via
  cache line interference: you interrupt a processor by moving a
  designated memory area to write exclusive in the cache (by simply
  writing to the line) and the CPU acks the interrupt by moving it back to
  read shared (by reading from it).  That area, is, of course, mapped by
  ioremap, so reversing the ioremap semantics and adding the uncached bit
  completely breaks the QIC. "

Sorry about that!

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:34 +02:00
Ingo Molnar
fc3fbc4509 hpet: fix
Al Viro pointed out that there's a missing readl() of timer->hpet_config,
found by Sparse.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:34 +02:00
Adrian Bunk
b9e017e04b x86: unexport kmap_atomic_to_page
This patch removes the no longer used export of kmap_atomic_to_page.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-30 23:15:34 +02:00
Linus Torvalds
08acd4f8af Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (179 commits)
  ACPI: Fix acpi_processor_idle and idle= boot parameters interaction
  acpi: fix section mismatch warning in pnpacpi
  intel_menlo: fix build warning
  ACPI: Cleanup: Remove unneeded, multiple local dummy variables
  ACPI: video - fix permissions on some proc entries
  ACPI: video - properly handle errors when registering proc elements
  ACPI: video - do not store invalid entries in attached_array list
  ACPI: re-name acpi_pm_ops to acpi_suspend_ops
  ACER_WMI/ASUS_LAPTOP: fix build bug
  thinkpad_acpi: fix possible NULL pointer dereference if kstrdup failed
  ACPI: check a return value correctly in acpi_power_get_context()
  #if 0 acpi/bay.c:eject_removable_drive()
  eeepc-laptop: add hwmon fan control
  eeepc-laptop: add backlight
  eeepc-laptop: add base driver
  ACPI: thinkpad-acpi: bump up version to 0.20
  ACPI: thinkpad-acpi: fix selects in Kconfig
  ACPI: thinkpad-acpi: use a private workqueue
  ACPI: thinkpad-acpi: fluff really minor fix
  ACPI: thinkpad-acpi: use uppercase for "LED" on user documentation
  ...

Fixed conflicts in drivers/acpi/video.c and drivers/misc/intel_menlow.c
manually.
2008-04-30 11:52:52 -07:00
Len Brown
96916090f4 Merge branches 'release', 'acpica', 'bugzilla-10224', 'bugzilla-9772', 'bugzilla-9916', 'ec', 'eeepc', 'idle', 'misc', 'pm-legacy', 'sysfs-links-2.6.26', 'thermal', 'thinkpad' and 'video' into release 2008-04-30 13:58:00 -04:00
Roland McGrath
5a8da0ea82 signals: x86 TS_RESTORE_SIGMASK
Replace TIF_RESTORE_SIGMASK with TS_RESTORE_SIGMASK and define our own
set_restore_sigmask() function.  This saves the costly SMP-safe set_bit
operation, which we do not need for the sigmask flag since TIF_SIGPENDING
always has to be set too.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:37 -07:00
Yinghai Lu
70b9f7dc14 x86/pci: remove flag in pci_cfg_space_size_ext
so let pci_cfg_space_size call it directly without flag.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-04-29 15:34:05 -07:00
Sam Ravnborg
98db6f193c x86: fix section mismatch in pci_scan_bus
Fix following section mismatch warning:
WARNING: vmlinux.o(.text+0x275616): Section mismatch in reference from the function pci_scan_bus() to the function .devinit.text:pci_scan_bus_parented()

The warning was seen with a CONFIG_DEBUG_SECTION_MISMATCH=y build.
The inline function pci_scan_bus refer to functions annotated
__devinit - so annotate it __devinit too.
This revealed a few x86 specific functions that were only
used from __init or __devinit context.
So annotate these __devinit and the warning was killed.

The added include in pci.h was not strictly required but
added to avoid being dependent on indirect includes.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Jesse Barnes <jbarnes@hobbes.lan>
2008-04-29 13:41:59 -07:00
Linus Torvalds
1f43c53930 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-fixes:
  x86: fix PCI MSI breaks when booting with nosmp
  x86: vget_cycles() __always_inline
  x86: add more boot protocol documentation
  bootprotocol: cleanup
  x86: fix warning in "x86: clean up vSMP detection"
  x86: !x & y typo in mtrr code
2008-04-29 09:03:19 -07:00
Linus Torvalds
5f78e4d339 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-pci
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-pci:
  x86: add pci=check_enable_amd_mmconf and dmi check
  x86: work around io allocation overlap of HT links
  acpi: get boot_cpu_id as early for k8_scan_nodes
  x86_64: don't need set default res if only have one root bus
  x86: double check the multi root bus with fam10h mmconf
  x86: multi pci root bus with different io resource range, on 64-bit
  x86: use bus conf in NB conf fun1 to get bus range on, on 64-bit
  x86: get mp_bus_to_node early
  x86 pci: remove checking type for mmconfig probe
  x86: remove unneeded check in mmconf reject
  driver core: try parent numa_node at first before using default
  x86: seperate mmconf for fam10h out from setup_64.c
  x86: if acpi=off, force setting the mmconf for fam10h
  x86_64: check MSR to get MMCONFIG for AMD Family 10h
  x86_64: check and enable MMCONFIG for AMD Family 10h
  x86_64: set cfg_size for AMD Family 10h in case MMCONFIG
  x86: mmconf enable mcfg early
  x86: clear pci_mmcfg_virt when mmcfg get rejected
  x86: validate against acpi motherboard resources

Fixed up fairly trivial conflicts in arch/x86/pci/{init.c,pci.h} due to
OLPC support manually.
2008-04-29 08:26:51 -07:00
Linus Torvalds
44473d9913 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] state info wrong after resume
  [CPUFREQ] allow use of the powersave governor as the default one
  [CPUFREQ] document the currently undocumented parts of the sysfs interface
  [CPUFREQ] expose cpufreq coordination requirements regardless of coordination mechanism
2008-04-29 08:18:49 -07:00
Christoph Lameter
66916cd267 x86: use kbuild.h
Drop the macro definitions in asm-offsets_*.c and use kbuild.h

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:29 -07:00
Tim Gardner
8c4dd60682 edd: add default mode CONFIG_EDD_OFF=n, override with edd={on,off}
Add a kernel parameter option to 'edd' to enable/disable BIOS Enhanced Disk
Drive Services.  CONFIG_EDD_OFF disables EDD while still compiling EDD into
the kernel.  Default behavior can be forced using 'edd=on' or 'edd=off' as
a kernel parameter.

[akpm@linux-foundation.org: fix kernel-parameters.txt]
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:23 -07:00
Alexey Dobriyan
c74c120a21 proc: remove proc_root from drivers
Remove proc_root export.  Creation and removal works well if parent PDE is
supplied as NULL -- it worked always that way.

So, one useless export removed and consistency added, some drivers created
PDEs with &proc_root as parent but removed them as NULL and so on.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:18 -07:00
Andres Salomon
3ef0e1f8ca x86: olpc: add One Laptop Per Child architecture support
This adds support for OLPC XO hardware.  Open Firmware on XOs don't contain
the VSA, so it is necessary to emulate the PCI BARs in the kernel.  This also
adds functionality for running EC commands, and a CONFIG_OLPC.

A number of OLPC drivers depend upon CONFIG_OLPC.

olpc_ec_timeout is a hack to work around Embedded Controller bugs.

[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: geode_has_vsa build fix]
[akpm@linux-foundation.org: olpc_register_battery_callback doesn't exist]
Signed-off-by: Andres Salomon <dilinger@debian.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:07 -07:00
FUJITA Tomonori
a852250920 swiotlb: use iommu_is_span_boundary helper function
iommu_is_span_boundary in lib/iommu-helper.c was exported for PARISC IOMMUs
(commit 3715863aa1).  SWIOTLB can use it instead
of the homegrown function.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:05 -07:00
Adrian Bunk
7d195a5409 proper extern for late_time_init
Add a proper extern for late_time_init in include/linux/init.h

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:03 -07:00
Adrian Bunk
eb0f1c442d proper __do_softirq() prototype
Add a proper prototype for __do_softirq() in include/linux/interrupt.h

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:02 -07:00
Jesse Barnes
e90955c26d x86: fix PCI MSI breaks when booting with nosmp
set up sane APIC state even in the nosmp case.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-29 13:45:24 +02:00
Ingo Molnar
781fe2ebc0 bootprotocol: cleanup
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-29 13:45:24 +02:00
Alexander van Heukelum
8008abbd87 x86: fix warning in "x86: clean up vSMP detection"
The function detect_vsmp_box is a void function in the PCI case.
Change the !PCI stub to void too.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-29 13:45:24 +02:00
Harvey Harrison
e686d34156 x86: !x & y typo in mtrr code
As written, this can never be true.

Spotted by the Sparse checker.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-29 13:45:24 +02:00
Roland McGrath
d9dedc1385 x86_64 vDSO: use initdata
The 64-bit vDSO image is in a special ".vdso" section for no reason
I can determine.  Furthermore, the location of the vdso_end symbol
includes some wrongly-calculated padding space in the image, which
is then (correctly) rounded to page size, resulting in an extra page
of zeros in the image mapped in to user processes.

This changes it to put the vdso.so image into normal initdata as we
have always done for the 32-bit vDSO images.  The extra padding is
gone, so the user VMA is one page instead of two.  The image that
was already copied around at boot time is now in initdata, so we
recover that wasted space after boot.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 13:49:35 -07:00
Darrick J. Wong
e8628dd06d [CPUFREQ] expose cpufreq coordination requirements regardless of coordination mechanism
Currently, affected_cpus shows which CPUs need to have their frequency
coordinated in software.  When hardware coordination is in use, the contents
of this file appear the same as when no coordination is required.  This can
lead to some confusion among user-space programs, for example, that do not
know that extra coordination is required to force a CPU core to a particular
speed to control power consumption.

To fix this, create a "related_cpus" attribute that always displays the
coordination map regardless of whatever coordination strategy the cpufreq
driver uses (sw or hw).  If the cpufreq driver does not provide a value, fall
back to policy->cpus.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
2008-04-28 16:27:08 -04:00
Venkatesh Pallipadi
e56a727b02 [CPUFREQ] Make acpi-cpufreq more robust against BIOS freq changes behind our back.
We checked the hardware freq with OS cached freq value in get_cur_freqon_cpu().

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
2008-04-28 15:16:46 -04:00
PJ Waskiewicz
9d9ad4b51d x86: Fix 32-bit MSI-X allocation leakage
This bug was introduced in the 2.6.24 i386/x86_64 tree merge, where
MSI-X vector allocation will eventually fail.  The cause is the new
bit array tracking used vectors is not getting cleared properly on
IRQ destruction on the 32-bit APIC code.

This can be seen easily using the ixgbe 10 GbE driver on multi-core
systems by simply loading and unloading the driver a few times.
Depending on the number of available vectors on the host system, the
MSI-X allocation will eventually fail, and the driver will only be
able to use legacy interrupts.

I am generating the same patch for both stable trees for 2.6.24 and
2.6.25.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 10:49:17 -07:00
Andres Salomon
32bf87e369 x86: geode: MSR cleanup
This cleans up a few MSR-using drivers in the following manner:
  - Ensures MSRs are all defined in asm/geode.h, rather than in misc
    places
  - Makes the naming consistent; cs553[56] ones begin with MSR_,
    GX-specific ones start with MSR_GX_, and LX-specific ones start
    with MSR_LX_.  Also, make the names match the data sheet.
  - Use MSR names rather than numbers in source code
  - Document the fact that the LX's MSR_PADSEL has the wrong value
    in the data sheet.  That's, uh, good to note.

Signed-off-by: Andres Salomon <dilinger@debian.org>
Acked-by: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:35 -07:00
Thomas Petazzoni
7ae9392c0a x86: configurable DMI scanning code
Turn CONFIG_DMI into a selectable option if EMBEDDED is defined, in
order to be able to remove the DMI table scanning code if it's not
needed, and then reduce the kernel code size.

With CONFIG_DMI (i.e before) :

   text    data     bss     dec     hex filename
1076076  128656   98304 1303036  13e1fc vmlinux

Without CONFIG_DMI (i.e after) :

   text    data     bss     dec     hex filename
1068092  126308   98304 1292704  13b9a0 vmlinux

Result:

   text    data     bss     dec     hex filename
  -7984   -2348       0  -10332   -285c vmlinux

The new option appears in "Processor type and features", only when
CONFIG_EMBEDDED is defined.

This patch is part of the Linux Tiny project, and is based on previous work
done by Matt Mackall <mpm@selenic.com>.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Anvin" <hpa@zytor.com>
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:30 -07:00
Christoph Lameter
d60cd46bbd pageflags: use proper page flag functions in Xen
Xen uses bitops to manipulate page flags.  Make it use proper page flag
functions.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:22 -07:00
Christoph Lameter
2301696932 vmallocinfo: add caller information
Add caller information so that /proc/vmallocinfo shows where the allocation
request for a slice of vmalloc memory originated.

Results in output like this:

0xffffc20000000000-0xffffc20000801000 8392704 alloc_large_system_hash+0x127/0x246 pages=2048 vmalloc vpages
0xffffc20000801000-0xffffc20000806000   20480 alloc_large_system_hash+0x127/0x246 pages=4 vmalloc
0xffffc20000806000-0xffffc20000c07000 4198400 alloc_large_system_hash+0x127/0x246 pages=1024 vmalloc vpages
0xffffc20000c07000-0xffffc20000c0a000   12288 alloc_large_system_hash+0x127/0x246 pages=2 vmalloc
0xffffc20000c0a000-0xffffc20000c0c000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
0xffffc20000c0c000-0xffffc20000c0f000   12288 acpi_os_map_memory+0x13/0x1c phys=cff64000 ioremap
0xffffc20000c10000-0xffffc20000c15000   20480 acpi_os_map_memory+0x13/0x1c phys=cff65000 ioremap
0xffffc20000c16000-0xffffc20000c18000    8192 acpi_os_map_memory+0x13/0x1c phys=cff69000 ioremap
0xffffc20000c18000-0xffffc20000c1a000    8192 acpi_os_map_memory+0x13/0x1c phys=fed1f000 ioremap
0xffffc20000c1a000-0xffffc20000c1c000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
0xffffc20000c1c000-0xffffc20000c1e000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
0xffffc20000c1e000-0xffffc20000c20000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
0xffffc20000c20000-0xffffc20000c22000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
0xffffc20000c22000-0xffffc20000c24000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
0xffffc20000c24000-0xffffc20000c26000    8192 acpi_os_map_memory+0x13/0x1c phys=e0081000 ioremap
0xffffc20000c26000-0xffffc20000c28000    8192 acpi_os_map_memory+0x13/0x1c phys=e0080000 ioremap
0xffffc20000c28000-0xffffc20000c2d000   20480 alloc_large_system_hash+0x127/0x246 pages=4 vmalloc
0xffffc20000c2d000-0xffffc20000c31000   16384 tcp_init+0xd5/0x31c pages=3 vmalloc
0xffffc20000c31000-0xffffc20000c34000   12288 alloc_large_system_hash+0x127/0x246 pages=2 vmalloc
0xffffc20000c34000-0xffffc20000c36000    8192 init_vdso_vars+0xde/0x1f1
0xffffc20000c36000-0xffffc20000c38000    8192 pci_iomap+0x8a/0xb4 phys=d8e00000 ioremap
0xffffc20000c38000-0xffffc20000c3a000    8192 usb_hcd_pci_probe+0x139/0x295 [usbcore] phys=d8e00000 ioremap
0xffffc20000c3a000-0xffffc20000c3e000   16384 sys_swapon+0x509/0xa15 pages=3 vmalloc
0xffffc20000c40000-0xffffc20000c61000  135168 e1000_probe+0x1c4/0xa32 phys=d8a20000 ioremap
0xffffc20000c61000-0xffffc20000c6a000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc20000c6a000-0xffffc20000c73000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc20000c73000-0xffffc20000c7c000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc20000c7c000-0xffffc20000c7f000   12288 e1000e_setup_tx_resources+0x29/0xbe pages=2 vmalloc
0xffffc20000c80000-0xffffc20001481000 8392704 pci_mmcfg_arch_init+0x90/0x118 phys=e0000000 ioremap
0xffffc20001481000-0xffffc20001682000 2101248 alloc_large_system_hash+0x127/0x246 pages=512 vmalloc
0xffffc20001682000-0xffffc20001e83000 8392704 alloc_large_system_hash+0x127/0x246 pages=2048 vmalloc vpages
0xffffc20001e83000-0xffffc20002204000 3674112 alloc_large_system_hash+0x127/0x246 pages=896 vmalloc vpages
0xffffc20002204000-0xffffc2000220d000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc2000220d000-0xffffc20002216000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc20002216000-0xffffc2000221f000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc2000221f000-0xffffc20002228000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc20002228000-0xffffc20002231000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
0xffffc20002231000-0xffffc20002234000   12288 e1000e_setup_rx_resources+0x35/0x122 pages=2 vmalloc
0xffffc20002240000-0xffffc20002261000  135168 e1000_probe+0x1c4/0xa32 phys=d8a60000 ioremap
0xffffc20002261000-0xffffc2000270c000 4894720 sys_swapon+0x509/0xa15 pages=1194 vmalloc vpages
0xffffffffa0000000-0xffffffffa0022000  139264 module_alloc+0x4f/0x55 pages=33 vmalloc
0xffffffffa0022000-0xffffffffa0029000   28672 module_alloc+0x4f/0x55 pages=6 vmalloc
0xffffffffa002b000-0xffffffffa0034000   36864 module_alloc+0x4f/0x55 pages=8 vmalloc
0xffffffffa0034000-0xffffffffa003d000   36864 module_alloc+0x4f/0x55 pages=8 vmalloc
0xffffffffa003d000-0xffffffffa0049000   49152 module_alloc+0x4f/0x55 pages=11 vmalloc
0xffffffffa0049000-0xffffffffa0050000   28672 module_alloc+0x4f/0x55 pages=6 vmalloc

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:21 -07:00
Pekka Enberg
1b27d05b6e mm: move cache_line_size() to <linux/cache.h>
Not all architectures define cache_line_size() so as suggested by Andrew move
the private implementations in mm/slab.c and mm/slob.c to <linux/cache.h>.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:19 -07:00
Jeremy Fitzhardinge
180c06efce hotplug-memory: make online_page() common
All architectures use an effectively identical definition of online_page(), so
just make it common code.  x86-64, ia64, powerpc and sh are actually
identical; x86-32 is slightly different.

x86-32's differences arise because it puts its hotplug pages in the highmem
zone.  We can handle this in the generic code by inspecting the page to see if
its in highmem, and update the totalhigh_pages count appropriately.  This
leaves init_32.c:free_new_highpage with a single caller, so I folded it into
add_one_highpage_init.

I also removed an incorrect comment referring to the NUMA case; any NUMA
details have already been dealt with by the time online_page() is called.

[akpm@linux-foundation.org: fix indenting]
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Dave Hansen <dave@linux.vnet.ibm.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamez.hiroyu@jp.fujitsu.com>
Tested-by: KAMEZAWA Hiroyuki <kamez.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Christoph Lameter <clameter@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:17 -07:00
Ingo Molnar
f022bfd582 x86: PAT fix
Adrian Bunk noticed the following Coverity report:

> Commit e7f260a276
> (x86: PAT use reserve free memtype in mmap of /dev/mem)
> added the following gem to arch/x86/mm/pat.c:
>
> <--  snip  -->
>
> ...
> int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
>                                 unsigned long size, pgprot_t *vma_prot)
> {
>         u64 offset = ((u64) pfn) << PAGE_SHIFT;
>         unsigned long flags = _PAGE_CACHE_UC_MINUS;
>         unsigned long ret_flags;
> ...
> ...  (nothing that touches ret_flags)
> ...
>         if (flags != _PAGE_CACHE_UC_MINUS) {
>                 retval = reserve_memtype(offset, offset + size, flags, NULL);
>         } else {
>                 retval = reserve_memtype(offset, offset + size, -1, &ret_flags);
>         }
>
>         if (retval < 0)
>                 return 0;
>
>         flags = ret_flags;
>
>         if (pfn <= max_pfn_mapped &&
>             ioremap_change_attr((unsigned long)__va(offset), size, flags) < 0) {
>                 free_memtype(offset, offset + size);
>                 printk(KERN_INFO
>                 "%s:%d /dev/mem ioremap_change_attr failed %s for %Lx-%Lx\n",
>                         current->comm, current->pid,
>                         cattr_name(flags),
>                         offset, offset + size);
>                 return 0;
>         }
>
>         *vma_prot = __pgprot((pgprot_val(*vma_prot) & ~_PAGE_CACHE_MASK) |
>                              flags);
>         return 1;
> }
>
> <--  snip  -->
>
> If (flags != _PAGE_CACHE_UC_MINUS) we pass garbage from the stack to
> ioremap_change_attr() and/or __pgprot().
>
> Spotted by the Coverity checker.

the fix simplifies the code as we get rid of the 'ret_flags'
complication.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:15:16 -07:00
Linus Torvalds
86cf02f8ea x86 PAT: tone down debugging messages some more
Ingo already fixed one of these at my request (in "x86 PAT: tone down
debugging messages", commit 1ebcc654f0),
but there was another one he missed.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-27 11:59:30 -07:00
Linus Torvalds
42cadc8600 Merge branch 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm
* 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (147 commits)
  KVM: kill file->f_count abuse in kvm
  KVM: MMU: kvm_pv_mmu_op should not take mmap_sem
  KVM: SVM: remove selective CR0 comment
  KVM: SVM: remove now obsolete FIXME comment
  KVM: SVM: disable CR8 intercept when tpr is not masking interrupts
  KVM: SVM: sync V_TPR with LAPIC.TPR if CR8 write intercept is disabled
  KVM: export kvm_lapic_set_tpr() to modules
  KVM: SVM: sync TPR value to V_TPR field in the VMCB
  KVM: ppc: PowerPC 440 KVM implementation
  KVM: Add MAINTAINERS entry for PowerPC KVM
  KVM: ppc: Add DCR access information to struct kvm_run
  ppc: Export tlb_44x_hwater for KVM
  KVM: Rename debugfs_dir to kvm_debugfs_dir
  KVM: x86 emulator: fix lea to really get the effective address
  KVM: x86 emulator: fix smsw and lmsw with a memory operand
  KVM: x86 emulator: initialize src.val and dst.val for register operands
  KVM: SVM: force a new asid when initializing the vmcb
  KVM: fix kvm_vcpu_kick vs __vcpu_run race
  KVM: add ioctls to save/store mpstate
  KVM: Rename VCPU_MP_STATE_* to KVM_MP_STATE_*
  ...
2008-04-27 10:13:52 -07:00
Marcelo Tosatti
960b399169 KVM: MMU: kvm_pv_mmu_op should not take mmap_sem
kvm_pv_mmu_op should not take mmap_sem. All gfn_to_page() callers down
in the MMU processing will take it if necessary, so as it is it can
deadlock.

Apparently a leftover from the days before slots_lock.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 18:21:45 +03:00
Joerg Roedel
1336028b9a KVM: SVM: remove selective CR0 comment
There is not selective cr0 intercept bug. The code in the comment sets the
CR0.PG bit. But KVM sets the CR4.PG bit for SVM always to implement the paged
real mode. So the 'mov %eax,%cr0' instruction does not change the CR0.PG bit.
Selective CR0 intercepts only occur when a bit is actually changed. So its the
right behavior that there is no intercept on this instruction.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 18:21:44 +03:00
Joerg Roedel
aaf697e4e0 KVM: SVM: remove now obsolete FIXME comment
With the usage of the V_TPR field this comment is now obsolete.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 18:21:43 +03:00
Joerg Roedel
aaacfc9ae2 KVM: SVM: disable CR8 intercept when tpr is not masking interrupts
This patch disables the intercept of CR8 writes if the TPR is not masking
interrupts. This reduces the total number CR8 intercepts to below 1 percent of
what we have without this patch using Windows 64 bit guests.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 18:21:43 +03:00
Joerg Roedel
d7bf8221a3 KVM: SVM: sync V_TPR with LAPIC.TPR if CR8 write intercept is disabled
If the CR8 write intercept is disabled the V_TPR field of the VMCB needs to be
synced with the TPR field in the local apic.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 18:21:42 +03:00
Joerg Roedel
ec7cf6903f KVM: export kvm_lapic_set_tpr() to modules
This patch exports the kvm_lapic_set_tpr() function from the lapic code to
modules. It is required in the kvm-amd module to optimize CR8 intercepts.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 18:21:41 +03:00
Joerg Roedel
649d68643e KVM: SVM: sync TPR value to V_TPR field in the VMCB
This patch adds syncing of the lapic.tpr field to the V_TPR field of the VMCB.
With this change we can safely remove the CR8 read intercept.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 18:21:40 +03:00
Avi Kivity
f9b7aab35c KVM: x86 emulator: fix lea to really get the effective address
We never hit this, since there is currently no reason to emulate lea.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 18:21:35 +03:00
Avi Kivity
16286d082d KVM: x86 emulator: fix smsw and lmsw with a memory operand
lmsw and smsw were implemented only with a register operand.  Extend them
to support a memory operand as well.  Fixes Windows running some display
compatibility test on AMD hosts.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 18:21:34 +03:00
Avi Kivity
66b8550573 KVM: x86 emulator: initialize src.val and dst.val for register operands
This lets us treat the case where mod == 3 in the same manner as other cases.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 18:21:33 +03:00
Avi Kivity
a79d2f1805 KVM: SVM: force a new asid when initializing the vmcb
Shutdown interception clears the vmcb, leaving the asid at zero (which is
illegal.  so force a new asid on vmcb initialization.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 18:21:32 +03:00
Marcelo Tosatti
e9571ed54b KVM: fix kvm_vcpu_kick vs __vcpu_run race
There is a window open between testing of pending IRQ's
and assignment of guest_mode in __vcpu_run.

Injection of IRQ's can race with __vcpu_run as follows:

CPU0                                CPU1
kvm_x86_ops->run()
vcpu->guest_mode = 0                SET_IRQ_LINE ioctl
..
kvm_x86_ops->inject_pending_irq
kvm_cpu_has_interrupt()

                                    apic_test_and_set_irr()
                                    kvm_vcpu_kick
                                    if (vcpu->guest_mode)
                                        send_ipi()

vcpu->guest_mode = 1

So move guest_mode=1 assignment before ->inject_pending_irq, and make
sure that it won't reorder after it.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 18:21:32 +03:00
Marcelo Tosatti
62d9f0dbc9 KVM: add ioctls to save/store mpstate
So userspace can save/restore the mpstate during migration.

[avi: export the #define constants describing the value]
[christian: add s390 stubs]
[avi: ditto for ia64]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 18:21:16 +03:00
Avi Kivity
a45352908b KVM: Rename VCPU_MP_STATE_* to KVM_MP_STATE_*
We wish to export it to userspace, so move it into the kvm namespace.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:04:13 +03:00
Marcelo Tosatti
3d80840d96 KVM: hlt emulation should take in-kernel APIC/PIT timers into account
Timers that fire between guest hlt and vcpu_block's add_wait_queue() are
ignored, possibly resulting in hangs.

Also make sure that atomic_inc and waitqueue_active tests happen in the
specified order, otherwise the following race is open:

CPU0                                        CPU1
                                            if (waitqueue_active(wq))
add_wait_queue()
if (!atomic_read(pit_timer->pending))
    schedule()
                                            atomic_inc(pit_timer->pending)

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:04:11 +03:00
Joerg Roedel
3564990af1 KVM: SVM: do not intercept task switch with NPT
When KVM uses NPT there is no reason to intercept task switches. This patch
removes the intercept for it in that case.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:01:23 +03:00
Feng(Eric) Liu
d4c9ff2d1b KVM: Add kvm trace userspace interface
This interface allows user a space application to read the trace of kvm
related events through relayfs.

Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:01:22 +03:00
Feng (Eric) Liu
2714d1d3d6 KVM: Add trace markers
Trace markers allow userspace to trace execution of a virtual machine
in order to monitor its performance.

Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:01:19 +03:00
Joerg Roedel
53371b5098 KVM: SVM: add intercept for machine check exception
To properly forward a MCE occured while the guest is running to the host, we
have to intercept this exception and call the host handler by hand. This is
implemented by this patch.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:01:18 +03:00
Joerg Roedel
6394b6494c KVM: SVM: align shadow CR4.MCE with host
This patch aligns the host version of the CR4.MCE bit with the CR4 active in
the guest. This is necessary to get MCE exceptions when the guest is running.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:01:18 +03:00
Joerg Roedel
ec077263b2 KVM: SVM: indent svm_set_cr4 with tabs instead of spaces
The svm_set_cr4 function is indented with spaces. This patch replaces
them with tabs.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:01:17 +03:00
Anthony Liguori
35149e2129 KVM: MMU: Don't assume struct page for x86
This patch introduces a gfn_to_pfn() function and corresponding functions like
kvm_release_pfn_dirty().  Using these new functions, we can modify the x86
MMU to no longer assume that it can always get a struct page for any given gfn.

We don't want to eliminate gfn_to_page() entirely because a number of places
assume they can do gfn_to_page() and then kmap() the results.  When we support
IO memory, gfn_to_page() will fail for IO pages although gfn_to_pfn() will
succeed.

This does not implement support for avoiding reference counting for reserved
RAM or for IO memory.  However, it should make those things pretty straight
forward.

Since we're only introducing new common symbols, I don't think it will break
the non-x86 architectures but I haven't tested those.  I've tested Intel,
AMD, NPT, and hugetlbfs with Windows and Linux guests.

[avi: fix overflow when shifting left pfns by adding casts]

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:01:15 +03:00
Marcelo Tosatti
bed1d1dfc4 KVM: MMU: prepopulate guest pages after write-protecting
Zdenek reported a bug where a looping "dmsetup status" eventually hangs
on SMP guests.

The problem is that kvm_mmu_get_page() prepopulates the shadow MMU
before write protecting the guest page tables. By doing so, it leaves a
window open where the guest can mark a pte as present while the host has
shadow cached such pte as "notrap". Accesses to such address will fault
in the guest without the host having a chance to fix the situation.

Fix by moving the write protection before the pte prefetch.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:58 +03:00
Avi Kivity
fcd6dbac92 KVM: MMU: Only mark_page_accessed() if the page was accessed by the guest
If the accessed bit is not set, the guest has never accessed this page
(at least through this spte), so there's no need to mark the page
accessed.  This provides more accurate data for the eviction algortithm.

Noted by Andrea Arcangeli.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:57 +03:00
Avi Kivity
3d45830c2b KVM: Free apic access page on vm destruction
Noticed by Marcelo Tosatti.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:54 +03:00
Izik Eidus
3ee16c8145 KVM: MMU: allow the vm to shrink the kvm mmu shadow caches
Allow the Linux memory manager to reclaim memory in the kvm shadow cache.

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:53 +03:00
Marcelo Tosatti
3200f405a1 KVM: MMU: unify slots_lock usage
Unify slots_lock acquision around vcpu_run(). This is simpler and less
error-prone.

Also fix some callsites that were not grabbing the lock properly.

[avi: drop slots_lock while in guest mode to avoid holding the lock
      for indefinite periods]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:52 +03:00
Sheng Yang
25c5f225be KVM: VMX: Enable MSR Bitmap feature
MSR Bitmap controls whether the accessing of an MSR causes VM Exit.
Eliminating exits on automatically saved and restored MSRs yields a
small performance gain.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:52 +03:00
Izik Eidus
37817f2982 KVM: x86: hardware task switching support
This emulates the x86 hardware task switch mechanism in software, as it is
unsupported by either vmx or svm.  It allows operating systems which use it,
like freedos, to run as kvm guests.

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:39 +03:00
Izik Eidus
2e4d265349 KVM: x86: add functions to get the cpl of vcpu
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:38 +03:00
Avi Kivity
4c9fc8ef50 KVM: VMX: Add module option to disable flexpriority
Useful for debugging.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:37 +03:00
Avi Kivity
268fe02ae0 KVM: no longer EXPERIMENTAL
Long overdue.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:36 +03:00
Avi Kivity
0b49ea8659 KVM: MMU: Introduce and use spte_to_page()
Encapsulate the pte mask'n'shift in a function.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:35 +03:00
Izik Eidus
855149aaa9 KVM: MMU: fix dirty bit setting when removing write permissions
When mmu_set_spte() checks if a page related to spte should be release as
dirty or clean, it check if the shadow pte was writeble, but in case
rmap_write_protect() is called called it is possible for shadow ptes that were
writeble to become readonly and therefor mmu_set_spte will release the pages
as clean.

This patch fix this issue by marking the page as dirty inside
rmap_write_protect().

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:34 +03:00
Avi Kivity
947da53830 KVM: MMU: Set the accessed bit on non-speculative shadow ptes
If we populate a shadow pte due to a fault (and not speculatively due to a
pte write) then we can set the accessed bit on it, as we know it will be
set immediately on the next guest instruction.  This saves a read-modify-write
operation.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:33 +03:00
Glauber Costa
1e977aa12d x86: KVM guest: disable clock before rebooting.
This patch writes 0 (actually, what really matters is that the
LSB is cleared) to the system time msr before shutting down
the machine for kexec.

Without it, we can have a random memory location being written
when the guest comes back

It overrides the functions shutdown, used in the path of kernel_kexec() (sys.c)
and crash_shutdown, used in the path of crash_kexec() (kexec.c)

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:31 +03:00
Glauber Costa
3c62c62502 x86: make native_machine_shutdown non-static
it will allow external users to call it. It is mainly
useful for routines that will override its machine_ops
field for its own special purposes, but want to call the
normal shutdown routine after they're done

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:30 +03:00
Glauber Costa
ed23dc6f5b x86: allow machine_crash_shutdown to be replaced
This patch a llows machine_crash_shutdown to
be replaced, just like any of the other functions
in machine_ops

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:29 +03:00
Marcelo Tosatti
096d14a3b5 x86: KVM guest: hypercall batching
Batch pte updates and tlb flushes in lazy MMU mode.

[avi:
 - adjust to mmu_op
 - helper for getting para_state without debug warnings]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:28 +03:00
Marcelo Tosatti
1da8a77bdc x86: KVM guest: hypercall based pte updates and TLB flushes
Hypercall based pte updates are faster than faults, and also allow use
of the lazy MMU mode to batch operations.

Don't report the feature if two dimensional paging is enabled.

[avi:
 - guest/host split
 - fix 32-bit truncation issues
 - adjust to mmu_op
 - adjust to ->release_*() renamed
 - add ->release_pud()]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:28 +03:00