Commit Graph

291 Commits

Author SHA1 Message Date
Jeremy Fitzhardinge
95c7c23b06 xen: report hypervisor version
Various versions of the hypervisor have differences in what ABIs and
features they support.  Print some details into the boot log to help
with remote debugging.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18 13:50:42 +02:00
Maciej W. Rozycki
593f4a788e x86: APIC: remove apic_write_around(); use alternatives
Use alternatives to select the workaround for the 11AP Pentium erratum
for the affected steppings on the fly rather than build time.  Remove the
X86_GOOD_APIC configuration option and replace all the calls to
apic_write_around() with plain apic_write(), protecting accesses to the
ESR as appropriate due to the 3AP Pentium erratum.  Remove
apic_read_around() and all its invocations altogether as not needed.
Remove apic_write_atomic() and all its implementing backends.  The use of
ASM_OUTPUT2() is not strictly needed for input constraints, but I have
used it for readability's sake.

I had the feeling no one else was brave enough to do it, so I went ahead
and here it is.  Verified by checking the generated assembly and tested
with both a 32-bit and a 64-bit configuration, also with the 11AP
"feature" forced on and verified with gdb on /proc/kcore to work as
expected (as an 11AP machines are quite hard to get hands on these days).
Some script complained about the use of "volatile", but apic_write() needs
it for the same reason and is effectively a replacement for writel(), so I
have disregarded it.

I am not sure what the policy wrt defconfig files is, they are generated
and there is risk of a conflict resulting from an unrelated change, so I
have left changes to them out.  The option will get removed from them at
the next run.

Some testing with machines other than mine will be needed to avoid some
stupid mistake, but despite its volume, the change is not really that
intrusive, so I am fairly confident that because it works for me, it will
everywhere.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18 12:51:21 +02:00
Jeremy Fitzhardinge
93a0886e23 x86, xen, power: fix up config dependencies on PM
Xen save/restore needs bits of code enabled by PM_SLEEP, and PM_SLEEP
depends on PM.  So make XEN_SAVE_RESTORE depend on PM and PM_SLEEP
depend on XEN_SAVE_RESTORE.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-17 19:25:20 +02:00
Jeremy Fitzhardinge
2d9e1e2f58 xen: implement Xen-specific spinlocks
The standard ticket spinlocks are very expensive in a virtual
environment, because their performance depends on Xen's scheduler
giving vcpus time in the order that they're supposed to take the
spinlock.

This implements a Xen-specific spinlock, which should be much more
efficient.

The fast-path is essentially the old Linux-x86 locks, using a single
lock byte.  The locker decrements the byte; if the result is 0, then
they have the lock.  If the lock is negative, then locker must spin
until the lock is positive again.

When there's contention, the locker spin for 2^16[*] iterations waiting
to get the lock.  If it fails to get the lock in that time, it adds
itself to the contention count in the lock and blocks on a per-cpu
event channel.

When unlocking the spinlock, the locker looks to see if there's anyone
blocked waiting for the lock by checking for a non-zero waiter count.
If there's a waiter, it traverses the per-cpu "lock_spinners"
variable, which contains which lock each CPU is waiting on.  It picks
one CPU waiting on the lock and sends it an event to wake it up.

This allows efficient fast-path spinlock operation, while allowing
spinning vcpus to give up their processor time while waiting for a
contended lock.

[*] 2^16 iterations is threshold at which 98% locks have been taken
according to Thomas Friebel's Xen Summit talk "Preventing Guests from
Spinning Around".  Therefore, we'd expect the lock and unlock slow
paths will only be entered 2% of the time.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Christoph Lameter <clameter@linux-foundation.org>
Cc: Petr Tesarik <ptesarik@suse.cz>
Cc: Virtualization <virtualization@lists.linux-foundation.org>
Cc: Xen devel <xen-devel@lists.xensource.com>
Cc: Thomas Friebel <thomas.friebel@amd.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 11:15:53 +02:00
Jeremy Fitzhardinge
56397f8dad xen: use lock-byte spinlock implementation
Switch to using the lock-byte spinlock implementation, to avoid the
worst of the performance hit from ticket locks.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Christoph Lameter <clameter@linux-foundation.org>
Cc: Petr Tesarik <ptesarik@suse.cz>
Cc: Virtualization <virtualization@lists.linux-foundation.org>
Cc: Xen devel <xen-devel@lists.xensource.com>
Cc: Thomas Friebel <thomas.friebel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 11:15:53 +02:00
Jeremy Fitzhardinge
d5303b811b x86: xen: no need to disable vdso32
Now that the vdso32 code can cope with both syscall and sysenter
missing for 32-bit compat processes, just disable the features without
disabling vdso altogether.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-07-16 11:08:44 +02:00
Jeremy Fitzhardinge
6a52e4b1cd x86_64: further cleanup of 32-bit compat syscall mechanisms
AMD only supports "syscall" from 32-bit compat usermode.
Intel and Centaur(?) only support "sysenter" from 32-bit compat usermode.

Set the X86 feature bits accordingly, and set up the vdso in
accordance with those bits.  On the offchance we run on in a 64-bit
environment which supports neither syscall nor sysenter from 32-bit
mode, then fall back to the int $0x80 vdso.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-07-16 11:08:27 +02:00
Ingo Molnar
71415c6a08 x86, xen, vdso: fix build error
fix:

   arch/x86/xen/built-in.o: In function `xen_enable_syscall':
   (.cpuinit.text+0xdb): undefined reference to `sysctl_vsyscall32'

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 11:07:58 +02:00
Jeremy Fitzhardinge
62541c3766 xen64: disable 32-bit syscall/sysenter if not supported.
Old versions of Xen (3.1 and before) don't support sysenter or syscall
from 32-bit compat userspaces.  If we can't set the appropriate
syscall callback, then disable the corresponding feature bit, which
will cause the vdso32 setup to fall back appropriately.

Linux assumes that syscall is always available to 32-bit userspace,
and installs it by default if sysenter isn't available.  In that case,
we just disable vdso altogether, forcing userspace libc to fall back
to int $0x80.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 11:07:44 +02:00
Ingo Molnar
b3fe124389 xen64: fix build error on 32-bit + !HIGHMEM
fix:

arch/x86/xen/enlighten.c: In function 'xen_set_fixmap':
arch/x86/xen/enlighten.c:1127: error: 'FIX_KMAP_BEGIN' undeclared (first use in this function)
arch/x86/xen/enlighten.c:1127: error: (Each undeclared identifier is reported only once
arch/x86/xen/enlighten.c:1127: error: for each function it appears in.)
arch/x86/xen/enlighten.c:1127: error: 'FIX_KMAP_END' undeclared (first use in this function)
make[1]: *** [arch/x86/xen/enlighten.o] Error 1
make: *** [arch/x86/xen/enlighten.o] Error 2

FIX_KMAP_BEGIN is only available on HIGHMEM.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 11:07:02 +02:00
Jeremy Fitzhardinge
51dd660a2c xen: update Kconfig to allow 64-bit Xen
Allow Xen to be enabled on 64-bit.

Also extend domain size limit from 8 GB (on 32-bit) to 32 GB on 64-bit.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 11:06:34 +02:00
Jeremy Fitzhardinge
1153968a48 xen: implement Xen write_msr operation
64-bit uses MSRs for important things like the base for fs and
gs-prefixed addresses.  It's more efficient to use a hypercall to
update these, rather than go via the trap and emulate path.

Other MSR writes are just passed through; in an unprivileged domain
they do nothing, but it might be useful later.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 11:06:20 +02:00
Jeremy Fitzhardinge
bf18bf94dc xen64: set up userspace syscall patch
64-bit userspace expects the vdso to be mapped at a specific fixed
address, which happens to be in the middle of the kernel address
space.  Because we have split user and kernel pagetables, we need to
make special arrangements for the vsyscall mapping to appear in the
kernel part of the user pagetable.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 11:06:06 +02:00
Jeremy Fitzhardinge
6fcac6d305 xen64: set up syscall and sysenter entrypoints for 64-bit
We set up entrypoints for syscall and sysenter.  sysenter is only used
for 32-bit compat processes, whereas syscall can be used in by both 32
and 64-bit processes.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 11:05:52 +02:00
Jeremy Fitzhardinge
d6182fbf04 xen64: allocate and manage user pagetables
Because the x86_64 architecture does not enforce segment limits, Xen
cannot protect itself with them as it does in 32-bit mode.  Therefore,
to protect itself, it runs the guest kernel in ring 3.  Since it also
runs the guest userspace in ring3, the guest kernel must maintain a
second pagetable for its userspace, which does not map kernel space.
Naturally, the guest kernel pagetables map both kernel and userspace.

The userspace pagetable is attached to the corresponding kernel
pagetable via the pgd's page->private field.  It is allocated and
freed at the same time as the kernel pgd via the
paravirt_pgd_alloc/free hooks.

Fortunately, the user pagetable is almost entirely shared with the
kernel pagetable; the only difference is the pgd page itself.  set_pgd
will populate all entries in the kernel pagetable, and also set the
corresponding user pgd entry if the address is less than
STACK_TOP_MAX.

The user pagetable must be pinned and unpinned with the kernel one,
but because the pagetables are aliased, pgd_walk() only needs to be
called on the kernel pagetable.  The user pgd page is then
pinned/unpinned along with the kernel pgd page.

xen_write_cr3 must write both the kernel and user cr3s.

The init_mm.pgd pagetable never has a user pagetable allocated for it,
because it can never be used while running usermode.

One awkward area is that early in boot the page structures are not
available.  No user pagetable can exist at that point, but it
complicates the logic to avoid looking at the page structure.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 11:05:38 +02:00
Eduardo Habkost
8a95408e18 xen64: Clear %fs on xen_load_tls()
We need to do this, otherwise we can get a GPF on hypercall return
after TLS descriptor is cleared but %fs is still pointing to it.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 11:04:55 +02:00
Jeremy Fitzhardinge
b7c3c5c159 xen: make sure the kernel command line is right
Point the boot params cmd_line_ptr to the domain-builder-provided
command line.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 11:04:13 +02:00
Jeremy Fitzhardinge
5deb30d194 xen: rework pgd_walk to deal with 32/64 bit
Rewrite pgd_walk to deal with 64-bit address spaces.  There are two
notible features of 64-bit workspaces:

 1. The physical address is only 48 bits wide, with the upper 16 bits
    being sign extension; kernel addresses are negative, and userspace is
    positive.

 2. The Xen hypervisor mapping is at the negative-most address, just above
    the sign-extension hole.

1. means that we can't easily use addresses when traversing the space,
since we must deal with sign extension.  This rewrite expresses
everything in terms of pgd/pud/pmd indices, which means we don't need
to worry about the exact configuration of the virtual memory space.
This approach works equally well in 32-bit.

To deal with 2, assume the hole is between the uppermost userspace
address and PAGE_OFFSET.  For 64-bit this skips the Xen mapping hole.
For 32-bit, the hole is zero-sized.

In all cases, the uppermost kernel address is FIXADDR_TOP.

A side-effect of this patch is that the upper boundary is actually
handled properly, exposing a long-standing bug in 32-bit, which failed
to pin kernel pmd page.  The kernel pmd is not shared, and so must be
explicitly pinned, even though the kernel ptes are shared and don't
need pinning.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 11:03:59 +02:00
Eduardo Habkost
a8fc1089e4 xen64: implement xen_load_gs_index()
xen-64: implement xen_load_gs_index()

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 11:03:45 +02:00
Jeremy Fitzhardinge
0725cbb977 xen64: add identity irq->vector map
The x86_64 interrupt subsystem is oriented towards vectors, as opposed
to a flat irq space as it is in x86-32.  This patch adds a simple
identity irq->vector mapping so that we can continue to feed irqs into
do_IRQ() and get a good result.

Ideally x86_32 will unify with the 64-bit code and use vectors too.
At that point we can move to mapping event channels to vectors, which
will allow us to economise on irqs (so per-cpu event channels can
share irqs, rather than having to allocte one per cpu, for example).

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 11:03:16 +02:00
Jeremy Fitzhardinge
88459d4c7e xen64: register callbacks in arch-independent way
Use callback_op hypercall to register callbacks in a 32/64-bit
independent way (64-bit doesn't need a code segment, but that detail
is hidden in XEN_CALLBACK).

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 11:03:01 +02:00
Jeremy Fitzhardinge
952d1d7055 xen64: add pvop for swapgs
swapgs is a no-op under Xen, because the hypervisor makes sure the
right version of %gs is current when switching between user and kernel
modes.  This means that the swapgs "implementation" can be inlined and
used when the stack is unsafe (usermode).  Unfortunately, it means
that disabling patching will result in a non-booting kernel...

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 11:02:46 +02:00
Jeremy Fitzhardinge
997409d3d0 xen64: deal with extra words Xen pushes onto exception frames
Xen pushes two extra words containing the values of rcx and r11.  This
pvop hook copies the words back into their appropriate registers, and
cleans them off the stack.  This leaves the stack in native form, so
the normal handler can run unchanged.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 11:02:31 +02:00
Eduardo Habkost
e176d367d0 xen64: xen_write_idt_entry() and cvt_gate_to_trap()
Changed to use the (to-be-)unified descriptor structs.

Signed-off-by: Eduardo Habkost <ehabkost@Rawhide-64.localdomain>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 11:02:15 +02:00
Jeremy Fitzhardinge
836fe2f291 xen: use set_pte_vaddr
Make Xen's set_pte_mfn() use set_pte_vaddr rather than copying it.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 11:02:01 +02:00
Jeremy Fitzhardinge
8745f8b0b9 xen64: defer setting pagetable alloc/release ops
We need to wait until the page structure is available to use the
proper pagetable page alloc/release operations, since they use struct
page to determine if a pagetable is pinned.

This happened to work in 32bit because nobody allocated new pagetable
pages in the interim between xen_pagetable_setup_done and
xen_post_allocator_init, but the 64-bit kenrel needs to allocate more
pagetable levels.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 11:01:45 +02:00
Jeremy Fitzhardinge
4560a2947e xen: set num_processors
Someone's got to do it.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 11:01:31 +02:00
Jeremy Fitzhardinge
ce803e705f xen64: use arbitrary_virt_to_machine for xen_set_pmd
When building initial pagetables in 64-bit kernel the pud/pmd pointer may
be in ioremap/fixmap space, so we need to walk the pagetable to look up the
physical address.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 11:01:17 +02:00
Jeremy Fitzhardinge
ebd879e397 xen: fix truncation of machine address
arbitrary_virt_to_machine can truncate a machine address if its above
4G.  Cast the problem away.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 11:01:03 +02:00
Jeremy Fitzhardinge
39dbc5bd34 xen32: create initial mappings like 64-bit
Rearrange the pagetable initialization to share code with the 64-bit
kernel.  Rather than deferring anything to pagetable_setup_start, just
set up an initial pagetable in swapper_pg_dir early at startup, and
create an additional 8MB of physical memory mappings.  This matches
the native head_32.S mappings to a large degree, and allows the rest
of the pagetable setup to continue without much Xen vs. native
difference.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 11:00:49 +02:00
Jeremy Fitzhardinge
d114e1981c xen64: map an initial chunk of physical memory
Early in boot, map a chunk of extra physical memory for use later on.
We need a pool of mapped pages to allocate further pages to construct
pagetables mapping all physical memory.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 11:00:35 +02:00
Jeremy Fitzhardinge
22911b3f1c xen64: 64-bit starts using set_pte from very early
It also doesn't need the 32-bit hack version of set_pte for initial
pagetable construction, so just make it use the real thing.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 11:00:21 +02:00
Jeremy Fitzhardinge
084a2a4e76 xen64: early mapping setup
Set up the initial pagetables to map the kernel mapping into the
physical mapping space.  This makes __va() usable, since it requires
physical mappings.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 11:00:07 +02:00
Jeremy Fitzhardinge
7d087b68d6 xen: cpu_detect is 32-bit only
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 10:59:38 +02:00
Jeremy Fitzhardinge
15664f968a xen64: use set_fixmap for shared_info structure
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 10:59:24 +02:00
Jeremy Fitzhardinge
cdacc1278b xen64: add 64-bit assembler
Split xen-asm into 32- and 64-bit files, and implement the 64-bit
variants.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 10:59:09 +02:00
Jeremy Fitzhardinge
8c5e5ac32f xen64: add xen-head code to head_64.S
Add the Xen entrypoint and ELF notes to head_64.S.  Adapts xen-head.S
to compile either 32-bit or 64-bit.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 10:58:41 +02:00
Jeremy Fitzhardinge
c7b75947f8 xen64: smp.c compile hacking
A number of random changes to make xen/smp.c compile in 64-bit mode.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>a
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 10:58:27 +02:00
Jeremy Fitzhardinge
5b09b2876e x86_64: add workaround for no %gs-based percpu
As a stopgap until Mike Travis's x86-64 gs-based percpu patches are
ready, provide workaround functions for x86_read/write_percpu for
Xen's use.

Specifically, this means that we can't really make use of vcpu
placement, because we can't use a single gs-based memory access to get
to vcpu fields.  So disable all that for now.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 10:58:13 +02:00
Jeremy Fitzhardinge
a9e7062d73 xen: move smp setup into smp.c
Move all the smp_ops setup into smp.c, allowing a lot of things to
become static.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 10:57:59 +02:00
Jeremy Fitzhardinge
ce87b3d326 xen64: get active_mm from the pda
x86_64 stores the active_mm in the pda, so fetch it from there.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 10:57:45 +02:00
Jeremy Fitzhardinge
f5d36de069 xen64: random ifdefs to mask out 32-bit only code
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 10:57:30 +02:00
Jeremy Fitzhardinge
f6e587325b xen64: add extra pv_mmu_ops
We need extra pv_mmu_ops for 64-bit, to deal with the extra level of
pagetable.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 10:57:16 +02:00
Jeremy Fitzhardinge
7077c33d81 xen: make ELF notes work for 32 and 64 bit
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 10:56:32 +02:00
Jeremy Fitzhardinge
48b5db2062 xen64: define asm/xen/interface for 64-bit
Copy 64-bit definitions of various interface structures into place.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 10:56:18 +02:00
Jeremy Fitzhardinge
851fa3c4e7 xen: define set_pte from the outset
We need set_pte to work from a relatively early point, so enable it
from the start.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 10:56:04 +02:00
Isaku Yamahata
ad55db9fed xen: add xen_arch_resume()/xen_timer_resume hook for ia64 support
add xen_timer_resume() hook.

Timer resume should be done after event channel is resumed.
add xen_arch_resume() hook when ipi becomes usable after resume.
After resume, some cpu specific resource must be reinitialized
on ia64 that can't be set by another cpu.

However available hooks is run once on only one cpu so that ipi has
to be used.

During stop_machine_run() ipi can't be used because interrupt is masked.
So add another hook after stop_machine_run().
Another approach might be use resume hook which is run by
device_resume(). However device_resume() may be executed on
suspend error recovery path.

So it is necessary to determine whether it is executed on real resume path
or error recovery path.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 10:55:50 +02:00
Jeremy Fitzhardinge
8ba6c2b095 xen: print backtrace on multicall failure
Print a backtrace if a multicall fails, to help with debugging.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 10:55:21 +02:00
Jeremy Fitzhardinge
cbcd79c2e5 x86: use __page_aligned_data/bss
Update arch/x86's use of page-aligned variables.  The change to
arch/x86/xen/mmu.c fixes an actual bug, but the rest are cleanups
and to set a precedent.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 10:54:39 +02:00
Eduardo Habkost
c1f2f09ef6 pvops-64: call paravirt_post_allocator_init() on setup_arch()
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 10:53:57 +02:00
Eduardo Habkost
a312b37b2a x86/paravirt: call paravirt_pagetable_setup_{start, done}
Call paravirt_pagetable_setup_{start,done}

These paravirt_ops functions were not being called on x86_64.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 10:53:43 +02:00
Ingo Molnar
82638844d9 Merge branch 'linus' into cpus4096
Conflicts:

	arch/x86/xen/smp.c
	kernel/sched_rt.c
	net/iucv/iucv.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16 00:29:07 +02:00
Ingo Molnar
1a781a777b Merge branch 'generic-ipi' into generic-ipi-for-linus
Conflicts:

	arch/powerpc/Kconfig
	arch/s390/kernel/time.c
	arch/x86/kernel/apic_32.c
	arch/x86/kernel/cpu/perfctr-watchdog.c
	arch/x86/kernel/i8259_64.c
	arch/x86/kernel/ldt.c
	arch/x86/kernel/nmi_64.c
	arch/x86/kernel/smpboot.c
	arch/x86/xen/smp.c
	include/asm-x86/hw_irq_32.h
	include/asm-x86/hw_irq_64.h
	include/asm-x86/mach-default/irq_vectors.h
	include/asm-x86/mach-voyager/irq_vectors.h
	include/asm-x86/smp.h
	kernel/Makefile

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-15 21:55:59 +02:00
Yinghai Lu
94a8c3c243 x86: let 32bit use apic_ops too - fix
fix for pv - clean up the namespace there too.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-14 09:02:54 +02:00
Suresh Siddha
ad66dd340f x2apic: xen64 paravirt basic apic ops
Define the Xen specific basic apic ops, in additon to paravirt apic ops,
with some misc warning fixes.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: akpm@linux-foundation.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-12 08:45:10 +02:00
Alok Kataria
e93ef949fd x86: rename paravirtualized TSC functions
Rename the paravirtualized calculate_cpu_khz to calibrate_tsc.
In all cases, we actually calibrate_tsc and use that as the cpu_khz value.

Signed-off-by: Alok N Kataria <akataria@vmware.com>
Signed-off-by: Dan Hecht <dhecht@vmware.com>
Cc: Dan Hecht <dhecht@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-09 07:43:28 +02:00
Jeremy Fitzhardinge
fab58420ac x86/paravirt, 64-bit: add adjust_exception_frame
64-bit Xen pushes a couple of extra words onto an exception frame.
Add a hook to deal with them.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:15:57 +02:00
Jeremy Fitzhardinge
d75cd22fdd x86/paravirt: split sysret and sysexit
Don't conflate sysret and sysexit; they're different instructions with
different semantics, and may be in use at the same time (at least
within the same kernel, depending on whether its an Intel or AMD
system).

sysexit - just return to userspace, does no register restoration of
    any kind; must explicitly atomically enable interrupts.

sysret - reloads flags from r11, so no need to explicitly enable
    interrupts on 64-bit, responsible for restoring usermode %gs

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citirx.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:13:15 +02:00
Jeremy Fitzhardinge
eba0045ff8 x86/paravirt: add a pgd_alloc/free hooks
Add hooks which are called at pgd_alloc/free time.  The pgd_alloc hook
may return an error code, which if non-zero, causes the pgd allocation
to be failed.  The hooks may be used to allocate/free auxillary
per-pgd information.

also fix:

> * Ingo Molnar <mingo@elte.hu> wrote:
>
>  include/asm/pgalloc.h: In function ‘paravirt_pgd_free':
>  include/asm/pgalloc.h:14: error: parameter name omitted
>  arch/x86/kernel/entry_64.S: In file included from
>  arch/x86/kernel/traps_64.c:51:include/asm/pgalloc.h: In function ‘paravirt_pgd_free':
>  include/asm/pgalloc.h:14: error: parameter name omitted

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:11:01 +02:00
Jeremy Fitzhardinge
88a6846c70 xen: set max_pfn_mapped
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: the arch/x86 maintainers <x86@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:48:30 +02:00
Jeremy Fitzhardinge
b792c75590 xen: reserve ISA space in e820 map
[ TODO: release the underlying memory back to Xen. ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: the arch/x86 maintainers <x86@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Ian Campbell <ian.campbell@citrix.com>

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:48:29 +02:00
Jeremy Fitzhardinge
be5bf9fa1c xen: reserve Xen-specific memory in e820 map
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: the arch/x86 maintainers <x86@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:48:28 +02:00
Ingo Molnar
6236af82d8 Merge branch 'x86/fixmap' into x86/devel
Conflicts:

	arch/x86/mm/init_64.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:24:29 +02:00
Ingo Molnar
3de352bbd8 Merge branch 'x86/mpparse' into x86/devel
Conflicts:

	arch/x86/Kconfig
	arch/x86/kernel/io_apic_32.c
	arch/x86/kernel/setup_64.c
	arch/x86/mm/init_32.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 11:14:58 +02:00
Yinghai Lu
d0be6bdea1 x86: rename two e820 related functions
rename update_memory_range to e820_update_range
rename add_memory_region to e820_add_region

to make it more clear that they are about e820 map operations.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:37:01 +02:00
Ingo Molnar
896395c290 Merge branch 'linus' into tmp.x86.mpparse.new 2008-07-08 10:32:56 +02:00
Ingo Molnar
6924d1ab8b Merge branches 'x86/numa-fixes', 'x86/apic', 'x86/apm', 'x86/bitops', 'x86/build', 'x86/cleanups', 'x86/cpa', 'x86/cpu', 'x86/defconfig', 'x86/gart', 'x86/i8259', 'x86/intel', 'x86/irqstats', 'x86/kconfig', 'x86/ldt', 'x86/mce', 'x86/memtest', 'x86/pat', 'x86/ptemask', 'x86/resumetrace', 'x86/threadinfo', 'x86/timers', 'x86/vdso' and 'x86/xen' into x86/devel 2008-07-08 09:16:56 +02:00
Ingo Molnar
68083e05d7 Merge commit 'v2.6.26-rc9' into cpus4096 2008-07-06 14:23:39 +02:00
Linus Torvalds
b8a0b6ccf2 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  xen: fix address truncation in pte mfn<->pfn conversion
  arch/x86/mm/init_64.c: early_memtest(): fix types
  x86: fix Intel Mac booting with EFI
2008-07-04 10:46:46 -07:00
Jeremy Fitzhardinge
d8355aca23 xen: fix address truncation in pte mfn<->pfn conversion
When converting the page number in a pte/pmd/pud/pgd between
machine and pseudo-physical addresses, the converted result was
being truncated at 32-bits.  This caused failures on machines
with more than 4G of physical memory.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: "Christopher S. Aker" <caker@theshore.net>
Cc: Ian Campbell <Ian.Campbell@eu.citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-04 11:31:20 +02:00
Jens Axboe
8691e5a8f6 smp_call_function: get rid of the unused nonatomic/retry argument
It's never used and the comments refer to nonatomic and retry
interchangably. So get rid of it.

Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-06-26 11:24:35 +02:00
Jens Axboe
3b16cf8748 x86: convert to generic helpers for IPI function calls
This converts x86, x86-64, and xen to use the new helpers for
smp_call_function() and friends, and adds support for
smp_call_function_single().

Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-06-26 11:21:54 +02:00
Jeremy Fitzhardinge
400d34944c xen: add mechanism to extend existing multicalls
Some Xen hypercalls accept an array of operations to work on.  In
general this is because its more efficient for the hypercall to the
work all at once rather than as separate hypercalls (even batched as a
multicall).

This patch adds a mechanism (xen_mc_extend_args()) to allocate more
argument space to the last-issued multicall, in order to extend its
argument list.

The user of this mechanism is xen/mmu.c, which uses it to extend the
args array of mmu_update.  This is particularly valuable when doing
the update for a large mprotect, which goes via
ptep_modify_prot_commit(), but it also manages to batch updates to
pgd/pmds as well.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-25 15:17:34 +02:00
Jeremy Fitzhardinge
e57778a1e3 xen: implement ptep_modify_prot_start/commit
Xen has a pte update function which will update a pte while preserving
its accessed and dirty bits.  This means that ptep_modify_prot_start() can be
implemented as a simple read of the pte value.  The hardware may
update the pte in the meantime, but ptep_modify_prot_commit() updates it while
preserving any changes that may have happened in the meantime.

The updates in ptep_modify_prot_commit() are batched if we're currently in lazy
mmu mode.

The mmu_update hypercall can take a batch of updates to perform, but
this code doesn't make particular use of that feature, in favour of
using generic multicall batching to get them all into the hypervisor.

The net effect of this is that each mprotect pte update turns from two
expensive trap-and-emulate faults into they hypervisor into a single
hypercall whose cost is amortized in a batched multicall.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-25 15:17:23 +02:00
Jeremy Fitzhardinge
08b882c627 paravirt: add hooks for ptep_modify_prot_start/commit
This patch adds paravirt-ops hooks in pv_mmu_ops for ptep_modify_prot_start and
ptep_modify_prot_commit.  This allows the hypervisor-specific backends to
implement these in some more efficient way.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-25 15:16:00 +02:00
Ingo Molnar
8b7ef4ec5b Merge branch 'linus' into x86/fixmap 2008-06-25 12:30:21 +02:00
Ingo Molnar
d02859ecb3 Merge commit 'v2.6.26-rc8' into x86/xen
Conflicts:

	arch/x86/xen/enlighten.c
	arch/x86/xen/mmu.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-25 12:16:51 +02:00
Linus Torvalds
919c0d14ae Merge branch 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm
* 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm:
  KVM: Remove now unused structs from kvm_para.h
  x86: KVM guest: Use the paravirt clocksource structs and functions
  KVM: Make kvm host use the paravirt clocksource structs
  x86: Make xen use the paravirt clocksource structs and functions
  x86: Add structs and functions for paravirt clocksource
  KVM: VMX: Fix host msr corruption with preemption enabled
  KVM: ioapic: fix lost interrupt when changing a device's irq
  KVM: MMU: Fix oops on guest userspace access to guest pagetable
  KVM: MMU: large page update_pte issue with non-PAE 32-bit guests (resend)
  KVM: MMU: Fix rmap_write_protect() hugepage iteration bug
  KVM: close timer injection race window in __vcpu_run
  KVM: Fix race between timer migration and vcpu migration
2008-06-24 18:09:06 -07:00
Gerd Hoffmann
1c7b67f757 x86: Make xen use the paravirt clocksource structs and functions
This patch updates the xen guest to use the pvclock structs
and helper functions.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24 21:02:32 +03:00
Jeremy Fitzhardinge
2849914393 xen: remove support for non-PAE 32-bit
Non-PAE operation has been deprecated in Xen for a while, and is
rarely tested or used.  xen-unstable has now officially dropped
non-PAE support.  Since Xen/pvops' non-PAE support has also been
broken for a while, we may as well completely drop it altogether.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-24 17:00:55 +02:00
Jeremy Fitzhardinge
aeaaa59c7e x86/paravirt/xen: add set_fixmap pv_mmu_ops
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-20 15:09:56 +02:00
Jeremy Fitzhardinge
ebb9cfe20f xen: don't drop NX bit
Because NX is now enforced properly, we must put the hypercall page
into the .text segment so that it is executable.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
Cc: the arch/x86 maintainers <x86@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-20 14:56:41 +02:00
Jeremy Fitzhardinge
05345b0f00 xen: mask unwanted pte bits in __supported_pte_mask
[ Stable: this isn't a bugfix in itself, but it's a pre-requiste
  for "xen: don't drop NX bit" ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
Cc: the arch/x86 maintainers <x86@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-20 14:56:36 +02:00
Jeremy Fitzhardinge
a987b16cc6 xen: don't drop NX bit
Because NX is now enforced properly, we must put the hypercall page
into the .text segment so that it is executable.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
Cc: the arch/x86 maintainers <x86@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-20 14:55:13 +02:00
Jeremy Fitzhardinge
eb179e443d xen: mask unwanted pte bits in __supported_pte_mask
[ Stable: this isn't a bugfix in itself, but it's a pre-requiste
  for "xen: don't drop NX bit" ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
Cc: the arch/x86 maintainers <x86@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-20 14:55:11 +02:00
Ingo Molnar
688d22e23a Merge branch 'linus' into x86/xen 2008-06-16 11:21:27 +02:00
Jeremy Fitzhardinge
f595ec964d common implementation of iterative div/mod
We have a few instances of the open-coded iterative div/mod loop, used
when we don't expcet the dividend to be much bigger than the divisor.
Unfortunately modern gcc's have the tendency to strength "reduce" this
into a full mod operation, which isn't necessarily any faster, and
even if it were, doesn't exist if gcc implements it in libgcc.

The workaround is to put a dummy asm statement in the loop to prevent
gcc from performing the transformation.

This patch creates a single implementation of this loop, and uses it
to replace the open-coded versions I know about.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Robert Hancock <hancockr@shaw.ca>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 10:47:56 +02:00
Jeremy Fitzhardinge
7e0edc1bc3 xen: add new Xen elfnote types and use them appropriately
Define recently added XEN_ELFNOTEs, and use them appropriately.
Most significantly, this enables domain checkpointing (xm save -c).

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-02 13:25:51 +02:00
Jeremy Fitzhardinge
d07af1f0e3 xen: resume timers on all vcpus
On resume, the vcpu timer modes will not be restored.  The timer
infrastructure doesn't do this for us, since it assumes the cpus
are offline.  We can just poke the other vcpus into the right mode
directly though.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-02 13:25:44 +02:00
Jeremy Fitzhardinge
9c7a794209 xen: restore vcpu_info mapping
If we're using vcpu_info mapping, then make sure its restored on all
processors before relasing them from stop_machine.

The only complication is that if this fails, we can't continue because
we've already made assumptions that the mapping is available (baked in
calls to the _direct versions of the functions, for example).

Fortunately this can only happen with a 32-bit hypervisor, which may
possibly run out of mapping space.  On a 64-bit hypervisor, this is a
non-issue.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-02 13:25:34 +02:00
Jeremy Fitzhardinge
e2426cf85f xen: avoid hypercalls when updating unpinned pud/pmd
When operating on an unpinned pagetable (ie, one under construction or
destruction), it isn't necessary to use a hypercall to update a
pud/pmd entry.  Jan Beulich observed that a similar optimisation
avoided many thousands of hypercalls while doing a kernel build.

One tricky part is that early in the kernel boot there's no page
structure, so we can't check to see if the page is pinned.  In that
case, we just always use the hypercall.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-02 13:24:40 +02:00
Ingo Molnar
15ce60056b xen: export get_phys_to_machine
-tip testing found the following xen-console symbols trouble:

  ERROR: "get_phys_to_machine" [drivers/video/xen-fbfront.ko] undefined!
  ERROR: "get_phys_to_machine" [drivers/net/xen-netfront.ko] undefined!
  ERROR: "get_phys_to_machine" [drivers/input/xen-kbdfront.ko] undefined!

with:

  http://redhat.com/~mingo/misc/config-Mon_Jun__2_12_25_13_CEST_2008.bad
2008-06-02 13:20:11 +02:00
Yinghai Lu
f0d43100f1 x86: extend e820 early_res support 32bit -fix #3
introduce init_pg_table_start, so xen PV could specify the value.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-31 09:55:47 +02:00
Ingo Molnar
b20aeccd6a xen: fix early bootup crash on native hardware
-tip tree auto-testing found the following early bootup hang:

-------------->
get_memcfg_from_srat: assigning address to rsdp
RSD PTR  v0 [Nvidia]
BUG: Int 14: CR2 ffd00040
     EDI 8092fbfe  ESI ffd00040  EBP 80b0aee8  ESP 80b0aed0
     EBX 000f76f0  EDX 0000000e  ECX 00000003  EAX ffd00040
     err 00000000  EIP 802c055a   CS 00000060  flg 00010006
Stack: ffd00040 80bc78d0 80b0af6c 80b1dbfe 8093d8ba 00000008 80b42810 80b4ddb4
       80b42842 00000000 80b0af1c 801079c8 808e724e 00000000 80b42871 802c0531
       00000100 00000000 0003fff0 80b0af40 80129999 00040100 00040100 00000000
Pid: 0, comm: swapper Not tainted 2.6.26-rc4-sched-devel.git #570
 [<802c055a>] ? strncmp+0x11/0x25
 [<80b1dbfe>] ? get_memcfg_from_srat+0xb4/0x568
 [<801079c8>] ? mcount_call+0x5/0x9
 [<802c0531>] ? strcmp+0xa/0x22
 [<80129999>] ? printk+0x38/0x3a
 [<80129999>] ? printk+0x38/0x3a
 [<8011b122>] ? memory_present+0x66/0x6f
 [<80b216b4>] ? setup_memory+0x13/0x40c
 [<80b16b47>] ? propagate_e820_map+0x80/0x97
 [<80b1622a>] ? setup_arch+0x248/0x477
 [<80129999>] ? printk+0x38/0x3a
 [<80b11759>] ? start_kernel+0x6e/0x2eb
 [<80b110fc>] ? i386_start_kernel+0xeb/0xf2
 =======================
<------

with this config:

   http://redhat.com/~mingo/misc/config-Wed_May_28_01_33_33_CEST_2008.bad

The thing is, the crash makes little sense at first sight. We crash on a
benign-looking printk. The code around it got changed in -tip but
checking those topic branches individually did not reproduce the bug.

Bisection led to this commit:

|   d5edbc1f75 is first bad commit
|   commit d5edbc1f75
|   Author: Jeremy Fitzhardinge <jeremy@goop.org>
|   Date:   Mon May 26 23:31:22 2008 +0100
|
|   xen: add p2m mfn_list_list

Which is somewhat surprising, as on native hardware Xen client side
should have little to no side-effects.

After some head scratching, it turns out the following happened:
randconfig enabled the following Xen options:

  CONFIG_XEN=y
  CONFIG_XEN_MAX_DOMAIN_MEMORY=8
  # CONFIG_XEN_BLKDEV_FRONTEND is not set
  # CONFIG_XEN_NETDEV_FRONTEND is not set
  CONFIG_HVC_XEN=y
  # CONFIG_XEN_BALLOON is not set

which activated this piece of code in arch/x86/xen/mmu.c:

> @@ -69,6 +69,13 @@
>  	__attribute__((section(".data.page_aligned"))) =
>  		{ [ 0 ... TOP_ENTRIES - 1] = &p2m_missing[0] };
>
> +/* Arrays of p2m arrays expressed in mfns used for save/restore */
> +static unsigned long p2m_top_mfn[TOP_ENTRIES]
> +	__attribute__((section(".bss.page_aligned")));
> +
> +static unsigned long p2m_top_mfn_list[TOP_ENTRIES / P2M_ENTRIES_PER_PAGE]
> +	__attribute__((section(".bss.page_aligned")));

The problem is, you must only put variables into .bss.page_aligned that
have a _size_ that is _exactly_ page aligned. In this case the size of
p2m_top_mfn_list is not page aligned:

 80b8d000 b p2m_top_mfn
 80b8f000 b p2m_top_mfn_list
 80b8f008 b softirq_stack
 80b97008 b hardirq_stack
 80b9f008 b bm_pte

So all subsequent variables get unaligned which, depending on luck,
breaks the kernel in various funny ways. In this case what killed the
kernel first was the misaligned bootmap pte page, resulting in that
creative crash above.

Anyway, this was a fun bug to track down :-)

I think the moral is that .bss.page_aligned is a dangerous construct in
its current form, and the symptoms of breakage are very non-trivial, so
i think we need build-time checks to make sure all symbols in
.bss.page_aligned are truly page aligned.

The Xen fix below gets the kernel booting again.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-28 14:32:06 +02:00
Jeremy Fitzhardinge
359cdd3f86 xen: maintain clock offset over save/restore
Hook into the device model to make sure that timekeeping's resume handler
is called.  This deals with our clocksource's non-monotonicity over the
save/restore.  Explicitly call clock_has_changed() to make sure that
all the timers get retriggered properly.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:11:38 +02:00
Jeremy Fitzhardinge
0e91398f2a xen: implement save/restore
This patch implements Xen save/restore and migration.

Saving is triggered via xenbus, which is polled in
drivers/xen/manage.c.  When a suspend request comes in, the kernel
prepares itself for saving by:

1 - Freeze all processes.  This is primarily to prevent any
    partially-completed pagetable updates from confusing the suspend
    process.  If CONFIG_PREEMPT isn't defined, then this isn't necessary.

2 - Suspend xenbus and other devices

3 - Stop_machine, to make sure all the other vcpus are quiescent.  The
    Xen tools require the domain to run its save off vcpu0.

4 - Within the stop_machine state, it pins any unpinned pgds (under
    construction or destruction), performs canonicalizes various other
    pieces of state (mostly converting mfns to pfns), and finally

5 - Suspend the domain

Restore reverses the steps used to save the domain, ending when all
the frozen processes are thawed.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:11:38 +02:00
Jeremy Fitzhardinge
d5edbc1f75 xen: add p2m mfn_list_list
When saving a domain, the Xen tools need to remap all our mfns to
portable pfns.  In order to remap our p2m table, it needs to know
where all its pages are, so maintain the references to the p2m table
for it to use.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:11:37 +02:00
Jeremy Fitzhardinge
a0d695c821 xen: make dummy_shared_info non-static
Rename dummy_shared_info to xen_dummy_shared_info and make it
non-static, in anticipation of users outside of enlighten.c

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:11:37 +02:00
Jeremy Fitzhardinge
cf0923ea29 xen: efficiently support a holey p2m table
When using sparsemem and memory hotplug, the kernel's pseudo-physical
address space can be discontigious.  Previously this was dealt with by
having the upper parts of the radix tree stubbed off.  Unfortunately,
this is incompatible with save/restore, which requires a complete p2m
table.

The solution is to have a special distinguished all-invalid p2m leaf
page, which we can point all the hole areas at.  This allows the tools
to see a complete p2m table, but it only costs a page for all memory
holes.

It also simplifies the code since it removes a few special cases.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:11:37 +02:00
Jeremy Fitzhardinge
8006ec3e91 xen: add configurable max domain size
Add a config option to set the max size of a Xen domain.  This is used
to scale the size of the physical-to-machine array; it ends up using
around 1 page/GByte, so there's no reason to be very restrictive.

For a 32-bit guest, the default value of 8GB is probably sufficient;
there's not much point in giving a 32-bit machine much more memory
than that.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:11:37 +02:00
Jeremy Fitzhardinge
d451bb7aa8 xen: make phys_to_machine structure dynamic
We now support the use of memory hotplug, so the physical to machine
page mapping structure must be dynamic.  This is implemented as a
two-level radix tree structure, which allows us to efficiently
incrementally allocate memory for the p2m table as new pages are
added.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:11:37 +02:00
Jeremy Fitzhardinge
38bb5ab417 xen: count resched interrupts properly
Make sure resched interrupts appear in /proc/interrupts in the proper
place.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:11:37 +02:00
Isaku Yamahata
ec9b2065d4 xen: Move manage.c to drivers/xen for ia64/xen support
move arch/x86/xen/manage.c under drivers/xen/to share codes
with x86 and ia64.
ia64/xen also uses manage.c

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:11:36 +02:00
Jeremy Fitzhardinge
83abc70a4c xen: make earlyprintk=xen work again
For some perverse reason, if you call add_preferred_console() it prevents
setup_early_printk() from successfully enabling the boot console -
unless you make it a preferred console too...

Also, make xenboot console output distinct from normal console output,
since it gets repeated when the console handover happens, and the
duplicated output is confusing without disambiguation.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
2008-05-27 10:11:36 +02:00
Markus Armbruster
9e124fe16f xen: Enable console tty by default in domU if it's not a dummy
Without console= arguments on the kernel command line, the first
console to register becomes enabled and the preferred console (the one
behind /dev/console).  This is normally tty (assuming
CONFIG_VT_CONSOLE is enabled, which it commonly is).

This is okay as long tty is a useful console.  But unless we have the
PV framebuffer, and it is enabled for this domain, tty0 in domU is
merely a dummy.  In that case, we want the preferred console to be the
Xen console hvc0, and we want it without having to fiddle with the
kernel command line.  Commit b8c2d3dfbc
did that for us.

Since we now have the PV framebuffer, we want to enable and prefer tty
again, but only when PVFB is enabled.  But even then we still want to
enable the Xen console as well.

Problem: when tty registers, we can't yet know whether the PVFB is
enabled.  By the time we can know (xenstore is up), the console setup
game is over.

Solution: enable console tty by default, but keep hvc as the preferred
console.  Change the preferred console to tty when PVFB probes
successfully, unless we've been given console kernel parameters.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:11:36 +02:00
Jeremy Fitzhardinge
a15af1c9ea x86/paravirt: add pte_flags to just get pte flags
Add pte_flags() to extract the flags from a pte.  This is a special
case of pte_val() which is only guaranteed to return the pte's flags
correctly; the page number may be corrupted or missing.

The intent is to allow paravirt implementations to return pte flags
without having to do any translation of the page number (most notably,
Xen).

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:11:36 +02:00
Jeremy Fitzhardinge
239d1fc04e xen: don't worry about preempt during xen_irq_enable()
When enabling interrupts, we don't need to worry about preemption,
because we either enter with interrupts disabled - so no preemption -
or the caller is confused and is re-enabling interrupts on some
indeterminate processor.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:11:35 +02:00
Jeremy Fitzhardinge
2956a3511c xen: allow some cr4 updates
The guest can legitimately change things like cr4.OSFXSR and
OSXMMEXCPT, so let it.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:11:35 +02:00
Jeremy Fitzhardinge
349c709f42 xen: use new sched_op
Use the new sched_op hypercall, mainly because xenner doesn't support
the old one.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:11:35 +02:00
Jeremy Fitzhardinge
7b1333aa4c xen: use hypercall rather than clts
Xen will trap and emulate clts, but its better to use a hypercall.
Also, xenner doesn't handle clts.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-27 10:11:35 +02:00
Mike Travis
334ef7a7ab x86: use performance variant for_each_cpu_mask_nr
Change references from for_each_cpu_mask to for_each_cpu_mask_nr
where appropriate

Reviewed-by: Paul Jackson <pj@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

commit 2d474871e2fb092eb46a0930aba5442e10eb96cc
Author: Mike Travis <travis@sgi.com>
Date:   Mon May 12 21:21:13 2008 +0200
2008-05-23 18:35:12 +02:00
Jan Beulich
de067814d6 x86/xen: fix arbitrary_virt_to_machine()
While I realize that the function isn't currently being used, I still
think an obvious mistake like this should be corrected.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23 14:08:06 +02:00
Jeremy Fitzhardinge
3843fc2575 xen: remove support for non-PAE 32-bit
Non-PAE operation has been deprecated in Xen for a while, and is
rarely tested or used.  xen-unstable has now officially dropped
non-PAE support.  Since Xen/pvops' non-PAE support has also been
broken for a while, we may as well completely drop it altogether.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-22 18:42:49 +02:00
Christoph Lameter
d60cd46bbd pageflags: use proper page flag functions in Xen
Xen uses bitops to manipulate page flags.  Make it use proper page flag
functions.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:22 -07:00
Akinobu Mita
7c04e64a1b x86: use cpumask function for present, possible, and online cpus
cpu_online(), cpu_present(), for_each_possible_cpu(), num_possible_cpus()

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:47 +02:00
Jeremy Fitzhardinge
af7ae3b9c4 xen: allow compilation with non-flat memory
There's no real reason we can't support sparsemem/discontigmem, so do so.
This is mostly useful to support hotplug memory.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:33 +02:00
Jeremy Fitzhardinge
b77797fb2b xen: fold xen_sysexit into xen_iret
xen_sysexit and xen_iret were doing essentially the same thing.  Rather
than having a separate implementation for xen_sysexit, we can just strip
the stack back to an iret frame and jump into xen_iret.  This removes
a lot of code and complexity - specifically, another critical region.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:33 +02:00
Jeremy Fitzhardinge
2bd50036b5 xen: allow set_pte_at on init_mm to be lockless
The usual pagetable locking protocol doesn't seem to apply to updates
to init_mm, so don't rely on preemption being disabled in xen_set_pte_at
on init_mm.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:33 +02:00
Jeremy Fitzhardinge
41e332b2a2 xen: disable preemption during tlb flush
Various places in the kernel flush the tlb even though preemption doens't
guarantee the tlb flush is happening on any particular CPU.  In many cases
this doesn't seem to matter, so don't make a fuss about it.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:33 +02:00
Isaku Yamahata
8d3d2106c1 xen: make grant table arch portable
split out x86 specific part from grant-table.c and
allow ia64/xen specific initialization.
ia64/xen grant table is based on pseudo physical address
(guest physical address) unlike x86/xen. On ia64 init_mm
doesn't map identity straight mapped area.
ia64/xen specific grant table initialization is necessary.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:32 +02:00
Isaku Yamahata
e04d0d0767 xen: move events.c to drivers/xen for IA64/Xen support
move arch/x86/xen/events.c undedr drivers/xen to share codes
with x86 and ia64. And minor adjustment to compile.
ia64/xen also uses events.c

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:32 +02:00
Isaku Yamahata
af711cda4f xen: move features.c from arch/x86/xen/features.c to drivers/xen
ia64/xen also uses it too. Move it into common place so that
ia64/xen can share the code.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:32 +02:00
Jeremy Fitzhardinge
0f2c876952 xen: jump to iret fixup
Use jmp rather than call for the iret fixup, so its consistent with
the sysexit fixup, and it simplifies the stack (which is already
complex).

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:32 +02:00
Jeremy Fitzhardinge
dbe9e994c9 xen: no need for domU to worry about MCE/MCA
Mask MCE/MCA out of cpu caps.  Its harmless to leave them there, but
it does prevent the kernel from starting an unnecessary thread.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:32 +02:00
Jeremy Fitzhardinge
229664bee6 xen: short-cut for recursive event handling
If an event comes in while events are currently being processed, then
just increment the counter and have the outer event loop reprocess the
pending events.  This prevents unbounded recursion on heavy event
loads (of course massive event storms will cause infinite loops).

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:32 +02:00
Jeremy Fitzhardinge
ee8fa1c67f xen: make sure retriggered events are set pending
retrigger_dynirq() was incomplete, and didn't properly set the event
to be pending again.  It doesn't seem to actually get used.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:32 +02:00
Jeremy Fitzhardinge
ee523ca1e4 xen: implement a debug-interrupt handler
Xen supports the notion of a debug interrupt which can be triggered
from the console.  For now this is implemented to show pending events,
masks and each CPU's pending event set.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:32 +02:00
Jeremy Fitzhardinge
e2a81baf66 xen: support sysenter/sysexit if hypervisor does
64-bit Xen supports sysenter for 32-bit guests, so support its
use.  (sysenter is faster than int $0x80 in 32-on-64.)

sysexit is still not supported, so we fake it up using iret.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge
90e9f53662 xen: make sure iret faults are trapped
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge
947a69c90c xen: unify pte operations
We can fold the essentially common pte functions together now.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge
430442e38e xen: make use of pte_t union
pte_t always contains a "pte" field for the whole pte value, so make
use of it.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge
abf33038ff xen: use appropriate pte types
Convert Xen pagetable handling to use appropriate *val_t types.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge
6944a9c894 x86: rename paravirt_alloc_pt etc after the pagetable structure
Rename (alloc|release)_(pt|pd) to pte/pmd to explicitly match the name
of the appropriate pagetable level structure.

[ x86.git merge work by Mark McLoughlin <markmc@redhat.com> ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Glauber Costa
ecaa6c9de7 x86: change naming of cpu_initialized_mask for xen
xen does not use the global cpu_initialized mask, but rather,
a specific one. So we change its name so it won't conflict with the upcoming
movement of cpu_initialized_mask from smp_64.h to smp_32.h.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:41:32 +02:00
Jeremy Fitzhardinge
81e103f1f1 xen: use iret instruction all the time
Change iret implementation to not be dependent on direct-access vcpu
structure.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:40:51 +02:00
Markus Armbruster
b8c2d3dfbc xen: make hvc0 the preferred console in domU
This makes the Xen console just work.  Before, you had to ask for it
on the kernel command line with console=hvc0

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:40:51 +02:00
Paolo Ciarrocchi
7ebed39ff7 x86: coding style fixes to arch/x86/xen/multicalls.c
Before:
   total: 2 errors, 2 warnings, 138 lines checked
After:
   total: 0 errors, 2 warnings, 138 lines checked

No code changed:

arch/x86/xen/multicalls.o:

   text	   data	    bss	    dec	    hex	filename
    887	   2832	      0	   3719	    e87	multicalls.o.before
    887	   2832	      0	   3719	    e87	multicalls.o.after

md5:
   cf6d72d9db6dc5a3ebe01eec9f05e95f  multicalls.o.before.asm
   cf6d72d9db6dc5a3ebe01eec9f05e95f  multicalls.o.after.asm

Signed-off-by: Paolo Ciarrocchi <paolo.ciarrocchi@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:40:50 +02:00
Mark McLoughlin
c946c7de49 xen: Clear PG_pinned in release_{pt,pd}()
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Cc: xen-devel@lists.xensource.com
Cc: Mark McLoughlin <markmc@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-04 18:36:48 +02:00
Mark McLoughlin
a684d69d15 xen: Do not pin/unpin PMD pages
i.e. with this simple test case:

    int fd = open("/dev/zero", O_RDONLY);
    munmap(mmap((void *)0x40000000, 0x1000_LEN, PROT_READ, MAP_PRIVATE, fd, 0), 0x1000);
    close(fd);

we currently get:

   kernel BUG at arch/x86/xen/enlighten.c:678!
   ...
   EIP is at xen_release_pt+0x79/0xa9
   ...
   Call Trace:
    [<c041da25>] ? __pmd_free_tlb+0x1a/0x75
    [<c047a192>] ? free_pgd_range+0x1d2/0x2b5
    [<c047a2f3>] ? free_pgtables+0x7e/0x93
    [<c047b272>] ? unmap_region+0xb9/0xf5
    [<c047c1bd>] ? do_munmap+0x193/0x1f5
    [<c047c24f>] ? sys_munmap+0x30/0x3f
    [<c0408cce>] ? syscall_call+0x7/0xb
    =======================

and xen complains:

  (XEN) mm.c:2241:d4 Mfn 1cc37 not pinned

Further details at:

  https://bugzilla.redhat.com/436453

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Cc: xen-devel@lists.xensource.com
Cc: Mark McLoughlin <markmc@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-04 18:36:48 +02:00
Mark McLoughlin
f64337062c xen: refactor xen_{alloc,release}_{pt,pd}()
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Cc: xen-devel@lists.xensource.com
Cc: Mark McLoughlin <markmc@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-04 18:36:48 +02:00
Jeremy Fitzhardinge
2e8fe719b5 xen: fix UP setup of shared_info
We need to set up the shared_info pointer once we've mapped the real
shared_info into its fixmap slot.  That needs to happen once the general
pagetable setup has been done.  Previously, the UP shared_info was set
up one in xen_start_kernel, but that was left pointing to the dummy
shared info.  Unfortunately there's no really good place to do a later
setup of the shared_info in UP, so just do it once the pagetable setup
has been done.

[ Stable: needed in 2.6.24.x ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-27 16:08:45 +01:00
Jeremy Fitzhardinge
04c44a080d xen: fix RMW when unmasking events
xen_irq_enable_direct and xen_sysexit were using "andw $0x00ff,
XEN_vcpu_info_pending(vcpu)" to unmask events and test for pending ones
in one instuction.

Unfortunately, the pending flag must be modified with a locked operation
since it can be set by another CPU, and the unlocked form of this
operation was causing the pending flag to get lost, allowing the processor
to return to usermode with pending events and ultimately deadlock.

The simple fix would be to make it a locked operation, but that's rather
costly and unnecessary.  The fix here is to split the mask-clearing and
pending-testing into two instructions; the interrupt window between
them is of no concern because either way pending or new events will
be processed.

This should fix lingering bugs in using direct vcpu structure access too.

[ Stable: needed in 2.6.24.x ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-27 16:08:45 +01:00
Ian Campbell
87d034f313 x86/xen: fix DomU boot problem
Construct Xen guest e820 map with a hole between 640K-1M.

It's pure luck that Xen kernels have gotten away with it in the past.

The patch below seems like the right thing to do. It certainly boots in
a domU without the DMI problem (without any of the other related patches
such as Alexander's).

Signed-off-by: Ian Campbell <ijc@hellion.org.uk>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Tested-by: Mark McLoughlin <markmc@redhat.com>
Acked-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
2008-03-04 17:10:12 +01:00
Jeremy Fitzhardinge
d40e705903 xen: mask out SEP from CPUID
Fix 32-on-64 pvops kernel:

we don't want userspace using syscall/sysenter, even if the hypervisor
supports it, so mask it out from CPUID.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-29 18:55:43 +01:00
Jeremy Fitzhardinge
2b5407811d xen: unpin initial Xen pagetable once we're finished with it
Unpin the Xen-provided pagetable once we've finished with it, so it
doesn't cause stray references which cause later swapper_pg_dir
pagetable updates to fail.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Tested-by: Jody Belka <knew-linux@pimb.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-13 16:20:35 +01:00
Harvey Harrison
da7bfc50f5 x86: sparse warnings in pageattr.c
Adjust the definition of lookup_address to take an unsigned long
level argument.  Adjust callers in xen/mmu.c that pass in a
dummy variable.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-09 23:24:08 +01:00
Harvey Harrison
88a5ac8966 x86: fix sparse warning in xen/time.c
Use xen_khz to denote xen_specific clock speed.  Avoid shadowing
cpu_khz.

arch/x86/xen/time.c:220:6: warning: symbol 'cpu_khz' shadows an earlier one
include/asm/tsc.h:17:21: originally declared here

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-09 23:24:08 +01:00
Ingo Molnar
f0646e43ac x86: return the page table level in lookup_address()
based on this patch from Andi Kleen:

|  Subject: CPA: Return the page table level in lookup_address()
|  From: Andi Kleen <ak@suse.de>
|
|  Needed for the next change.
|
|  And change all the callers.

and ported it to x86.git.

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:43 +01:00
Jeremy Fitzhardinge
1c70e9bd83 xen: deal with pmd being allocated/freed
Deal properly with pmd-level pages being allocated and freed
dynamically.  We can handle them more or less the same as pte pages.

Also, deal with early_ioremap pagetable manipulations.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:39 +01:00
Jeremy Fitzhardinge
a89780f3b8 xen: fix mismerge in masking pte flags
Looks like a mismerge/misapply dropped one of the cases of pte flag
masking for Xen.  Also, only mask the flags for present ptes.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:39 +01:00