Every so often, after code shuffles, I need to go through and unbitrot the Lguest Journey (see drivers/lguest/README). Since we now use RCU in a simple form in one place, I took the opportunity to expand that explanation.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
		
			
				
	
	
		
	
| /*P:900
 | |
|  * This is the Switcher: code which sits at 0xFFC00000 (or 0xFFE00000) astride
 | |
|  * both the Host and Guest to do the low-level Guest<->Host switch.  It is as
 | |
|  * simple as it can be made, but it's naturally very specific to x86.
 | |
|  *
 | |
|  * You have now completed Preparation.  If this has whetted your appetite; if you
 | |
|  * are feeling invigorated and refreshed then the next, more challenging stage
 | |
|  * can be found in "make Guest".
 | |
|  :*/
 | |
| 
 | |
| /*M:012
 | |
|  * Lguest is meant to be simple: my rule of thumb is that 1% more LOC must
 | |
|  * gain at least 1% more performance.  Since neither LOC nor performance can be
 | |
|  * measured beforehand, it generally means implementing a feature then deciding
 | |
|  * if it's worth it.  And once it's implemented, who can say no?
 | |
|  *
 | |
|  * This is why I haven't implemented this idea myself.  I want to, but I
 | |
|  * haven't.  You could, though.
 | |
|  *
 | |
|  * The main place where lguest performance sucks is Guest page faulting.  When
 | |
|  * a Guest userspace process hits an unmapped page we switch back to the Host,
 | |
|  * walk the page tables, find it's not mapped, switch back to the Guest page
 | |
|  * fault handler, which calls a hypercall to set the page table entry, then
 | |
|  * finally returns to userspace.  That's two round-trips.
 | |
|  *
 | |
|  * If we had a small walker in the Switcher, we could quickly check the Guest
 | |
|  * page table and if the page isn't mapped, immediately reflect the fault back
 | |
|  * into the Guest.  This means the Switcher would have to know the top of the
 | |
|  * Guest page table and the page fault handler address.
 | |
|  *
 | |
|  * For simplicity, the Guest should only handle the case where the privilege
 | |
|  * level of the fault is 3 and probably only not present or write faults.  It
 | |
|  * should also detect recursive faults, and hand the original fault to the
 | |
|  * Host (which is actually really easy).
 | |
|  *
 | |
|  * Two questions remain.  Would the performance gain outweigh the complexity?
 | |
|  * And who would write the verse documenting it?
 | |
| :*/
 | |
| 
 | |
| /*M:011
 | |
|  * Lguest64 handles NMI.  This gave me NMI envy (until I looked at their
 | |
|  * code).  It's worth doing though, since it would let us use oprofile in the
 | |
|  * Host when a Guest is running.
 | |
| :*/
 | |
| 
 | |
| /*S:100
 | |
|  * Welcome to the Switcher itself!
 | |
|  *
 | |
|  * This file contains the low-level code which changes the CPU to run the Guest
 | |
|  * code, and returns to the Host when something happens.  Understand this, and
 | |
|  * you understand the heart of our journey.
 | |
|  *
 | |
|  * Because this is in assembler rather than C, our tale switches from prose to
 | |
|  * verse.  First I tried limericks:
 | |
|  *
 | |
|  *	There once was an eax reg,
 | |
|  *	To which our pointer was fed,
 | |
|  *	It needed an add,
 | |
|  *	Which asm-offsets.h had
 | |
|  *	But this limerick is hurting my head.
 | |
|  *
 | |
|  * Next I tried haikus, but fitting the required reference to the seasons in
 | |
|  * every stanza was quickly becoming tiresome:
 | |
|  *
 | |
|  *	The %eax reg
 | |
|  *	Holds "struct lguest_pages" now:
 | |
|  *	Cherry blossoms fall.
 | |
|  *
 | |
|  * Then I started with Heroic Verse, but the rhyming requirement leeched away
 | |
|  * the content density and led to some uniquely awful oblique rhymes:
 | |
|  *
 | |
|  *	These constants are coming from struct offsets
 | |
|  *	For use within the asm switcher text.
 | |
|  *
 | |
|  * Finally, I settled for something between heroic hexameter, and normal prose
 | |
|  * with inappropriate linebreaks.  Anyway, it aint no Shakespeare.
 | |
|  */
 | |
| 
 | |
| // Not all kernel headers work from assembler
 | |
| // But these ones are needed: the ENTRY() define
 | |
| // And constants extracted from struct offsets
 | |
| // To avoid magic numbers and breakage:
 | |
| // Should they change the compiler can't save us
 | |
| // Down here in the depths of assembler code.
 | |
| #include <linux/linkage.h>
 | |
| #include <asm/asm-offsets.h>
 | |
| #include <asm/page.h>
 | |
| #include <asm/segment.h>
 | |
| #include <asm/lguest.h>
 | |
| 
 | |
| // We mark the start of the code to copy
 | |
| // It's placed in .text tho it's never run here
 | |
| // You'll see the trick macro at the end
 | |
| // Which interleaves data and text to effect.
 | |
| .text
 | |
| ENTRY(start_switcher_text)
 | |
| 
 | |
| // When we reach switch_to_guest we have just left
 | |
| // The safe and comforting shores of C code
 | |
| // %eax has the "struct lguest_pages" to use
 | |
| // Where we save state and still see it from the Guest
 | |
| // And %ebx holds the Guest shadow pagetable:
 | |
| // Once set we have truly left Host behind.
 | |
| ENTRY(switch_to_guest)
 | |
| 	// We told gcc all its regs could fade,
 | |
| 	// Clobbered by our journey into the Guest
 | |
| 	// We could have saved them, if we tried
 | |
| 	// But time is our master and cycles count.
 | |
| 
 | |
| 	// Segment registers must be saved for the Host
 | |
| 	// We push them on the Host stack for later
 | |
| 	pushl	%es
 | |
| 	pushl	%ds
 | |
| 	pushl	%gs
 | |
| 	pushl	%fs
 | |
| 	// But the compiler is fickle, and heeds
 | |
| 	// No warning of %ebp clobbers
 | |
| 	// When frame pointers are used.  That register
 | |
| 	// Must be saved and restored or chaos strikes.
 | |
| 	pushl	%ebp
 | |
| 	// The Host's stack is done, now save it away
 | |
| 	// In our "struct lguest_pages" at offset
 | |
| 	// Distilled into asm-offsets.h
 | |
| 	movl	%esp, LGUEST_PAGES_host_sp(%eax)
 | |
| 
 | |
| 	// All saved and there's now five steps before us:
 | |
| 	// Stack, GDT, IDT, TSS
 | |
| 	// Then last of all the page tables are flipped.
 | |
| 
 | |
| 	// Yet beware that our stack pointer must be
 | |
| 	// Always valid lest an NMI hits
 | |
| 	// %edx does the duty here as we juggle
 | |
| 	// %eax is lguest_pages: our stack lies within.
 | |
| 	movl	%eax, %edx
 | |
| 	addl	$LGUEST_PAGES_regs, %edx
 | |
| 	movl	%edx, %esp
 | |
| 
 | |
| 	// The Guest's GDT we so carefully
 | |
| 	// Placed in the "struct lguest_pages" before
 | |
| 	lgdt	LGUEST_PAGES_guest_gdt_desc(%eax)
 | |
| 
 | |
| 	// The Guest's IDT we did partially
 | |
| 	// Copy to "struct lguest_pages" as well.
 | |
| 	lidt	LGUEST_PAGES_guest_idt_desc(%eax)
 | |
| 
 | |
| 	// The TSS entry which controls traps
 | |
| 	// Must be loaded up with "ltr" now:
 | |
| 	// The GDT entry that TSS uses 
 | |
| 	// Changes type when we load it: damn Intel!
 | |
| 	// For after we switch over our page tables
 | |
| 	// That entry will be read-only: we'd crash.
 | |
| 	movl	$(GDT_ENTRY_TSS*8), %edx
 | |
| 	ltr	%dx
 | |
| 
 | |
| 	// Look back now, before we take this last step!
 | |
| 	// The Host's TSS entry was also marked used;
 | |
| 	// Let's clear it again for our return.
 | |
| 	// The GDT descriptor of the Host
 | |
| 	// Points to the table after two "size" bytes
 | |
| 	movl	(LGUEST_PAGES_host_gdt_desc+2)(%eax), %edx
 | |
| 	// Clear "used" from type field (byte 5, bit 1: mask 0x02)
 | |
| 	andb	$0xFD, (GDT_ENTRY_TSS*8 + 5)(%edx)
 | |
| 
 | |
| 	// Once our page table's switched, the Guest is live!
 | |
| 	// The Host fades as we run this final step.
 | |
| 	// Our "struct lguest_pages" is now read-only.
 | |
| 	movl	%ebx, %cr3
 | |
| 
 | |
| 	// The page table change did one tricky thing:
 | |
| 	// The Guest's register page has been mapped
 | |
| 	// Writable under our %esp (stack) --
 | |
| 	// We can simply pop off all Guest regs.
 | |
| 	popl	%eax
 | |
| 	popl	%ebx
 | |
| 	popl	%ecx
 | |
| 	popl	%edx
 | |
| 	popl	%esi
 | |
| 	popl	%edi
 | |
| 	popl	%ebp
 | |
| 	popl	%gs
 | |
| 	popl	%fs
 | |
| 	popl	%ds
 | |
| 	popl	%es
 | |
| 
 | |
| 	// Near the base of the stack lurk two strange fields
 | |
| 	// Which we fill as we exit the Guest
 | |
| 	// These are the trap number and its error
 | |
| 	// We can simply step past them on our way.
 | |
| 	addl	$8, %esp
 | |
| 
 | |
| 	// The last five stack slots hold return address
 | |
| 	// And everything needed to switch privilege
 | |
| 	// From Switcher's level 0 to Guest's 1,
 | |
| 	// And the stack where the Guest had last left it.
 | |
| 	// Interrupts are turned back on: we are Guest.
 | |
| 	iret
 | |
| 
 | |
| // We tread two paths to switch back to the Host
 | |
| // Yet both must save Guest state and restore Host
 | |
| // So we put the routine in a macro.
 | |
| #define SWITCH_TO_HOST							\
 | |
| 	/* We save the Guest state: all registers first			\
 | |
| 	 * Laid out just as "struct lguest_regs" defines */		\
 | |
| 	pushl	%es;							\
 | |
| 	pushl	%ds;							\
 | |
| 	pushl	%fs;							\
 | |
| 	pushl	%gs;							\
 | |
| 	pushl	%ebp;							\
 | |
| 	pushl	%edi;							\
 | |
| 	pushl	%esi;							\
 | |
| 	pushl	%edx;							\
 | |
| 	pushl	%ecx;							\
 | |
| 	pushl	%ebx;							\
 | |
| 	pushl	%eax;							\
 | |
| 	/* Our stack and our code are using segments			\
 | |
| 	 * Set in the TSS and IDT					\
 | |
| 	 * Yet if we were to touch data we'd use			\
 | |
| 	 * Whatever data segment the Guest had.				\
 | |
| 	 * Load the lguest ds segment for now. */			\
 | |
| 	movl	$(LGUEST_DS), %eax;					\
 | |
| 	movl	%eax, %ds;						\
 | |
| 	/* So where are we?  Which CPU, which struct?			\
 | |
| 	 * The stack is our clue: our TSS starts			\
 | |
| 	 * It at the end of "struct lguest_pages".			\
 | |
| 	 * Or we may have stumbled while restoring			\
 | |
| 	 * Our Guest segment regs while in switch_to_guest,		\
 | |
| 	 * The fault pushed atop that part-unwound stack.		\
 | |
| 	 * If we round the stack down to the page start			\
 | |
| 	 * We're at the start of "struct lguest_pages". */		\
 | |
| 	movl	%esp, %eax;						\
 | |
| 	andl	$(~((1 << PAGE_SHIFT) - 1)), %eax;			\
 | |
| 	/* Save our trap number: the switch will obscure it		\
 | |
| 	 * (In the Host the Guest regs are not mapped here)		\
 | |
| 	 * %ebx holds it safe for deliver_to_host */			\
 | |
| 	movl	LGUEST_PAGES_regs_trapnum(%eax), %ebx;			\
 | |
| 	/* The Host GDT, IDT and stack!					\
 | |
| 	 * All these lie safely hidden from the Guest:			\
 | |
| 	 * We must return to the Host page tables			\
 | |
| 	 * (Hence that was saved in struct lguest_pages) */		\
 | |
| 	movl	LGUEST_PAGES_host_cr3(%eax), %edx;			\
 | |
| 	movl	%edx, %cr3;						\
 | |
| 	/* As before, when we looked back at the Host			\
 | |
| 	 * As we left and marked TSS unused				\
 | |
| 	 * So must we now for the Guest left behind. */			\
 | |
| 	andb	$0xFD, (LGUEST_PAGES_guest_gdt+GDT_ENTRY_TSS*8+5)(%eax); \
 | |
| 	/* Switch to Host's GDT, IDT. */				\
 | |
| 	lgdt	LGUEST_PAGES_host_gdt_desc(%eax);			\
 | |
| 	lidt	LGUEST_PAGES_host_idt_desc(%eax);			\
 | |
| 	/* Restore the Host's stack where its saved regs lie */		\
 | |
| 	movl	LGUEST_PAGES_host_sp(%eax), %esp;			\
 | |
| 	/* Last the TSS: our Host is returned */			\
 | |
| 	movl	$(GDT_ENTRY_TSS*8), %edx;				\
 | |
| 	ltr	%dx;							\
 | |
| 	/* Restore now the regs saved right at the first. */		\
 | |
| 	popl	%ebp;							\
 | |
| 	popl	%fs;							\
 | |
| 	popl	%gs;							\
 | |
| 	popl	%ds;							\
 | |
| 	popl	%es
 | |
| 
 | |
| // The first path is trod when the Guest has trapped:
 | |
| // (Which trap it was has been pushed on the stack).
 | |
| // We need only switch back, and the Host will decode
 | |
| // Why we came home, and what needs to be done.
 | |
| return_to_host:
 | |
| 	SWITCH_TO_HOST
 | |
| 	iret
 | |
| 
 | |
| // We are led to the second path like so:
 | |
| // An interrupt, with some cause external
 | |
| // Has ajerked us rudely from the Guest's code
 | |
| // Again we must return home to the Host
 | |
| deliver_to_host:
 | |
| 	SWITCH_TO_HOST
 | |
| 	// But now we must go home via that place
 | |
| 	// Where that interrupt was supposed to go
 | |
| 	// Had we not been ensconced, running the Guest.
 | |
| 	// Here we see the trickiness of run_guest_once():
 | |
| 	// The Host stack is formed like an interrupt
 | |
| 	// With EIP, CS and EFLAGS layered.
 | |
| 	// Interrupt handlers end with "iret"
 | |
| 	// And that will take us home at long long last.
 | |
| 
 | |
| 	// But first we must find the handler to call!
 | |
| 	// The IDT descriptor for the Host
 | |
| 	// Has two bytes for size, and four for address:
 | |
| 	// %edx will hold it for us for now.
 | |
| 	movl	(LGUEST_PAGES_host_idt_desc+2)(%eax), %edx
 | |
| 	// We now know the table address we need,
 | |
| 	// And saved the trap's number inside %ebx.
 | |
| 	// Yet the pointer to the handler is smeared
 | |
| 	// Across the bits of the table entry.
 | |
| 	// What oracle can tell us how to extract
 | |
| 	// From such a convoluted encoding?
 | |
| 	// I consulted gcc, and it gave
 | |
| 	// These instructions, which I gladly credit:
 | |
| 	leal	(%edx,%ebx,8), %eax
 | |
| 	movzwl	(%eax),%edx
 | |
| 	movl	4(%eax), %eax
 | |
| 	xorw	%ax, %ax
 | |
| 	orl	%eax, %edx
 | |
| 	// Now the address of the handler's in %edx
 | |
| 	// We call it now: its "iret" drops us home.
 | |
| 	jmp	*%edx
 | |
| 
 | |
| // Every interrupt can come to us here
 | |
| // But we must truly tell each apart.
 | |
| // They number two hundred and fifty six
 | |
| // And each must land in a different spot,
 | |
| // Push its number on stack, and join the stream.
 | |
| 
 | |
| // And worse, a mere seven of the traps stand apart
 | |
| // And push on their stack an addition:
 | |
| // An error number, thirty two bits long
 | |
| // So we punish the other two forty-nine
 | |
| // And make them push a zero so they match.
 | |
| 
 | |
| // Yet two fifty six entries is long
 | |
| // And all will look most the same as the last
 | |
| // So we create a macro which can make
 | |
| // As many entries as we need to fill.
 | |
| 
 | |
| // Note the change to .data then .text:
 | |
| // We plant the address of each entry
 | |
| // Into a (data) table for the Host
 | |
| // To know where each Guest interrupt should go.
 | |
| .macro IRQ_STUB N TARGET
 | |
| 	.data; .long 1f; .text; 1:
 | |
|  // Trap eight, ten through fourteen and seventeen
 | |
|  // Supply an error number.  Else zero.
 | |
|  .if (\N <> 8) && (\N < 10 || \N > 14) && (\N <> 17)
 | |
| 	pushl	$0
 | |
|  .endif
 | |
| 	pushl	$\N
 | |
| 	jmp	\TARGET
 | |
| 	ALIGN
 | |
| .endm
 | |
| 
 | |
| // This macro creates numerous entries
 | |
| // Using GAS macros which out-power C's.
 | |
| .macro IRQ_STUBS FIRST LAST TARGET
 | |
|  irq=\FIRST
 | |
|  .rept \LAST-\FIRST+1
 | |
| 	IRQ_STUB irq \TARGET
 | |
|   irq=irq+1
 | |
|  .endr
 | |
| .endm
 | |
| 
 | |
| // Here's the marker for our pointer table
 | |
| // Laid in the data section just before
 | |
| // Each macro places the address of code
 | |
| // Forming an array: each one points to text
 | |
| // Which handles interrupt in its turn.
 | |
| .data
 | |
| .global default_idt_entries
 | |
| default_idt_entries:
 | |
| .text
 | |
| 	// The first two traps go straight back to the Host
 | |
| 	IRQ_STUBS 0 1 return_to_host
 | |
| 	// We'll say nothing, yet, about NMI
 | |
| 	IRQ_STUB 2 handle_nmi
 | |
| 	// Other traps also return to the Host
 | |
| 	IRQ_STUBS 3 31 return_to_host
 | |
| 	// All interrupts go via their handlers
 | |
| 	IRQ_STUBS 32 127 deliver_to_host
 | |
| 	// 'Cept system calls coming from userspace
 | |
| 	// Are to go to the Guest, never the Host.
 | |
| 	IRQ_STUB 128 return_to_host
 | |
| 	IRQ_STUBS 129 255 deliver_to_host
 | |
| 
 | |
| // The NMI, what a fabulous beast
 | |
| // Which swoops in and stops us no matter that
 | |
| // We're suspended between heaven and hell,
 | |
| // (Or more likely between the Host and Guest)
 | |
| // When in it comes!  We are dazed and confused
 | |
| // So we do the simplest thing which one can.
 | |
| // Though we've pushed the trap number and zero
 | |
| // We discard them, return, and hope we live.
 | |
| handle_nmi:
 | |
| 	addl	$8, %esp
 | |
| 	iret
 | |
| 
 | |
| // We are done; all that's left is Mastery
 | |
| // And "make Mastery" is a journey long
 | |
| // Designed to make your fingers itch to code.
 | |
| 
 | |
| // Here ends the text, the file and poem.
 | |
| ENTRY(end_switcher_text)
 |
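
As a reading aid for the listing above, here is a minimal C sketch of four small computations the Switcher performs in assembler: rounding the stack pointer down to the page start, rebuilding a handler address from an IDT gate, deciding which traps carry a CPU-supplied error code, and clearing the TSS "used" bit. It is not part of lguest. All function names and the SKETCH_PAGE_SHIFT constant are hypothetical, and 4 KiB pages are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on 32-bit x86 (PAGE_SHIFT == 12). */
#define SKETCH_PAGE_SHIFT 12

/* Round an address down to the start of its page, as the "andl" in
 * SWITCH_TO_HOST does to find "struct lguest_pages" from %esp. */
uint32_t page_start(uint32_t addr)
{
	return addr & ~((UINT32_C(1) << SKETCH_PAGE_SHIFT) - 1);
}

/* Rebuild a handler address from the two 32-bit halves of an IDT gate:
 * offset bits 0-15 sit in the low word of the first half, bits 16-31 in
 * the high word of the second.  This mirrors the movzwl/xorw/orl
 * sequence in deliver_to_host. */
uint32_t gate_handler(uint32_t lo, uint32_t hi)
{
	return (lo & 0xFFFFu) | (hi & 0xFFFF0000u);
}

/* True for the traps where the CPU itself pushes an error code:
 * 8, 10 through 14, and 17.  IRQ_STUB pushes a zero for all others. */
int trap_has_error_code(unsigned int n)
{
	return n == 8 || (n >= 10 && n <= 14) || n == 17;
}

/* Count how many of the 256 vectors carry a CPU-supplied error code. */
unsigned int count_error_code_traps(void)
{
	unsigned int n, count = 0;

	for (n = 0; n < 256; n++)
		count += trap_has_error_code(n) ? 1 : 0;
	return count;
}

/* Clear the "used" (busy) bit, mask 0x02, in byte 5 of a TSS
 * descriptor, as "andb $0xFD" does before the TSS is reloaded. */
uint8_t clear_tss_busy(uint8_t access_byte)
{
	return (uint8_t)(access_byte & 0xFD);
}

/* A few spot checks with made-up example values. */
void sketch_self_check(void)
{
	assert(page_start(0xFFC01FF4u) == 0xFFC01000u);
	assert(gate_handler(0x00601234u, 0xC0108E00u) == 0xC0101234u);
	assert(trap_has_error_code(13));
	assert(!trap_has_error_code(9));
	assert(clear_tss_busy(0x8B) == 0x89);
}
```

Calling sketch_self_check() from a test program exercises each helper. The addresses and descriptor bytes in it are illustrative values only, not taken from a running system.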