A number of systems industry-wide have a subset of their functionality
that works as follows:
1. Receive a message from local kmsg, serial console, or netconsole;
2. Apply a set of rules to classify the message;
3. Do something based on this classification (like scheduling a
   remediation for the machine), rinse, and repeat.
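The loop above can be sketched roughly as follows. This is only an
illustration: the rule table, the labels, and the message text are
hypothetical and are not taken from any real monitoring system or
guaranteed to match any particular kernel version.

```python
import re
from typing import Callable, Iterable, Optional

# Hypothetical rule table: (compiled pattern, classification label).
# The message text is illustrative only.
RULES = [
    (re.compile(r"blk_update_request: I/O error, dev (\S+), sector \d+"),
     "disk_io_error"),
    (re.compile(r"Hardware Error"), "hardware_error"),
]


def classify(message: str) -> Optional[str]:
    """Return the label of the first rule that matches, or None."""
    for pattern, label in RULES:
        if pattern.search(message):
            return label
    return None


def process(messages: Iterable[str],
            act: Callable[[str, str], None]) -> None:
    """Steps 1-3: receive each message, classify it, act on a match."""
    for message in messages:
        label = classify(message)
        if label is not None:
            act(label, message)
```

Note that a message whose wording has changed simply falls through to
None here; nothing signals that a rule has stopped matching.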
This isn't a Facebook-specific problem, but as a couple of examples from
just inside Facebook: we have this implemented inside our netconsole
processing (for alarm classification), and as part of our machine health
checking. We use these messages to determine fairly important metrics
around production health, and it's important that we get them right.
While for some kinds of issues we have counters, tracepoints, or metrics
with a stable interface which can reliably indicate the issue, in order
to react to production issues quickly we need to work with the interface
which most kernel developers naturally use when developing: printk.
Most production issues come from unexpected phenomena, and as such
usually the code in question doesn't have easily usable tracepoints or
other counters available for the specific problem being mitigated. We
have a number of lines of monitoring defence against problems in
production (host metrics, process metrics, service metrics, etc), and
where it's not feasible to reliably monitor at another level, this kind
of pragmatic netconsole monitoring is essential.
As one would expect, monitoring using printk is rather brittle for a
number of reasons -- most notably that the message might disappear
entirely in a new version of the kernel, or that the message may change
in such a way that the regex or other classification methods start to
silently fail.
One factor that makes this even harder is that, under normal operation,
many of these messages are never expected to be hit. For example, there
may be a rare hardware bug which one wants to detect if it were ever to
happen again, but its recurrence is not likely or anticipated. This
precludes approaches like checking whether the printk in question was
printed somewhere fleetwide recently to determine whether the message is
still present, since we don't anticipate that it should be printed
anywhere, but still need to monitor for its future presence in the
long term.
This class of issue has happened on a number of occasions, causing
unhealthy machines with hardware issues to remain in production for
longer than ideal. As a recent example, some monitoring around
blk_update_request fell out of date and caused semi-broken machines to
remain in production for longer than would be desirable.
Searching through the codebase to find the message is also extremely
fragile, because many of the messages are further constructed beyond
their callsite (e.g. btrfs_printk and other module-specific wrappers,
each with their own functionality). Even if they aren't, guessing the
format and formulation of the underlying message based on the aesthetics
of the message emitted is not a recipe for success at scale, and our
previous issues with fleetwide machine health checking demonstrate as
much.
This patch provides a solution to the issue of silently changed or
deleted printks: we record pointers to all printk format strings known
at compile time into a new .printk_index section, both in vmlinux and
modules. At runtime, this can then be iterated by looking at
<debugfs>/printk/index/<module>, which emits the following format, both
readable by humans and able to be parsed by machines:
    $ head -1 vmlinux; shuf -n 5 vmlinux
    # <level[,flags]> filename:line function "format"
    <5> block/blk-settings.c:661 disk_stack_limits "%s: Warning: Device %s is misaligned\n"
    <4> kernel/trace/trace.c:8296 trace_create_file "Could not create tracefs '%s' entry\n"
    <6> arch/x86/kernel/hpet.c:144 _hpet_print_config "hpet: %s(%d):\n"
    <6> init/do_mounts.c:605 prepare_namespace "Waiting for root device %s...\n"
    <6> drivers/acpi/osl.c:1410 acpi_no_auto_serialize_setup "ACPI: auto-serialization disabled\n"
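As a rough illustration of the machine-parseable side, a userspace
consumer might split each line along the lines of the sketch below. It
assumes only the line shape shown in the sample output above; the names
ENTRY_RE, IndexEntry and parse_index_line are hypothetical, and the level
is kept as a string rather than interpreted.

```python
import re
from typing import NamedTuple, Optional

# Matches: <level[,flags]> filename:line function "format"
ENTRY_RE = re.compile(
    r'^<(?P<level>[^,>]+)(?:,(?P<flags>[^>]*))?> '
    r'(?P<filename>\S+):(?P<line>\d+) '
    r'(?P<function>\S+) '
    r'"(?P<format>.*)"$'
)


class IndexEntry(NamedTuple):
    level: str
    flags: Optional[str]
    filename: str
    line: int
    function: str
    fmt: str  # still escaped, exactly as emitted in the index


def parse_index_line(text: str) -> Optional[IndexEntry]:
    """Parse one index line; return None for comments or unknown shapes."""
    text = text.rstrip("\n")
    if not text or text.startswith("#"):
        return None
    m = ENTRY_RE.match(text)
    if m is None:
        return None
    return IndexEntry(m["level"], m["flags"], m["filename"],
                      int(m["line"]), m["function"], m["format"])
```

The format string is deliberately left escaped (a literal backslash
followed by "n"), so that it can be compared byte-for-byte between
kernel versions.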
This mitigates the majority of cases where we have a highly-specific
printk which we want to match on, as we can now enumerate and check
whether the format changed or the printk callsite disappeared entirely
in userspace. This allows us to catch changes to printks we monitor
earlier and decide what to do about it before it becomes problematic.
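One way such a userspace check could look is sketched below. It assumes
the index has already been parsed into (filename, function, format)
tuples, and that each monitored printk is recorded with the format string
the monitoring rule was written against; find_drift and both data shapes
are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Key = Tuple[str, str]  # (filename, function)


def find_drift(monitored: Dict[Key, str],
               index: Iterable[Tuple[str, str, str]]) -> List[str]:
    """Report monitored printks that vanished or whose format changed."""
    formats_by_site: Dict[Key, Set[str]] = {}
    for filename, function, fmt in index:
        formats_by_site.setdefault((filename, function), set()).add(fmt)

    problems = []
    for (filename, function), expected in monitored.items():
        present = formats_by_site.get((filename, function))
        if present is None:
            problems.append(f"{filename}:{function}: callsite not in index")
        elif expected not in present:
            problems.append(f"{filename}:{function}: format changed")
    return problems
```

Line numbers are left out of the key on purpose, since they shift with
unrelated edits to the same file.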
There is no additional runtime cost for printk callers or printk itself,
and the assembly generated is exactly the same.
Signed-off-by: Chris Down <chris@chrisdown.name>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Acked-by: Jessica Yu <jeyu@kernel.org> # for module.{c,h}
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/e42070983637ac5e384f17fbdbe86d19c7b212a5.1623775748.git.chris@chrisdown.name
/* SPDX-License-Identifier: GPL-2.0 */
/*
 *
 *  Copyright (C) 1991, 1992  Linus Torvalds
 *
 *  Enhanced CPU detection and feature setting code by Mike Jagdis
 *  and Martin Mares, November 1997.
 */

.text
#include <linux/threads.h>
#include <linux/init.h>
#include <linux/linkage.h>
#include <asm/segment.h>
#include <asm/page_types.h>
#include <asm/pgtable_types.h>
#include <asm/cache.h>
#include <asm/thread_info.h>
#include <asm/asm-offsets.h>
#include <asm/setup.h>
#include <asm/processor-flags.h>
#include <asm/msr-index.h>
#include <asm/cpufeatures.h>
#include <asm/percpu.h>
#include <asm/nops.h>
#include <asm/bootparam.h>
#include <asm/export.h>
#include <asm/pgtable_32.h>

/* Physical address */
#define pa(X) ((X) - __PAGE_OFFSET)

/*
 * References to members of the new_cpu_data structure.
 */

#define X86		new_cpu_data+CPUINFO_x86
#define X86_VENDOR	new_cpu_data+CPUINFO_x86_vendor
#define X86_MODEL	new_cpu_data+CPUINFO_x86_model
#define X86_STEPPING	new_cpu_data+CPUINFO_x86_stepping
#define X86_HARD_MATH	new_cpu_data+CPUINFO_hard_math
#define X86_CPUID	new_cpu_data+CPUINFO_cpuid_level
#define X86_CAPABILITY	new_cpu_data+CPUINFO_x86_capability
#define X86_VENDOR_ID	new_cpu_data+CPUINFO_x86_vendor_id


#define SIZEOF_PTREGS 17*4

/*
 * Worst-case size of the kernel mapping we need to make:
 * a relocatable kernel can live anywhere in lowmem, so we need to be able
 * to map all of lowmem.
 */
KERNEL_PAGES = LOWMEM_PAGES

INIT_MAP_SIZE = PAGE_TABLE_SIZE(KERNEL_PAGES) * PAGE_SIZE
RESERVE_BRK(pagetables, INIT_MAP_SIZE)

/*
 * 32-bit kernel entrypoint; only used by the boot CPU.  On entry,
 * %esi points to the real-mode code as a 32-bit pointer.
 * CS and DS must be 4 GB flat segments, but we don't depend on
 * any particular GDT layout, because we load our own as soon as we
 * can.
 */
__HEAD
SYM_CODE_START(startup_32)
	movl pa(initial_stack),%ecx

/*
 * Set segments to known values.
 */
	lgdt pa(boot_gdt_descr)
	movl $(__BOOT_DS),%eax
	movl %eax,%ds
	movl %eax,%es
	movl %eax,%fs
	movl %eax,%gs
	movl %eax,%ss
	leal -__PAGE_OFFSET(%ecx),%esp

/*
 * Clear BSS first so that there are no surprises...
 */
	cld
	xorl %eax,%eax
	movl $pa(__bss_start),%edi
	movl $pa(__bss_stop),%ecx
	subl %edi,%ecx
	shrl $2,%ecx
	rep ; stosl
/*
 * Copy bootup parameters out of the way.
 * Note: %esi still has the pointer to the real-mode data.
 * With the kexec as boot loader, parameter segment might be loaded beyond
 * kernel image and might not even be addressable by early boot page tables.
 * (kexec on panic case). Hence copy out the parameters before initializing
 * page tables.
 */
	movl $pa(boot_params),%edi
	movl $(PARAM_SIZE/4),%ecx
	cld
	rep
	movsl
	movl pa(boot_params) + NEW_CL_POINTER,%esi
	andl %esi,%esi
	jz 1f			# No command line
	movl $pa(boot_command_line),%edi
	movl $(COMMAND_LINE_SIZE/4),%ecx
	rep
	movsl
1:

#ifdef CONFIG_OLPC
	/* save OFW's pgdir table for later use when calling into OFW */
	movl %cr3, %eax
	movl %eax, pa(olpc_ofw_pgd)
#endif

#ifdef CONFIG_MICROCODE
	/* Early load ucode on BSP. */
	call load_ucode_bsp
#endif

	/* Create early pagetables. */
	call  mk_early_pgtbl_32

	/* Do early initialization of the fixmap area */
	movl $pa(initial_pg_fixmap)+PDE_IDENT_ATTR,%eax
#ifdef  CONFIG_X86_PAE
#define KPMDS (((-__PAGE_OFFSET) >> 30) & 3) /* Number of kernel PMDs */
	movl %eax,pa(initial_pg_pmd+0x1000*KPMDS-8)
#else
	movl %eax,pa(initial_page_table+0xffc)
#endif

	jmp .Ldefault_entry
SYM_CODE_END(startup_32)

#ifdef CONFIG_HOTPLUG_CPU
/*
 * Boot CPU0 entry point. It's called from play_dead(). Everything has been set
 * up already except stack. We just set up stack here. Then call
 * start_secondary().
 */
SYM_FUNC_START(start_cpu0)
	movl initial_stack, %ecx
	movl %ecx, %esp
	call *(initial_code)
1:	jmp 1b
SYM_FUNC_END(start_cpu0)
#endif

/*
 * Non-boot CPU entry point; entered from trampoline.S
 * We can't lgdt here, because lgdt itself uses a data segment, but
 * we know the trampoline has already loaded the boot_gdt for us.
 *
 * If cpu hotplug is not supported then this code can go in init section
 * which will be freed later
 */
SYM_FUNC_START(startup_32_smp)
	cld
	movl $(__BOOT_DS),%eax
	movl %eax,%ds
	movl %eax,%es
	movl %eax,%fs
	movl %eax,%gs
	movl pa(initial_stack),%ecx
	movl %eax,%ss
	leal -__PAGE_OFFSET(%ecx),%esp

#ifdef CONFIG_MICROCODE
	/* Early load ucode on AP. */
	call load_ucode_ap
#endif

.Ldefault_entry:
	movl $(CR0_STATE & ~X86_CR0_PG),%eax
	movl %eax,%cr0

/*
 * We want to start out with EFLAGS unambiguously cleared. Some BIOSes leave
 * bits like NT set. This would confuse the debugger if this code is traced. So
 * initialize them properly now before switching to protected mode. That means
 * DF in particular (even though we have cleared it earlier after copying the
 * command line) because GCC expects it.
 */
	pushl $0
	popfl

/*
 * New page tables may be in 4Mbyte page mode and may be using the global pages.
 *
 * NOTE! If we are on a 486 we may have no cr4 at all! Specifically, cr4 exists
 * if and only if CPUID exists and has flags other than the FPU flag set.
 */
	movl $-1,pa(X86_CPUID)		# preset CPUID level
	movl $X86_EFLAGS_ID,%ecx
	pushl %ecx
	popfl				# set EFLAGS=ID
	pushfl
	popl %eax			# get EFLAGS
	testl $X86_EFLAGS_ID,%eax	# did EFLAGS.ID remain set?
	jz .Lenable_paging		# hw disallowed setting of ID bit
					# which means no CPUID and no CR4

	xorl %eax,%eax
	cpuid
	movl %eax,pa(X86_CPUID)		# save largest std CPUID function

	movl $1,%eax
	cpuid
	andl $~1,%edx			# Ignore CPUID.FPU
	jz .Lenable_paging		# No flags or only CPUID.FPU = no CR4

	movl pa(mmu_cr4_features),%eax
	movl %eax,%cr4

	testb $X86_CR4_PAE, %al		# check if PAE is enabled
	jz .Lenable_paging

	/* Check if extended functions are implemented */
	movl $0x80000000, %eax
	cpuid
	/* Value must be in the range 0x80000001 to 0x8000ffff */
	subl $0x80000001, %eax
	cmpl $(0x8000ffff-0x80000001), %eax
	ja .Lenable_paging

	/* Clear bogus XD_DISABLE bits */
	call verify_cpu

	mov $0x80000001, %eax
	cpuid
	/* Execute Disable bit supported? */
	btl $(X86_FEATURE_NX & 31), %edx
	jnc .Lenable_paging

	/* Setup EFER (Extended Feature Enable Register) */
	movl $MSR_EFER, %ecx
	rdmsr

	btsl $_EFER_NX, %eax
	/* Make changes effective */
	wrmsr

.Lenable_paging:

/*
 * Enable paging
 */
	movl $pa(initial_page_table), %eax
	movl %eax,%cr3		/* set the page table pointer.. */
	movl $CR0_STATE,%eax
	movl %eax,%cr0		/* ..and set paging (PG) bit */
	ljmp $__BOOT_CS,$1f	/* Clear prefetch and normalize %eip */
1:
	/* Shift the stack pointer to a virtual address */
	addl $__PAGE_OFFSET, %esp

/*
 * start system 32-bit setup. We need to re-do some of the things done
 * in 16-bit mode for the "real" operations.
 */
	movl setup_once_ref,%eax
	andl %eax,%eax
	jz 1f				# Did we do this already?
	call *%eax
1:

/*
 * Check if it is 486
 */
	movb $4,X86			# at least 486
	cmpl $-1,X86_CPUID
	je .Lis486

	/* get vendor info */
	xorl %eax,%eax			# call CPUID with 0 -> return vendor ID
	cpuid
	movl %eax,X86_CPUID		# save CPUID level
	movl %ebx,X86_VENDOR_ID		# lo 4 chars
	movl %edx,X86_VENDOR_ID+4	# next 4 chars
	movl %ecx,X86_VENDOR_ID+8	# last 4 chars

	orl %eax,%eax			# do we have processor info as well?
	je .Lis486

	movl $1,%eax		# Use the CPUID instruction to get CPU type
	cpuid
	movb %al,%cl		# save reg for future use
	andb $0x0f,%ah		# mask processor family
	movb %ah,X86
	andb $0xf0,%al		# mask model
	shrb $4,%al
	movb %al,X86_MODEL
	andb $0x0f,%cl		# mask revision
	movb %cl,X86_STEPPING
	movl %edx,X86_CAPABILITY

.Lis486:
	movl $0x50022,%ecx	# set AM, WP, NE and MP
	movl %cr0,%eax
	andl $0x80000011,%eax	# Save PG,PE,ET
	orl %ecx,%eax
	movl %eax,%cr0

	lgdt early_gdt_descr
	ljmp $(__KERNEL_CS),$1f
1:	movl $(__KERNEL_DS),%eax	# reload all the segment registers
	movl %eax,%ss			# after changing gdt.

	movl $(__USER_DS),%eax		# DS/ES contains default USER segment
	movl %eax,%ds
	movl %eax,%es

	movl $(__KERNEL_PERCPU), %eax
	movl %eax,%fs			# set this cpu's percpu

	xorl %eax,%eax
	movl %eax,%gs			# clear possible garbage in %gs

	xorl %eax,%eax			# Clear LDT
	lldt %ax

	call *(initial_code)
1:	jmp 1b
SYM_FUNC_END(startup_32_smp)

#include "verify_cpu.S"

/*
 *  setup_once
 *
 *  The setup work we only want to run on the BSP.
 *
 *  Warning: %esi is live across this function.
 */
__INIT
setup_once:
	andl $0,setup_once_ref	/* Once is enough, thanks */
	ret

SYM_FUNC_START(early_idt_handler_array)
	# 36(%esp) %eflags
	# 32(%esp) %cs
	# 28(%esp) %eip
	# 24(%esp) error code
	i = 0
	.rept NUM_EXCEPTION_VECTORS
	.if ((EXCEPTION_ERRCODE_MASK >> i) & 1) == 0
	pushl $0		# Dummy error code, to make stack frame uniform
	.endif
	pushl $i		# 20(%esp) Vector number
	jmp early_idt_handler_common
	i = i + 1
	.fill early_idt_handler_array + i*EARLY_IDT_HANDLER_SIZE - ., 1, 0xcc
	.endr
SYM_FUNC_END(early_idt_handler_array)

SYM_CODE_START_LOCAL(early_idt_handler_common)
	/*
	 * The stack is the hardware frame, an error code or zero, and the
	 * vector number.
	 */
	cld

	incl %ss:early_recursion_flag

	/* The vector number is in pt_regs->gs */

	cld
	pushl	%fs		/* pt_regs->fs (__fsh varies by model) */
	pushl	%es		/* pt_regs->es (__esh varies by model) */
	pushl	%ds		/* pt_regs->ds (__dsh varies by model) */
	pushl	%eax		/* pt_regs->ax */
	pushl	%ebp		/* pt_regs->bp */
	pushl	%edi		/* pt_regs->di */
	pushl	%esi		/* pt_regs->si */
	pushl	%edx		/* pt_regs->dx */
	pushl	%ecx		/* pt_regs->cx */
	pushl	%ebx		/* pt_regs->bx */

	/* Fix up DS and ES */
	movl	$(__KERNEL_DS), %ecx
	movl	%ecx, %ds
	movl	%ecx, %es

	/* Load the vector number into EDX */
	movl	PT_GS(%esp), %edx

	/* Load GS into pt_regs->gs (and maybe clobber __gsh) */
	movw	%gs, PT_GS(%esp)

	movl	%esp, %eax	/* args are pt_regs (EAX), trapnr (EDX) */
	call	early_fixup_exception

	popl	%ebx		/* pt_regs->bx */
	popl	%ecx		/* pt_regs->cx */
	popl	%edx		/* pt_regs->dx */
	popl	%esi		/* pt_regs->si */
	popl	%edi		/* pt_regs->di */
	popl	%ebp		/* pt_regs->bp */
	popl	%eax		/* pt_regs->ax */
	popl	%ds		/* pt_regs->ds (always ignores __dsh) */
	popl	%es		/* pt_regs->es (always ignores __esh) */
	popl	%fs		/* pt_regs->fs (always ignores __fsh) */
	popl	%gs		/* pt_regs->gs (always ignores __gsh) */
	decl	%ss:early_recursion_flag
	addl	$4, %esp	/* pop pt_regs->orig_ax */
	iret
SYM_CODE_END(early_idt_handler_common)

/* This is the default interrupt "handler" :-) */
SYM_FUNC_START(early_ignore_irq)
	cld
#ifdef CONFIG_PRINTK
	pushl %eax
	pushl %ecx
	pushl %edx
	pushl %es
	pushl %ds
	movl $(__KERNEL_DS),%eax
	movl %eax,%ds
	movl %eax,%es
	cmpl $2,early_recursion_flag
	je hlt_loop
	incl early_recursion_flag
	pushl 16(%esp)
	pushl 24(%esp)
	pushl 32(%esp)
	pushl 40(%esp)
	pushl $int_msg
	call _printk

	call dump_stack

	addl $(5*4),%esp
	popl %ds
	popl %es
	popl %edx
	popl %ecx
	popl %eax
#endif
	iret

hlt_loop:
	hlt
	jmp hlt_loop
SYM_FUNC_END(early_ignore_irq)

__INITDATA
	.align 4
SYM_DATA(early_recursion_flag, .long 0)

__REFDATA
	.align 4
SYM_DATA(initial_code,		.long i386_start_kernel)
SYM_DATA(setup_once_ref,	.long setup_once)

#ifdef CONFIG_PAGE_TABLE_ISOLATION
#define	PGD_ALIGN	(2 * PAGE_SIZE)
#define PTI_USER_PGD_FILL	1024
#else
#define	PGD_ALIGN	(PAGE_SIZE)
#define PTI_USER_PGD_FILL	0
#endif
/*
 * BSS section
 */
__PAGE_ALIGNED_BSS
	.align PGD_ALIGN
#ifdef CONFIG_X86_PAE
.globl initial_pg_pmd
initial_pg_pmd:
	.fill 1024*KPMDS,4,0
#else
.globl initial_page_table
initial_page_table:
	.fill 1024,4,0
#endif
	.align PGD_ALIGN
initial_pg_fixmap:
	.fill 1024,4,0
.globl swapper_pg_dir
	.align PGD_ALIGN
swapper_pg_dir:
	.fill 1024,4,0
	.fill PTI_USER_PGD_FILL,4,0
.globl empty_zero_page
empty_zero_page:
	.fill 4096,1,0
EXPORT_SYMBOL(empty_zero_page)

/*
 * This starts the data section.
 */
#ifdef CONFIG_X86_PAE
__PAGE_ALIGNED_DATA
	/* Page-aligned for the benefit of paravirt? */
	.align PGD_ALIGN
SYM_DATA_START(initial_page_table)
	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR),0	/* low identity map */
# if KPMDS == 3
	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR),0
	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR+0x1000),0
	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR+0x2000),0
# elif KPMDS == 2
	.long	0,0
	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR),0
	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR+0x1000),0
# elif KPMDS == 1
	.long	0,0
	.long	0,0
	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR),0
# else
#  error "Kernel PMDs should be 1, 2 or 3"
# endif
	.align PAGE_SIZE		/* needs to be page-sized too */

#ifdef CONFIG_PAGE_TABLE_ISOLATION
	/*
	 * PTI needs another page so sync_initial_pagetable() works correctly
	 * and does not scribble over the data which is placed behind the
	 * actual initial_page_table. See clone_pgd_range().
	 */
	.fill 1024, 4, 0
#endif

SYM_DATA_END(initial_page_table)
#endif

.data
.balign 4
/*
 * The SIZEOF_PTREGS gap is a convention which helps the in-kernel unwinder
 * reliably detect the end of the stack.
 */
SYM_DATA(initial_stack,
		.long init_thread_union + THREAD_SIZE -
		SIZEOF_PTREGS - TOP_OF_KERNEL_STACK_PADDING)

__INITRODATA
int_msg:
	.asciz "Unknown interrupt or fault at: %p %p %p\n"

#include "../../x86/xen/xen-head.S"

/*
 * The IDT and GDT 'descriptors' are a strange 48-bit object
 * only used by the lidt and lgdt instructions. They are not
 * like usual segment descriptors - they consist of a 16-bit
 * segment size, and 32-bit linear address value:
 */

	.data
	ALIGN
# early boot GDT descriptor (must use 1:1 address mapping)
	.word 0				# 32 bit align gdt_desc.address
SYM_DATA_START_LOCAL(boot_gdt_descr)
	.word __BOOT_DS+7
	.long boot_gdt - __PAGE_OFFSET
SYM_DATA_END(boot_gdt_descr)

# boot GDT descriptor (later on used by CPU#0):
	.word 0				# 32 bit align gdt_desc.address
SYM_DATA_START(early_gdt_descr)
	.word GDT_ENTRIES*8-1
	.long gdt_page			/* Overwritten for secondary CPUs */
SYM_DATA_END(early_gdt_descr)

/*
 * The boot_gdt must mirror the equivalent in setup.S and is
 * used only for booting.
 */
	.align L1_CACHE_BYTES
SYM_DATA_START(boot_gdt)
	.fill GDT_ENTRY_BOOT_CS,8,0
	.quad 0x00cf9a000000ffff	/* kernel 4GB code at 0x00000000 */
	.quad 0x00cf92000000ffff	/* kernel 4GB data at 0x00000000 */
SYM_DATA_END(boot_gdt)