linux/arch/x86/mm
Henry Nestler b29c701dea x86: fix endless page faults in mount_block_root for Linux 2.6
Page faults in kernel address space between PAGE_OFFSET up to
VMALLOC_START should not try to map as vmalloc.

Fix rarely endless page faults inside mount_block_root for root
filesystem at boot time.

All 32bit kernels up to 2.6.25 can fail into this hole.
I can not present this under native linux kernel. I see, that the 64bit
has fixed the problem. I copied the same lines into 32bit part.

Recorded debugs are from coLinux kernel 2.6.22.18 (virtualisation):
http://www.henrynestler.com/colinux/testing/pfn-check-0.7.3/20080410-antinx/bug16-recursive-page-fault-endless.txt
The physicaly memory was trimmed down to 192MB to better catch the bug.
More memory gets the bug more rarely.

Details, how every x86 32bit system can fail:

Start from "mount_block_root",
http://lxr.linux.no/linux/init/do_mounts.c#L297
There the variable "fs_names" got one memory page with 4096 bytes.
Variable "p" walks through the existing file system types. The first
string is no problem.
But, with the second loop in mount_block_root the offset of "p" is not
at beginning of page, the offset is for example +9, if "reiserfs" is the
first in list.
Than calls do_mount_root, and lands in sys_mount.
Remember: Variable "type_page" contains now "fs_type+9" and not contains
a full page.
The sys_mount copies 4096 bytes with function "exact_copy_from_user()":
http://lxr.linux.no/linux/fs/namespace.c#L1540

Mostly exist pages after the buffer "fs_names+4096+9" and the page fault
handler was not called. No problem.

In the case, if the page after "fs_names+4096" is not mapped, the page
fault handler was called from http://lxr.linux.no/linux/fs/namespace.c#L1320

The do_page_fault gots an address 0xc03b4000.
It's kernel address, address >= TASK_SIZE, but not from vmalloc! It's
from "__getname()" alias "kmem_cache_alloc".
The "error_code" is 0. "vmalloc_fault" will be call:
http://lxr.linux.no/linux/arch/i386/mm/fault.c#L332

"vmalloc_fault" tryed to find the physical page for a non existing
virtual memory area. The macro "pte_present" in vmalloc_fault()
got a next page fault for 0xc0000ed0 at:
http://lxr.linux.no/linux/arch/i386/mm/fault.c#L282

No PTE exist for such virtual address. The page fault handler was trying
to sync the physical page for the PTE lockup.

This called vmalloc_fault() again for address 0xc000000, and that also
was not existing. The endless began...

In normal case the cpu would still loop with disabled interrrupts. Under
coLinux this was catched by a stack overflow inside printk debugs.

Signed-off-by: Henry Nestler <henry.nestler@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-12 21:26:07 +02:00
..
discontig_32.c x86: undo visws/numaq build changes 2008-05-04 20:04:45 +02:00
dump_pagetables.c "make namespacecheck" fixes 2008-04-24 23:15:44 +02:00
extable.c x86: unify extable_{32|64}.c 2008-01-30 13:31:41 +01:00
fault.c x86: fix endless page faults in mount_block_root for Linux 2.6 2008-06-12 21:26:07 +02:00
highmem_32.c x86: unexport kmap_atomic_to_page 2008-04-30 23:15:34 +02:00
hugetlbpage.c x86: stricter check in follow_huge_addr() 2008-03-27 16:08:45 +01:00
init_32.c x86: fix app crashes after SMP resume 2008-05-13 19:36:12 +02:00
init_64.c x86: fix pointer type warning in arch/x86/mm/init_64.c:early_memtest 2008-06-04 13:11:47 +02:00
ioremap.c x86: ioremap fix failing nesting check 2008-06-04 13:11:47 +02:00
k8topology_64.c acpi: get boot_cpu_id as early for k8_scan_nodes 2008-04-26 23:41:04 +02:00
Makefile x86: add common mm/pgtable.c 2008-04-24 23:57:30 +02:00
mmap.c x86: unify mmap_{32|64}.c 2008-01-30 13:31:10 +01:00
numa_64.c x86_64: fix setup_node_bootmem to support big mem excluding with memmap 2008-04-26 22:51:08 +02:00
pageattr-test.c x86: remove over noisy debug printk 2008-02-11 11:24:24 -08:00
pageattr.c x86 PAT: fix performance drop for glx, use UC minus for ioremap(), ioremap_nocache() and pci_mmap_page_range() 2008-04-30 23:15:35 +02:00
pat.c x86: section mismatch fix 2008-06-04 13:11:47 +02:00
pgtable_32.c x86: fix PAE pmd_bad bootup warning 2008-05-06 13:08:58 -07:00
pgtable.c x86: unify pgd ctor/dtor 2008-04-24 23:57:31 +02:00
srat_64.c "make namespacecheck" fixes 2008-04-24 23:15:44 +02:00