Fix warning from pmd_bad() at bootup on a HIGHMEM64G HIGHPTE x86_32.
That came from 9fc34113f6 x86: debug pmd_bad();
but we understand now that the typecasting was wrong for PAE in the previous
version: pagetable pages above 4GB looked bad and stopped Arjan from booting.
And revert that cded932b75 x86: fix pmd_bad
and pud_bad to support huge pages. It was the wrong way round: we shouldn't
weaken every pmd_bad and pud_bad check to let huge pages slip through - in
part they check that we _don't_ have a huge page where it's not expected.
Put the x86 pmd_bad() and pud_bad() definitions back to what they have long
been: they can be improved (x86_32 should use PTE_MASK, to stop PAE thinking
junk in the upper word is good; and x86_64 should follow x86_32's stricter
comparison, to stop thinking any subset of required bits is good); but that
should be a later patch.
Fix Hans' good observation that follow_page() will never find pmd_huge()
because that would have already failed the pmd_bad test: test pmd_huge in
between the pmd_none and pmd_bad tests. Tighten x86's pmd_huge() check?
No, once it's a hugepage entry, it can get quite far from a good pmd: for
example, PROT_NONE leaves it with only ACCESSED of the KERN_PGTABLE bits.
However... though follow_page() contains this and another test for huge
pages, so it's nice to keep it working on them, where does it actually get
called on a huge page? get_user_pages() checks is_vm_hugetlb_page(vma) to
to call alternative hugetlb processing, as does unmap_vmas() and others.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Earlier-version-tested-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jeff Chua <jeff.chua.linux@gmail.com>
Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-fixes:
sched: default to n for GROUP_SCHED and FAIR_GROUP_SCHED
sched: add optional support for CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
sched, x86: add HAVE_UNSTABLE_SCHED_CLOCK
sched: fix cpu clock
sched: fair-group: fix a Div0 error of the fair group scheduler
sched: fix missing locking in sched_domains code
sched: make clock sync tunable by architecture code
sched: fix debugging
sched: fix sched_info_switch not being called according to documentation
sched: fix hrtick_start_fair and CPU-Hotplug
sched: fix SCHED_FAIR wake-idle logic error
sched: fix RT task-wakeup logic
sched: add statics, don't return void expressions
sched: add debug checks to idle functions
sched: remove old sched doc
sched: make rt_sched_class, idle_sched_class static
sched: optimize calc_delta_mine()
sched: fix normalized sleeper
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
x86 PCI: call dmi_check_pciprobe()
x86/pci: add pci=skip_isa_align command lines.
x86/pci: remove flag in pci_cfg_space_size_ext
x86: fix section mismatch in pci_scan_bus
this change:
| commit 08f1c192c3
| Author: Muli Ben-Yehuda <muli@il.ibm.com>
| Date: Sun Jul 22 00:23:39 2007 +0300
|
| x86-64: introduce struct pci_sysdata to facilitate sharing of ->sysdata
|
| This patch introduces struct pci_sysdata to x86 and x86-64, and
| converts the existing two users (NUMA, Calgary) to use it.
|
| This lays the groundwork for having other users of sysdata, such as
| the PCI domains work.
|
| The Calgary bits are tested, the NUMA bits just look ok.
replaces pcibios_scan_root with pci_scan_bus_parented...
but in pcibios_scan_root we have a DMI check:
dmi_check_system(pciprobe_dmi_table);
when when have several peer root buses this could be called multiple
times (which is bad), so move that call to pci_access_init().
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
so we don't align the io port start address for pci cards.
also move out dmi check out acpi.c, because it has nothing to do with acpi.
it could spare some calling when we have several peer root buses.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Fix x86 setup printk format warming:
next-20080430/arch/x86/kernel/setup.c:172: warning: format '%lu' expects type 'long unsigned int', but argument 2 has type 'ssize_t'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add the missing MODULE_LICENSE("GPL").
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
http://bugzilla.kernel.org/show_bug.cgi?id=10547
Newer Dell OptiPlex 745s hang before rebooting after 'sudo reboot'.
A patch for some versions of the OptiPlex was proposed here --
http://lkml.org/lkml/2007/6/5/59 -- and is included in 2.6.23 and
later kernels, according to
http://lxr.linux.no/linux+v2.6.23/arch/i386/kernel/reboot.c . However,
the DMI_BOARD_NAME ("0WF810") is too restrictive. Newer OptiPlex
machines have a DMI_BOARD_NAME of "0RF703". I therefore suggest
adding another clause to reboot.c, similar to the one in the original
patch, but matching a DMI_BOARD_NAME of "0RF703".
On further inspection, it seems that there are other DMI_BOARD_NAMEs
for this same machine. They seem to change from time to time, which
means that the current code is fragile. Moreover, using bios reboot
should not break non-SFF OptiPlex 745s, and so a reasonable fix is to
simply drop the match on DMI_BOARD_NAME.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch makes the needlessly global additional_cpus static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
In kernel/acpi/realmode/Makefile use the 'always'
variable to say that wakeup.bin should always
be made.
In acpi/Makefile we then do not need to specify the
requested target and we avoid the message from make:
`arch/x86/kernel/acpi/realmode/wakeup.bin' is up to date.
Add wakeup.lds to list af targets to avoid rebuilding
wakeup.bin - from Roland McGrath.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@suse.cz>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
arch/x86/pci/Makefile_32 has a nasty detail. VISWS and NUMAQ build
override the generic pci-y rules. This needs a proper cleanup, but
that needs more thoughts. Undo
commit 895d30935e
x86: numaq fix
do not override the existing pci-y rule when adding visws or
numaq rules.
There is also a stupid init function ordering problem vs. acpi.o
Add comments to the Makefile to avoid tripping over this again.
Remove the srat stub code in discontig_32.c to allow a proper NUMAQ
build.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Since the pv_apic_ops are only present if CONFIG_X86_LOCAL_APIC is compiled
in, kvmclock failed to build without this option. This patch fixes this.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
nonpae guests can call rmap_write_protect twice per page (for page tables)
or four times per page (for page directories), triggering a bogus warning.
Remove the warning.
Signed-off-by: Avi Kivity <avi@qumranet.com>
This make sure not to schedule in atomic during fx_init. I also
changed the name of fpu_init to fx_finit to avoid duplicating the name
with fpu_init that is already used in the kernel, this makes grep
simpler if nothing else.
Signed-off-by: Andrea Arcangeli <andrea@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Clear pending exceptions when setting new register values. This avoids
spurious exceptions after restoring a vcpu state or after
reset-on-triple-fault.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The in-kernel PIT emulation ignores pending timers if operating under
mode 4, which for example DragonFlyBSD uses (and Plan9 too, apparently).
Mode 4 seems to be similar to one-shot mode, other than the fact that it
starts counting after the next CLK pulse once programmed, while mode 1
starts counting immediately, so add a FIXME to enhance precision.
Fixes sourceforge bug 1952988.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The recent changes allowing memory operands with lmsw and smsw left
lmsw with writeback enabled. Since lmsw has no oridinary destination
operand, the dst pointer was not initialized, resulting in an oops.
Close the hole by disabling writeback for lmsw.
Signed-off-by: Avi Kivity <avi@qumranet.com>
[aliguory: plug leak]
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Currently EPT level is 4 for both pae and x86_64. The patch remove the #ifdef
for alloc root_hpa and free root_hpa to support EPT.
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The function get_tdp_level() provided the number of tdp level for EPT and
NPT rather than the NPT specific macro.
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Move some definitions to mmu.h in order to allow building common table
entries between EPT and non-EPT.
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This replaces the duplicated arch-specific versions of "sys_pipe()" with
one unified implementation. This removes almost 250 lines of duplicated
code.
It's marked __weak, so that *if* an architecture wants to override the
default implementation it can do so by simply having its own replacement
version, since many architectures use alternate calling conventions for
the 'pipe()' system call for legacy reasons (ie traditional UNIX
implementations often return the two file descriptors in registers)
I still haven't changed the cris version even though Linus says the BKL
isn't needed. The arch maintainer can easily do it if there are really
no obstacles.
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rename div64_64 to div64_u64 to make it consistent with the other divide
functions, so it clearly includes the type of the divide. Move its definition
to math64.h as currently no architecture overrides the generic implementation.
They can still override it of course, but the duplicated declarations are
avoided.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: Avi Kivity <avi@qumranet.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
So Ingo finally did figure out why UML broke with this option: UML
passes gcc the -fno-unit-at-a-time flag, and apparently that wreaks
havoc with gcc's inlining.
We could turn off -fno-unit-at-a-time for UML for gcc4+ (which is what
x86 does), but there's bad blood about this whole option, and it does
show that the thing is just fragile as heck.
So let tempers cool, and disable the thing, and we can revisit the
decision later.
Cc: Adrian Bunk <bunk@kernel.org>
Cc: David Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Switch back to 8K stacks as the safer default. Out-of-memory
situations are less problematic than silent and hard to debug
stack corruption.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
bdd3cee2e4 (x86: ioremap(), extend check
to all RAM pages) breaks OLPC's ioremap call. The ioremap that OLPC uses is:
romsig = ioremap(0xffffffc0, 16);
The commit that breaks it is basically:
- for (pfn = phys_addr >> PAGE_SHIFT; pfn < max_pfn_mapped &&
- (pfn << PAGE_SHIFT) < last_addr; pfn++) {
+ for (pfn = phys_addr >> PAGE_SHIFT;
+ (pfn << PAGE_SHIFT) < last_addr; pfn++) {
+
Previously, the 'pfn < max_pfn_mapped' check would've caused us to not
enter the loop. Removing that check means we loop infinitely. The
reason for that is because pfn is 0xfffff, and last_addr is 0xffffffcf.
The remaining check that is used to exit the loop is not sufficient;
when pfn<<PAGE_SHIFT is 0xfffff000, that is less than 0xffffffcf; when
we increment pfn and it overflows (pfn == 0x100000), pfn<<PAGE_SHIFT
ends up being 0. That, of course, is less than last_addr. In effect,
pfn<<PAGE_SHIFT is never lower than last_addr.
The simple fix for this is to limit the last_addr check to the PAGE_MASK;
a patch is below.
Signed-off-by: Andres Salomon <dilinger@debian.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Andrew noticed that OPTIMIZE_INLINING appeared in the toplevel
menu - fix it.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Use UC_MINUS for ioremap(), ioremap_nocache() instead of strong UC.
Once all the X drivers move to ioremap_wc(), we can go back to strong
UC semantics for ioremap() and ioremap_nocache().
To avoid attribute aliasing issues, pci_mmap_page_range() will also
use UC_MINUS for default non write-combining mapping request.
Next steps:
a) change all the video drivers using ioremap() or ioremap_nocache()
and adding WC MTTR using mttr_add() to ioremap_wc()
b) for strict usage, we can go back to strong uc semantics
for ioremap() and ioremap_nocache() after some grace period for
completing step-a.
c) user level X server needs to use the appropriate method for setting
up WC mapping (like using resourceX_wc sysfs file instead of
adding MTRR for WC and using /dev/mem or resourceX under /sys)
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Daniel Drake <dsd@gentoo.org> reported:
In 2.6.23, if you unpacked a kernel source tarball and then
ran "make menuconfig" you'd be presented with this message:
# using defaults found in arch/i386/defconfig
and the default options would be set.
The same thing in 2.6.24 does not give you any "using defaults" message, and
the default config options within menuconfig are rather blank (e.g. no PCI
support). You can work around this by explicitly running "make defconfig"
before menuconfig, but it would be nice to have the behaviour the way it was
for 2.6.23 (and the way it still is for other archs).
Fixed by adding a x86 specific defconfig list to Kconfig.
Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=10470
Tested-by: dsd@gentoo.org
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Vegard Nossum reported a large (150 seconds) boot delay during bootup,
and bisected it to "x86: ioremap(), extend check to all RAM pages"
(commit bdd3cee2e4). Revert this commit for now.
Bisected-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The kernel prints the compat vdso address regardless of whether compat
vdso mode is enabled or not, which is confusing. Given that this
isn't very interesting information anyway, just remove the printk.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Gerhard Mack <gmack@innerfire.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Don't warn in read_apic_id() when preemptible but only one CPU online.
Signed-off-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The .asciz directive takes any number of strings, but each one is zero-
terminated, and string pasting is not done as in C. That results in only the
first line being output.
Replace .asciz with multiple .ascii directives and terminate with .asciz.
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The iommu_sac_force variable is needlessly defined global,
and this patch makes it static. Additionally, this variable
needs not be explicitly initialized.
Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch fixes one sparse warning by including the appropriate
header for the reboot_force symbol.
Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
the 'reboot_force' flag is a notion that non-PC subarchitectures do
not have.
also, unify the X86_BIOS_REBOOT option between 32-bit and 64-bit
and get rid of a few unnecessary Kconfig and Makefile complications
that way.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>