Some syscalls need to access the pt_regs structure, either to copy
user register state or to modifiy it. This patch adds stubs to load
the address of the pt_regs struct into the %eax register, and changes
the syscalls to take the pointer as an argument instead of relying on
the assumption that the pt_regs structure overlaps the function
arguments.
Drop the use of regparm(1) due to concern about gcc bugs, and to move
in the direction of the eventual removal of regparm(0) for asmlinkage.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
arch/x86/kernel/mpparse.c: In function ‘smp_scan_config’:
arch/x86/kernel/mpparse.c:696: warning: format ‘%08lx’ expects type ‘long unsigned int’, but argument 3 has type ‘phys_addr_t’
arch/x86/kernel/mpparse.c: In function ‘update_mp_table’:
arch/x86/kernel/mpparse.c:1014: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 2 has type ‘phys_addr_t’
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
timers: fix TIMER_ABSTIME for process wide cpu timers
timers: split process wide cpu clocks/timers, fix
x86: clean up hpet timer reinit
timers: split process wide cpu clocks/timers, remove spurious warning
timers: split process wide cpu clocks/timers
signal: re-add dead task accumulation stats.
x86: fix hpet timer reinit for x86_64
sched: fix nohz load balancer on cpu offline
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
ptrace, x86: fix the usage of ptrace_fork()
i8327: fix outb() parameter order
x86: fix math_emu register frame access
x86: math_emu info cleanup
x86: include correct %gs in a.out core dump
x86, vmi: put a missing paravirt_release_pmd in pgd_dtor
x86: find nr_irqs_gsi with mp_ioapic_routing
x86: add clflush before monitor for Intel 7400 series
x86: disable intel_iommu support by default
x86: don't apply __supported_pte_mask to non-present ptes
x86: fix grammar in user-visible BIOS warning
x86/Kconfig.cpu: make Kconfig help readable in the console
x86, 64-bit: print DMI info in the oops trace
Ptrace_detach() races with __ptrace_unlink() if the traced task is
reaped while detaching. This might cause a double-free of the BTS
buffer.
Change the ptrace_detach() path to only do the memory accounting in
ptrace_bts_detach() and leave the buffer free to ptrace_bts_untrace()
which will be called from __ptrace_unlink().
The fix follows a proposal from Oleg Nesterov.
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Now that no functions rely on struct pt_regs being passed by value,
various "no stack protector" annotations can be dropped.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Some syscalls need to access the pt_regs structure, either to copy
user register state or to modifiy it. This patch adds stubs to load
the address of the pt_regs struct into the %eax register, and changes
the syscalls to regparm(1) to receive the pt_regs pointer as the
first argument.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The generic exception handler (error_code) passes in the pt_regs
pointer and the error code (unused in this case). The commit
"x86: fix math_emu register frame access" changed this to pass by
value, which doesn't work correctly with stack protector enabled.
Change it back to use the pt_regs pointer.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix x86_32 stack protector
Brian Gerst found out that %gs was being initialized to stack_canary
instead of stack_canary - 20, which basically gave the same canary
value for all threads. Fixing this also exposed the following bugs.
* cpu_idle() didn't call boot_init_stack_canary()
* stack canary switching in switch_to() was being done too late making
the initial run of a new thread use the old stack canary value.
Fix all of them and while at it update comment in cpu_idle() about
calling boot_init_stack_canary().
Reported-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
With refactoring of wake_cpu macros the 32bit code in tip doesn't
execute generic_apic_probe if CONFIG_X86_32_NON_STANDARD is not set.
Even on a x86 STANDARD cpu we need to execute the generic_apic_probe
function, as we rely on this function to execute the update_genapic
quirk which initilizes apic->wakeup_cpu.
Failing to do so results in we making a call to a null function in do_boot_cpu.
The stack trace without the patch goes like this.
Booting processor 1 APIC 0x1 ip 0x6000
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<(null)>] (null)
*pdpt = 0000000000839001 *pde = 0000000000c97067 *pte = 0000000000000163
Oops: 0000 [#1] SMP
last sysfs file:
Modules linked in:
Pid: 1, comm: swapper Not tainted (2.6.29-rc4-tip #18) VMware Virtual Platform
EIP: 0062:[<00000000>] EFLAGS: 00010293 CPU: 0
EIP is at 0x0
EAX: 00000001 EBX: 00006000 ECX: c077ed00 EDX: 00006000
ESI: 00000001 EDI: 00000001 EBP: ef04cf40 ESP: ef04cf1c
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 006a
Process swapper (pid: 1, ti=ef04c000 task=ef050000 task.ti=ef04c000)
Stack:
c0644e52 00000000 ef04cf24 ef04cf24 c064468d c0886dc0 00000000 c0702aea
ef055480 00000001 00000101 dead4ead ffffffff ffffffff c08af530 00000000
c0709715 ef04cf60 ef04cf60 00000001 00000000 00000000 dead4ead ffffffff
Call Trace:
[<c0644e52>] ? native_cpu_up+0x2de/0x45b
[<c064468d>] ? do_fork_idle+0x0/0x19
[<c0645c5e>] ? _cpu_up+0x88/0xe8
[<c0645d20>] ? cpu_up+0x42/0x4e
[<c07e7462>] ? kernel_init+0x99/0x14b
[<c07e73c9>] ? kernel_init+0x0/0x14b
[<c040375f>] ? kernel_thread_helper+0x7/0x10
Code: Bad EIP value.
EIP: [<00000000>] 0x0 SS:ESP 006a:ef04cf1c
I think we should call generic_apic_probe unconditionally for 32 bit now.
Signed-off-by: Alok N Kataria <akataria@vmware.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The constraint used for retrieving and restoring the parent function
pointer is incorrect. The parent variable is a pointer, and the
address of the pointer is modified by the asm statement and not
the pointer itself. It is incorrect to pass it in as an output
constraint since the asm will never update the pointer.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix to prevent a kernel crash on fault
If for some reason the pointer to the parent function on the
stack takes a fault, the fix up code will not return back to
the original faulting code. This can lead to unpredictable
results and perhaps even a kernel panic.
A fault should not happen, but if it does, we should simply
disable the tracer, warn, and continue running the kernel.
It should not lead to a kernel crash.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
In i8237A_resume(), when resetting the DMA controller, the parameters to
dma_outb() were mixed up.
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
[ cleaned up the file a tiny bit. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: stack protector for x86_32
Implement stack protector for x86_32. GDT entry 28 is used for it.
It's set to point to stack_canary-20 and have the length of 24 bytes.
CONFIG_CC_STACKPROTECTOR turns off CONFIG_X86_32_LAZY_GS and sets %gs
to the stack canary segment on entry. As %gs is otherwise unused by
the kernel, the canary can be anywhere. It's defined as a percpu
variable.
x86_32 exception handlers take register frame on stack directly as
struct pt_regs. With -fstack-protector turned on, gcc copies the
whole structure after the stack canary and (of course) doesn't copy
back on return thus losing all changed. For now, -fno-stack-protector
is added to all files which contain those functions. We definitely
need something better.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: pt_regs changed, lazy gs handling made optional, add slight
overhead to SAVE_ALL, simplifies error_code path a bit
On x86_32, %gs hasn't been used by kernel and handled lazily. pt_regs
doesn't have place for it and gs is saved/loaded only when necessary.
In preparation for stack protector support, this patch makes lazy %gs
handling optional by doing the followings.
* Add CONFIG_X86_32_LAZY_GS and place for gs in pt_regs.
* Save and restore %gs along with other registers in entry_32.S unless
LAZY_GS. Note that this unfortunately adds "pushl $0" on SAVE_ALL
even when LAZY_GS. However, it adds no overhead to common exit path
and simplifies entry path with error code.
* Define different user_gs accessors depending on LAZY_GS and add
lazy_save_gs() and lazy_load_gs() which are noop if !LAZY_GS. The
lazy_*_gs() ops are used to save, load and clear %gs lazily.
* Define ELF_CORE_COPY_KERNEL_REGS() which always read %gs directly.
xen and lguest changes need to be verified.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
On x86_32, %gs is handled lazily. It's not saved and restored on
kernel entry/exit but only when necessary which usually is during task
switch but there are few other places. Currently, it's done by
calling savesegment() and loadsegment() explicitly. Define
get_user_gs(), set_user_gs() and task_user_gs() and use them instead.
While at it, clean up register access macros in signal.c.
This cleans up code a bit and will help future changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Use .macro instead of cpp #define where approriate. This cleans up
code and will ease future changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
do_device_not_available() is the handler for #NM and it declares that
it takes a unsigned long and calls math_emu(), which takes a long
argument and surprisingly expects the stack frame starting at the zero
argument would match struct math_emu_info, which isn't true regardless
of configuration in the current code.
This patch makes do_device_not_available() take struct pt_regs like
other exception handlers and initialize struct math_emu_info with
pointer to it and pass pointer to the math_emu_info to math_emulate()
like normal C functions do. This way, unless gcc makes a copy of
struct pt_regs in do_device_not_available(), the register frame is
correctly accessed regardless of kernel configuration or compiler
used.
This doesn't fix all math_emu problems but it at least gets it
somewhat working.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Unstatic ioapic_write_entry and setup_ioapic_entry functions so that
the Xen code can do its own ioapic routing setup.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Add mp_find_ioapic_pin() to find an IO APIC's specific pin from a GSI,
and use this function within acpi/boot. Make it non-static so other
code can use it too.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] powernow-k8: Get transition latency from ACPI _PSS table
[CPUFREQ] Make ignore_nice_load setting of ondemand work as expected.
to prevent wrongly overwriting fixmap that still want to use.
ACPI used to rely on low mappings being all linearly mapped and
grew a habit: it never really unmapped certain kinds of tables
after use.
This can cause problems - for example the hypothetical case
when some spurious access still references it.
v2: remove prev_map and prev_size in __apci_map_table
v3: let acpi_os_unmap_memory() call early_iounmap too, so remove extral calling to
early_acpi_os_unmap_memory
v4: fix typo in one acpi_get_table_with_size calling
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Acked-by: Len Brown <len.brown@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On x86, __acpi_map_table uses early_ioremap() to create the mapping,
replacing the previous mapping with a new one. Once enough of the
kernel is up an running it switches to using normal ioremap(). At
that point, we need to clean up the final mapping to avoid a warning
from the early_ioremap subsystem.
This can be removed after all the instances in the ACPI code are fixed
that rely on early-ioremap's implicit overmapping of previously
mapped tables.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Len Brown <len.brown@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Always map acpi tables, rather than assuming we can use the normal
linear mapping to access the acpi tables. This is necessary in a
virtual environment where the linear mappings are to pseudo-physical
memory, but the acpi tables exist at a real physical address. It
doesn't hurt to map in the normal non-virtual case, so just do it
unconditionally.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Len Brown <len.brown@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
__acpi_map_table() effectively reimplements early_ioremap(). Rather
than have that duplication, just implement it in terms of
early_ioremap().
However, unlike early_ioremap(), __acpi_map_table() just maintains a
single mapping which gets replaced each call, and has no corresponding
unmap function. Implement this by just removing the previous mapping
each time its called. Unfortunately, this will leave a stray mapping
at the end.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Commit 6194ba6ff6 ("x86: don't special-case
pmd allocations as much") made changes to the way we handle pmd allocations,
and while doing that it dropped a call to paravirt_release_pd on the
pgd page from the pgd_dtor code path.
As a result of this missing release, the hypervisor is now unaware of the
pgd page being freed, and as a result it ends up tracking this page as a
page table page.
After this the guest may start using the same page for other purposes, and
depending on what use the page is put to, it may result in various performance
and/or functional issues ( hangs, reboots).
Since this release is only required for VMI, I now release the pgd page from
the (vmi)_pgd_free hook.
Signed-off-by: Alok N Kataria <akataria@vmware.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>
Impact: find right nr_irqs_gsi on some systems.
One test-system has gap between gsi's:
[ 0.000000] ACPI: IOAPIC (id[0x04] address[0xfec00000] gsi_base[0])
[ 0.000000] IOAPIC[0]: apic_id 4, version 0, address 0xfec00000, GSI 0-23
[ 0.000000] ACPI: IOAPIC (id[0x05] address[0xfeafd000] gsi_base[48])
[ 0.000000] IOAPIC[1]: apic_id 5, version 0, address 0xfeafd000, GSI 48-54
[ 0.000000] ACPI: IOAPIC (id[0x06] address[0xfeafc000] gsi_base[56])
[ 0.000000] IOAPIC[2]: apic_id 6, version 0, address 0xfeafc000, GSI 56-62
...
[ 0.000000] nr_irqs_gsi: 38
So nr_irqs_gsi is not right. some irq for MSI will overwrite with io_apic.
need to get that with acpi_probe_gsi when acpi io_apic is used
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
With the differences in interrupt handling hoisted into handle_irq(),
do_IRQ is more or less identical between 32 and 64 bit, so unify it.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Xen uses a different interrupt path, so introduce handle_irq() to
allow interrupts to be inserted into the normal interrupt path. This
is handled slightly differently on 32 and 64-bit.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
arch/x86/kernel/early_printk.c: In function ‘early_dbgp_init’:
arch/x86/kernel/early_printk.c:827: error: ‘PAGE_KERNEL_NOCACHE’ undeclared (first use in this function)
arch/x86/kernel/early_printk.c:827: error: (Each undeclared identifier is reported only once
arch/x86/kernel/early_printk.c:827: error: for each function it appears in.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
For Intel 7400 series CPUs, the recommendation is to use a clflush on the
monitored address just before monitor and mwait pair [1].
This clflush makes sure that there are no false wakeups from mwait when the
monitored address was recently written to.
[1] "MONITOR/MWAIT Recommendations for Intel Xeon Processor 7400 series"
section in specification update document of 7400 series
http://download.intel.com/design/xeon/specupdt/32033601.pdf
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup and bug fix
Use the linker to create symbols for certain per-cpu variables
that are offset by __per_cpu_load. This allows the removal of
the runtime fixup of the GDT pointer, which fixes a bug with
resume reported by Jiri Slaby.
Reported-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Without frame pointers enabled, the x86 stack traces should not
pretend to be reliable; instead they should just be what they are:
unreliable.
The effect of this is that they have a '?' printed in the stacktrace,
to warn the reader that these entries are guesses rather than known
based on more reliable information.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: find right nr_irqs_gsi on some systems.
One test-system has gap between gsi's:
[ 0.000000] ACPI: IOAPIC (id[0x04] address[0xfec00000] gsi_base[0])
[ 0.000000] IOAPIC[0]: apic_id 4, version 0, address 0xfec00000, GSI 0-23
[ 0.000000] ACPI: IOAPIC (id[0x05] address[0xfeafd000] gsi_base[48])
[ 0.000000] IOAPIC[1]: apic_id 5, version 0, address 0xfeafd000, GSI 48-54
[ 0.000000] ACPI: IOAPIC (id[0x06] address[0xfeafc000] gsi_base[56])
[ 0.000000] IOAPIC[2]: apic_id 6, version 0, address 0xfeafc000, GSI 56-62
...
[ 0.000000] nr_irqs_gsi: 38
So nr_irqs_gsi is not right. some irq for MSI will overwrite with io_apic.
need to get that with acpi_probe_gsi when acpi io_apic is used
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: make check-timer more robust potentially solve boot fragility
For edge trigger io-apic routing, we already unmasked the pin via
setup_IO_APIC_irq(), so don't unmask it again.
Also call local_irq_disable() between timer_irq_works(), because it
calls local_irq_enable() inside.
Also remove not needed apic version reading for 64-bit
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: make nr_irqs depend more on cards used in a system
depend on nr_irq_gsi more, and have a ratio for MSI.
v2: make nr_irqs less than NR_VECTORS * nr_cpu_ids
aka if only one cpu, we only can support nr_irqs = NR_VECTORS
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix compile problem:
CC arch/x86/kernel/early_printk.o
In file included from /home/jeremy/hg/xen/paravirt/linux/arch/x86/kernel/early_printk.c:17:
/home/jeremy/hg/xen/paravirt/linux/arch/x86/include/asm/pgtable.h: In function 'pmd_page':
/home/jeremy/hg/xen/paravirt/linux/arch/x86/include/asm/pgtable.h:516: error: implicit declaration of function '__pfn_to_section'
/home/jeremy/hg/xen/paravirt/linux/arch/x86/include/asm/pgtable.h:516: warning: initialization makes pointer from integer without a cast
/home/jeremy/hg/xen/paravirt/linux/arch/x86/include/asm/pgtable.h:516: error: implicit declaration of function '__section_mem_map_addr'
/home/jeremy/hg/xen/paravirt/linux/arch/x86/include/asm/pgtable.h:516: warning: return makes pointer from integer without a cast
/home/jeremy/hg/xen/paravirt/linux/arch/x86/include/asm/pgtable.h: In function 'pud_page':
/home/jeremy/hg/xen/paravirt/linux/arch/x86/include/asm/pgtable.h:586: warning: initialization makes pointer from integer without a cast
/home/jeremy/hg/xen/paravirt/linux/arch/x86/include/asm/pgtable.h:586: warning: return makes pointer from integer without a cast
/home/jeremy/hg/xen/paravirt/linux/arch/x86/include/asm/pgtable.h: In function 'pgd_page':
/home/jeremy/hg/xen/paravirt/linux/arch/x86/include/asm/pgtable.h:625: warning: initialization makes pointer from integer without a cast
/home/jeremy/hg/xen/paravirt/linux/arch/x86/include/asm/pgtable.h:625: warning: return makes pointer from integer without a cast
This is a cycling dependency between asm/pgtable.h and linux/mmzone.h
when using CONFIG_SPARSEMEM. Rather than hacking up the headers some
more, remove asm/pgtable.h, since early_printk.c doesn't actually need
it.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Implement Linus's suggestion: introduce the hpet_cnt_ahead()
helper function to compare hpet time values - like other
wrapping counter comparisons are abstracted away elsewhere.
(jiffies, ktime_t, etc.)
Reported-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
only leave _default_ipi_xx etc in .h
Beyond the cleanup factor, this saves a bit of code size as well:
text data bss dec hex filename
7281931 1630144 1463304 10375379 9e50d3 vmlinux.before
7281753 1630144 1463304 10375201 9e5021 vmlinux.after
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
disable_ioapic_setup() in init/main.c is ugly as the function is
x86-specific. The #ifdef inline prototype there is ugly too.
Replace it with a generic arch_disable_smp_support() function - which
has a weak alias for non-x86 architectures and for non-ioapic x86 builds.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
At this time, the PowerNow! driver for K8 uses an experimentally
derived formula to calculate transition latency. The value it
provides is orders of magnitude too large on modern systems.
This patch replaces the formula with ACPI _PSS latency values
for more accuracy and better performance.
I've tested it on two 2nd generation Opteron systems, a 3rd
generation Operton system, and a Turion X2 without seeing any
stability problems.
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
There's a small problem with hpet_rtc_reinit function - it checks
for the:
hpet_readl(HPET_COUNTER) - hpet_t1_cmp > 0
to continue increasing both the HPET_T1_CMP (register) and the
hpet_t1_cmp (variable).
But since the HPET_COUNTER is always 32-bit, if the hpet_t1_cmp
is 64-bit this condition will always be FALSE once the latter hits
the 32-bit boundary, and we can have a situation, when we don't
increase the HPET_T1_CMP register high enough.
The result - timer stops ticking, since HPET_T1_CMP becomes less,
than the COUNTER and never increased again.
The solution is (based on Linus's suggestion) to not compare 64-bits
(on 64-bit x86), but to do the comparison on 32-bit signed
integers.
Reported-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch echoes what we already do on 32-bit since
90f7d25c6b, and prints the DMI
product name in show_regs, so that system specific problems can be
easily identified.
Signed-off-by: Kyle McMartin <kyle@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
They were long enough set deprecated...
Update Documentation/cpu-freq/users-guide.txt:
The deprecated files listed there seen not to exist for some time anymore
already.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Impact: reduce kernel BSS size by 7 pages, improve code readability
Two page tables are used in current x86_64 kexec implementation. One
is used to jump from kernel virtual address to identity map address,
the other is used to map all physical memory. In fact, on x86_64,
there is no conflict between kernel virtual address space and physical
memory space, so just one page table is sufficient. The page table
pages used to map control page are dynamically allocated to save
memory if kexec image is not loaded. ASM code used to map control page
is replaced by C code too.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: fix to enable APIC for AMD Fam10h on chipsets with a missing/b0rked
ACPI MP table (MADT)
Booting a 32bit kernel on an AMD Fam10h CPU running on chipsets with
missing/b0rked MP table leads to a hang pretty early in the boot process
due to the APIC not being initialized. Fix that by falling back to the
default APIC base address in 32bit code, as it is done in the 64bit
codepath.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: Fixes dumpstack and KDB on 64 bits
This re-adds the old stack pointer to the top of the irqstack to help
with unwinding. It was removed in commit d99015b1ab
as part of the save_args out-of-line work.
Both dumpstack and KDB require this information.
Signed-off-by: Martin Hicks <mort@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Zach says:
> Enable/Disable have no clobbers at all.
> Save clobbers only return value, %eax
> Restore also clobbers nothing.
This is precisely compatible with the calling convention, so we can
just call them directly without wrapping.
(Compile tested only.)
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Eric Paris reported:
> I have an hp dl785g5 which is unable to successfully run
> 2.6.29-0.66.rc3.fc11.x86_64 or 2.6.29-rc2-next-20090126. During bootup
> (early in userspace daemons starting) I get the below BUG, which quickly
> renders the machine dead. I assume it is because sparse_irq_lock never
> gets released when the BUG kills that task.
Adjust lock sequence when migrating a descriptor with
CONFIG_NUMA_MIGRATE_IRQ_DESC enabled.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, ds, bts: cleanup/fix DS configuration
ring-buffer: reset timestamps when ring buffer is reset
trace: set max latency variable to zero on default
trace: stop all recording to ring buffer on ftrace_dump
trace: print ftrace_dump at KERN_EMERG log level
ring_buffer: reset write when reserve buffer fail
tracing/function-graph-tracer: fix a regression while suspend to disk
ring-buffer: fix alignment problem
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86 setup: fix asm constraints in vesa_store_edid
xen: make sysfs files behave as their names suggest
x86: tone down mtrr_trim_uncached_memory() warning
x86: correct the CPUID pattern for MSR_IA32_MISC_ENABLE availability
[
mingo@elte.hu: these fixes are a subset of changes cherry-picked from:
git://git.kernel.org:/pub/scm/linux/kernel/git/jejb/voyager-2.6.git
They fix various problems that recent x86 changes caused in the Voyager
subarchitecture: both APIC changes and cpumask changes and certain
cleanups caused subarch assumptions to break.
Most of these changes are obsolete as the subarch code has been removed
from the x86 development tree - but we merge them upstream to make Voyager
build and boot.
]
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: split out a function, no functional change
Xen needs to be able to access percpu data from very early on. For
various reasons, it cannot also load the gdt at that time. It does,
however, have a pefectly functional gdt at that point, so there's no
pressing need to reload the gdt.
Split the function to load the segment registers off, so Xen can call
it directly.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup, prepare for xen boot fix.
Xen needs to call this function very early to setup the GDT and
per-cpu segments. Remove the call to smp_processor_id() and just
pass in the cpu number.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: fix possible tlb mis-flushing on UV
uv_flush_send_and_wait() should return a pointer if the broadcast
remote tlb shootdown requests fail. That causes the conventional IPI
method of shootdown to be used.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: Optimization
In the native case, pte_val, make_pte, etc are all just identity
functions, so there's no need to clobber a lot of registers over them.
(This changes the 32-bit callee-save calling convention to return both
EAX and EDX so functions can return 64-bit values.)
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: Optimization
One of the problems with inserting a pile of C calls where previously
there were none is that the register pressure is greatly increased.
The C calling convention says that the caller must expect a certain
set of registers may be trashed by the callee, and that the callee can
use those registers without restriction. This includes the function
argument registers, and several others.
This patch seeks to alleviate this pressure by introducing wrapper
thunks that will do the register saving/restoring, so that the
callsite doesn't need to worry about it, but the callee function can
be conventional compiler-generated code. In many cases (particularly
performance-sensitive cases) the callee will be in assembler anyway,
and need not use the compiler's calling convention.
Standard calling convention is:
arguments return scratch
x86-32 eax edx ecx eax ?
x86-64 rdi rsi rdx rcx rax r8 r9 r10 r11
The thunk preserves all argument and scratch registers. The return
register is not preserved, and is available as a scratch register for
unwrapped callee code (and of course the return value).
Wrapped function pointers are themselves wrapped in a struct
paravirt_callee_save structure, in order to get some warning from the
compiler when functions with mismatched calling conventions are used.
The most common paravirt ops, both statically and dynamically, are
interrupt enable/disable/save/restore, so handle them first. This is
particularly easy since their calls are handled specially anyway.
XXX Deal with VMI. What's their calling convention?
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: Fix latent bug
The clobber is trying to say that anything except RDI is available for
clobbering, but actually clobbers everything. This hasn't mattered
because the clobbers were basically ignored, but subsequent patches
will rely on them.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: Optimization
Several paravirt ops implementations simply return their arguments,
the most obvious being the make_pte/pte_val class of operations on
native.
On 32-bit, the identity function is literally a no-op, as the calling
convention uses the same registers for the first argument and return.
On 64-bit, it can be implemented with a single "mov".
This patch adds special identity functions for 32 and 64 bit argument,
and machinery to recognize them and replace them with either nops or a
mov as appropriate.
At the moment, the only users for the identity functions are the
pagetable entry conversion functions.
The result is a measureable improvement on pagetable-heavy benchmarks
(2-3%, reducing the pvops overhead from 5 to 2%).
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: fix rare crash on 32-bit
The 32-bit APIC drivers had their send_IPI_self vectors set to NULL,
but ioapic_retrigger_irq() depends on it being always set. Fix it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
just like 64 bit switch from flat logical APIC messages to
flat physical mode automatically.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: 32-bit should use logical version
there are two version: for default_send_IPI_mask_sequence/allbutself
one in ipi.h and one in ipi.c for 32bit
it seems .h version overwrote ipi.c for a while.
restore it so 32 bit could use its old logical version.
also remove dupicated functions in .c
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move DMA-mapping.txt to Documentation/PCI/.
DMA-mapping.txt was supposed to be moved from Documentation/ to
Documentation/PCI/. The 00-INDEX files in those two directories
were updated, along with a few other text files, but the file
itself somehow escaped being moved, so move it and update more
text files and source files with its new location.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Impact: Cleans up printk formatting
When LOCAL APIC was calibrated, the debug message is displayed as follows.
CPU0: Intel(R) Xeon(R) CPU 5110 @ 1.60GHz stepping 06
Using local APIC timer interrupts.
calibrating APIC timer ...
... lapic delta = 3773131
... PM timer delta = 812434
APIC calibration not consistent with PM Timer: 226ms instead of 100ms
APIC delta adjusted to PM-Timer: 1662420 (3773131)
TSC delta adjusted to PM-Timer: 159592409 (362220564)
..... delta 1662420
..... mult: 71411249
..... calibration result: 265987
..... CPU clock speed is 1595.0924 MHz.
..... host bus clock speed is 265.0987 MHz.
There are three type of PM-Timer (PM-Timer, PM Timer, and PM timer),
in this message. This patch unifies those messages to PM-Timer.
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Impact: Fixes incorrect printk
LOCAL APIC is corrected by PM-Timer, when SMI occurred while LOCAL APIC is calibrated.
In this case, LOCAL APIC debug message(Boot with apic=debug) is displayed correctly,
however, CPU clock speed debug message is displayed wrongly .
When SMI occured on my machine, which has 1.6GHz CPU, CPU clock speed is displayed
3622.0205 MHz as follow.
CPU0: Intel(R) Xeon(R) CPU 5110 @ 1.60GHz stepping 06
Using local APIC timer interrupts.
calibrating APIC timer ...
... lapic delta = 3773130
... PM timer delta = 812434
APIC calibration not consistent with PM Timer: 226ms instead of 100ms
APIC delta adjusted to PM-Timer: 1662420 (3773130)
..... delta 1662420
..... mult: 71411249
..... calibration result: 265987
..... CPU clock speed is 3622.0205 MHz. =====> here
..... host bus clock speed is 265.0987 MHz.
This patch fixes to displaying CPU clock speed correctly as follow.
CPU0: Intel(R) Xeon(R) CPU 5110 @ 1.60GHz stepping 06
Using local APIC timer interrupts.
calibrating APIC timer ...
... lapic delta = 3773131
... PM timer delta = 812434
APIC calibration not consistent with PM Timer: 226ms instead of 100ms
APIC delta adjusted to PM-Timer: 1662420 (3773131)
TSC delta adjusted to PM-Timer: 159592409 (362220564)
..... delta 1662420
..... mult: 71411249
..... calibration result: 265987
..... CPU clock speed is 1595.0924 MHz.
..... host bus clock speed is 265.0987 MHz.
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
X86_PC is the only remaining 'sub' architecture, so we dont need
it anymore.
This also cleans up a few spurious references to X86_PC in the
driver space - those certainly should be X86.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
xapic fix for 32bit platform with less than 8 cpu's.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
We may use macros from processor-flags.h instead
of hardcoding bits. Actually it's not direct mapping
of old instructions with new ones -- BTS does change
CF flag while MOV does not. But i didn't find any
dependency on CF in this code.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
X86_GENERICARCH is a misnomer - it contains non-PC 32-bit architectures
that are not included in the default build.
Rename it to X86_32_NON_STANDARD.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
x86/Voyager had this Kconfig quirk:
config X86_FIND_SMP_CONFIG
def_bool y
depends on X86_MPPARSE || X86_VOYAGER
Which splits off the find_smp_config() callback into a build-time quirk.
Voyager should use the existing x86_quirks.mach_find_smp_config() callback
to introduce SMP-config quirks. NUMAQ-32 and VISWS already use this.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Voyager has this Kconfig quirk:
config X86_BIOS_REBOOT
bool
depends on !X86_VOYAGER
default y
Voyager should use the existing machine_ops.emergency_restart reboot
quirk mechanism instead of a build-time quirk.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
x86/Voyager can boot on non-zero processors. While that can probably
be fixed by properly remapping the physical CPU IDs, keep boot_cpu_id
for now for easier transition - and expand it to all of x86.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The x86/Voyager subarch used to have this distinction between
'x86 SMP support' and 'Voyager SMP support':
config X86_SMP
bool
depends on SMP && ((X86_32 && !X86_VOYAGER) || X86_64)
This is a pointless distinction - Voyager can (and already does) use
smp_ops to implement various SMP quirks it has - and it can be extended
more to cover all the specialities of Voyager.
So remove this complication in the Kconfig space.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove the 32-bit subarchitecture support code.
All subarchitectures but Voyager have been converted. Voyager will be
done later or will be removed.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We are getting rid of subarchitecture support - move the hook files
to asm/. (These are now stale and should be replaced with more explicit
runtime mechanisms - but the transition is simpler this way.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>