Refactor the ->phys_pkg_id() methods:
- namespace separation
- macro wrapper removal
- open-coded calls to the methods in the generic code
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- unify the call signature of 64-bit to that of 32-bit
- clean up the types all around
- clean up namespace contamination
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- eliminate the needless es7000_enable_apic_mode() complication which
was not apparent prior the namespace cleanups
- clean up the control flow in es7000_enable_apic_mode()
- other cleanups
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Only ES7000 has a real ->enable_apic_mode() method, the other
subarchitectures define it but keep it empty.
So mark the vector as NULL, extend the generic code to handle
NULL -setup_portio_remap() entries and remove all the empty
handlers.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- spread out the namespace to per driver methods
- extend it to 64-bit as well so that we can use
apic->check_phys_apicid_present() unconditionally
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Only NUMAQ has a real ->setup_portio_remap() method, the other
subarchitectures define it but keep it empty.
So mark the vector as NULL, extend the generic code to handle
NULL -setup_portio_remap() entries and remove all the empty
handlers.
Also move the NUMAQ method from the header file into the
apic driver .c file.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
only NUMAQ uses this quirk: to prevent the timer IRQ from being added
on secondary nodes.
All other genapic templates can have a NULL ->multi_timer_check()
callback.
Also, extend the generic code to treat a NULL pointer accordingly.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
The bigsmp and es7000 subarchitectures un-defined APIC_DEST_LOGICAL in
a rather nasty way by re-defining it to zero. That is infinitely
fragile and makes it very hard to see what to code really does in
a given context. The very same constant has different meanings and
values - depending on which subarch is enabled.
Untangle this mess by never undefining the constant, but instead
propagating the right values into the genapic driver templates.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
the ->ESR_DISABLE shouting variant was used to enable the esr_disable
macro wrappers. Those ugly macros are removed now so we can rename
->ESR_DISABLE to ->disable_esr
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Most subarchitectures want to disable the APIC ESR (Error Status Register),
because they generally have hardware hacks that wrap standard CPUs into
a bigger system and hence the APIC bus is quite non-standard and weirdnesses
(lockups) have been seen with ESR reporting.
Remove the esr_disable macros and put the desired flag into each
subarchitecture's genapic template directly.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Clean up all the target_cpus() namespace overlap that exists
between bigsmp, es7000, mach-default, numaq and summit - by
separating the different functions into different names.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove the wrapper macros IRQ_DEST_MODE and IRQ_DELIVERY_MODE.
The typical 32-bit and the 64-bit build all dereference via the genapic,
so it's pointless to hide that indirection via these ugly macros.
Furthermore, it also obscures subarchitecture details.
So replace it with apic->irq_dest_mode / etc. accesses.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
int_delivery_mode is supposed to mean 'interrupt delivery mode', but
it's quite a misnomer as 'int' we usually think of as an integer type ...
The standard naming for such attributes is 'irq' - so rename the following
fields and macros:
int_delivery_mode => irq_delivery_mode
INT_DELIVERY_MODE => IRQ_DELIVERY_MODE
int_dest_mode => irq_dest_mode
INT_DEST_MODE => IRQ_DEST_MODE
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
x86 subarchitectures each defined a "apic_id_registered()" method,
which could be an inline function depending on which subarch we build
for, and which was also the name of a genapic field.
Untangle this namespace spaghetti by giving each of the instances
a separate name.
Also remove wrapper macro obfuscation.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: refactor code
x86 subarchitectures each defined a "acpi_madt_oem_check()" method,
which could be an inline function, or an extern, or a static function,
and which was also the name of a genapic field.
Untangle this namespace spaghetti by setting ->acpi_madt_oem_check()
to NULL on those subarchitectures that have no detection quirks,
and rename the other ones (summit, es7000) that do.
Also change default_acpi_madt_oem_check() to handle NULL entries,
and clean its control flow up as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The APIC_INIT() / APICFUNC / IPIFUNC macros were ugly and obfuscated
the true identity of various APIC driver methods.
Now that they are not used anymore, remove them.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Rename genapic-> to apic-> references because in a future chagne we'll
open-code all the indirect calls (instead of obscuring them via macros),
so we want this reference to be as short as possible.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: pre unification cleanup
Make genapic_32.h similar to genapic_64.h: reorder fields, unify types
and bring in new entries.
No existing functionality is affected.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: pre unification cleanup
Make genapic_64.h similar to genapic_32.h: reorder fields, unify types
and bring in new entries.
No existing functionality is affected.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: sync 32 and 64-bit code
Merge load_gs_base() into switch_to_new_gdt(). Load the GDT and
per-cpu state for the boot cpu when its new area is set up.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: optimization
mb() generates an mfence instruction, which is not needed here. Only
a compiler barrier is needed, and that is handled by the memory clobber
in the wrmsrl function.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup
Rename init_gdt() to setup_percpu_segment(), and move it to
setup_percpu.c.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: Code movement, no functional change.
Move setup_cpu_local_masks() to kernel/cpu/common.c, where the
masks are defined.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: Code movement, no functional change.
Move the 64-bit NUMA code from setup_percpu.c to numa_64.c
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (29 commits)
xen: unitialised return value in xenbus_write_transaction
x86: fix section mismatch warning
x86: unmask CPUID levels on Intel CPUs, fix
x86: work around PAGE_KERNEL_WC not getting WC in iomap_atomic_prot_pfn.
x86: use standard PIT frequency
xen: handle highmem pages correctly when shrinking a domain
x86, mm: fix pte_free()
xen: actually release memory when shrinking domain
x86: unmask CPUID levels on Intel CPUs
x86: add MSR_IA32_MISC_ENABLE bits to <asm/msr-index.h>
x86: fix PTE corruption issue while mapping RAM using /dev/mem
x86: mtrr fix debug boot parameter
x86: fix page attribute corruption with cpa()
Revert "x86: signal: change type of paramter for sys_rt_sigreturn()"
x86: use early clobbers in usercopy*.c
x86: remove kernel_physical_mapping_init() from init section
fix: crash: IP: __bitmap_intersects+0x48/0x73
cpufreq: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write
work_on_cpu: Use our own workqueue.
work_on_cpu: don't try to get_online_cpus() in work_on_cpu.
...
The current version of __raw_read_trylock starts with decrementing the lock
and read its new value as a separate operation after that.
That makes 3 dereferences (read, write (after sub), read) whereas
a single atomic_dec_return does only two pointers dereferences (read, write).
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
the RDC and ELAN platforms use slighly different PIT clocks, resulting in
a timex.h hack that changes PIT_TICK_RATE during build time. But if a
tester enables any of these platform support .config options, the PIT
will be miscalibrated on standard PC platforms.
So use one frequency - in a subsequent patch we'll add a quirk to allow
x86 platforms to define different PIT frequencies.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: Cleanup
When PAT was originally introduced, it was handled specially for a few
reasons:
- PAT bugs are hard to track down, so we wanted to maintain a
whitelist of CPUs.
- The i386 and x86-64 CPUID code was not yet unified.
Both of these are now obsolete, so handle PAT like any other features,
including ordinary feature blacklisting due to known bugs.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Impact: Whitespace cleanup only
Clean up a stray space character in arch/x86/include/asm/processor.h.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Impact: introduce new uaccess exception handling framework
Introduce {get|put}_user_try and {get|put}_user_catch as new uaccess exception
handling framework.
{get|put}_user_try begins exception block and {get|put}_user_catch(err) ends
the block and gets err if an exception occured in {get|put}_user_ex() in the
block. The exception is stored thread_info->uaccess_err.
The example usage of this framework is below;
int func()
{
int err = 0;
get_user_try {
get_user_ex(...);
get_user_ex(...);
:
} get_user_catch(err);
return err;
}
Note: get_user_ex() is not clear the value when an exception occurs, it's
different from the behavior of __get_user(), but I think it doesn't matter.
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
On -rt we were seeing spurious bad page states like:
Bad page state in process 'firefox'
page:c1bc2380 flags:0x40000000 mapping:c1bc2390 mapcount:0 count:0
Trying to fix it up, but a reboot is needed
Backtrace:
Pid: 503, comm: firefox Not tainted 2.6.26.8-rt13 #3
[<c043d0f3>] ? printk+0x14/0x19
[<c0272d4e>] bad_page+0x4e/0x79
[<c0273831>] free_hot_cold_page+0x5b/0x1d3
[<c02739f6>] free_hot_page+0xf/0x11
[<c0273a18>] __free_pages+0x20/0x2b
[<c027d170>] __pte_alloc+0x87/0x91
[<c027d25e>] handle_mm_fault+0xe4/0x733
[<c043f680>] ? rt_mutex_down_read_trylock+0x57/0x63
[<c043f680>] ? rt_mutex_down_read_trylock+0x57/0x63
[<c0218875>] do_page_fault+0x36f/0x88a
This is the case where a concurrent fault already installed the PTE and
we get to free the newly allocated one.
This is due to pgtable_page_ctor() doing the spin_lock_init(&page->ptl)
which is overlaid with the {private, mapping} struct.
union {
struct {
unsigned long private;
struct address_space *mapping;
};
spinlock_t ptl;
struct kmem_cache *slab;
struct page *first_page;
};
Normally the spinlock is small enough to not stomp on page->mapping, but
PREEMPT_RT=y has huge 'spin'locks.
But lockdep kernels should also be able to trigger this splat, as the
lock tracking code grows the spinlock to cover page->mapping.
The obvious fix is calling pgtable_page_dtor() like the regular pte free
path __pte_free_tlb() does.
It seems all architectures except x86 and nm10300 already do this, and
nm10300 doesn't seem to use pgtable_page_ctor(), which suggests it
doesn't do SMP or simply doesnt do MMU at all or something.
Signed-off-by: Peter Zijlstra <a.p.zijlsta@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>
Impact: better code generation and removal of unused field for 32bit
In general, use the 64-bit version.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup
APIC definitions aren't needed here. Remove the include and fix
up the fallout.
tj: added include to mce_intel_64.c.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: bogus irq_cpustat field removed
idle_timestamp is left over from the removed irqbalance code.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
It's not necessary to deconstruct and reconstruct a pte every time its
flags are being updated. Introduce pte_set_flags and pte_clear_flags
to set and clear flags in a pte. This allows the flag manipulation
code to be inlined, and avoids calls via paravirt-ops.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
pte_flags() was introduced as a new pvop in order to extract just the
flags portion of a pte, which is a potentially cheaper operation than
extracting the page number as well. It turns out this operation is
not needed, because simply using a mask to extract the flags from a
pte is sufficient for all current users.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix/extend ioremap_wc() beyond 4GB aperture on 32-bit
ioremap_wc() was taking in unsigned long parameter, where as it should take
64-bit resource_size_t parameter like other ioremap variants.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: None (new bit definitions currently unused)
Add bit definitions for the MSR_IA32_MISC_ENABLE MSRs to
<asm/msr-index.h>.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Make X86 SGI Ultraviolet support configurable. Saves about 13K of text size
on my modest config.
text data bss dec hex filename
6770537 1158680 694356 8623573 8395d5 vmlinux
6757492 1157664 694228 8609384 835e68 vmlinux.nouv
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Conflicts:
arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
arch/x86/kernel/tlb_32.c
Merge it here because both the cpumask changes and the ongoing percpu
work is touching the TLB code. The percpu changes take precedence, as
they eliminate tlb_32.c altogether.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This reverts commit 4217458daf.
Justin Madru bisected this commit, it was causing weird Firefox
crashes.
The reason is that GCC mis-optimizes (re-uses) the on-stack parameters of
the calling frame, which corrupts the syscall return pt_regs state and
thus corrupts user-space register state.
So we go back to the slightly less clean but more optimization-safe
method of getting to pt_regs. Also add a comment to explain this.
Resolves: http://bugzilla.kernel.org/show_bug.cgi?id=12505
Reported-and-bisected-by: Justin Madru <jdm64@gawab.com>
Tested-by: Justin Madru <jdm64@gawab.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: less contention when issuing invalidate IPI, cleanup
Make x86_32 use the same tlb code as 64bit. The 64bit code uses
multiple IPI vectors for tlb shootdown to reduce contention. This
patch makes x86_32 allocate the same 8 IPIs as x86_64 and share the
code paths.
Note that the usage of asmlinkage is inconsistent for x86_32 and 64
and calls for further cleanup. This has been noted with a FIXME
comment in tlb_64.c.
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: clean up, ipi vector number reordering for x86_32
Make the following changes to prepare for tlb merge.
* reorder x86_32 ip vectors
* adjust tlb_32.c and tlb_64.c such that their logics coincide exactly
- on spurious invalidate ipi, tlb_32 acks the irq
- tlb_64 now has proper memory barriers around clearing
flush_cpumask (no change in generated code)
* unexport flush_tlb_page from tlb_32.c, there's no user
* use unsigned int for cpu id
* drop unnecessary includes from tlb_64.c
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup
Make the following uv related cleanups.
* collect visible uv related definitions and interfaces into uv/uv.h
and use it. this cleans up the messy situation where on 64bit, uv
is defined properly, on 32bit generic it's dummy and on the rest
undefined. after this clean up, uv is defined on 64 and dummy on
32.
* update uv_flush_tlb_others() such that it takes cpumask of
to-be-flushed cpus as argument, instead of that minus self, and
returns yet-to-be-flushed cpumask, instead of modifying the passed
in parameter. this interface change will ease dummy implementation
of uv_flush_tlb_others() and makes uv tlb flush related stuff
defined in tlb_uv proper.
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup, better irq_regs code generation for x86_64
Make 64-bit use the same optimizations as 32-bit.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup
tj: * changed cpu to unsigned as was done on mmu_context_64.h as cpu
id is officially unsigned int
* added missing ';' to 32bit version of deactivate_mm()
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: slightly better code generation for percpu_to_op()
The processor will sign-extend 32-bit immediate values in 64-bit
operations. Use the 'e' constraint ("32-bit signed integer constant,
or a symbolic reference known to fit that range") for 64-bit constants.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup
In switch_to(), instead of taking offset to irq_stack_union.stack,
make it a proper percpu access using __percpu_arg() and per_cpu_var().
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup
Remove byte locks implementation, which was introduced by Jeremy in
8efcbab6 ("paravirt: introduce a "lock-byte" spinlock implementation"),
but turned out to be dead code that is not used by any in-kernel
virtualization guest (Xen uses its own variant of spinlocks implementation
and KVM is not planning to move to byte locks).
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: x86_64 percpu area layout change, irq_stack now at the beginning
Now that the PDA is empty except for the stack canary, it can be removed.
The irqstack is moved to the start of the per-cpu section. If the stack
protector is enabled, the canary overlaps the bottom 48 bytes of the irqstack.
tj: * updated subject
* dropped asm relocation of irq_stack_ptr
* updated comments a bit
* rebased on top of stack canary changes
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup
Copy the code to cpu_init() to satisfy the requirement that the cpu
be reinitialized. Remove all other calls, since the segments are
already initialized in head_64.S.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: no unnecessary stack canary swapping during context switch
There's no point in moving stack_canary around during context switch
if it's not enabled. Conditionalize it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup
Make the following cleanups.
* remove duplicate comment from boot_init_stack_canary() which fits
better in the other place - cpu_idle().
* move stack_canary offset check from __switch_to() to
boot_init_stack_canary().
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup
Move/remove leftover RDC321 files. Now that it's not a subarch anymore,
arch/x86/mach-rdc321x and arch/x86/include/asm/mach-rdc321x/ are not
needed.
One include file was still in use: rdc321x_defs.h, move that to the
generic x86 asm header directory.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Accessing memory through %gs should not use rip-relative addressing.
Adding a P prefix for the argument tells gcc to not add (%rip) to
the memory references.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
tj: * in asm-offsets_64.c, pda.h inclusion shouldn't be removed as pda
is still referenced in the file
* s/oldrsp/old_rsp/
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Also clean up PER_CPU_VAR usage in xen-asm_64.S
tj: * remove now unused stack_thread_info()
* s/kernelstack/kernel_stack/
* added FIXME comment in xen-asm_64.S
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
tj: moved cpu_number definition out of CONFIG_HAVE_SETUP_PER_CPU_AREA
for voyager.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Move the irqstackptr variable from the PDA to per-cpu. Make the
stacks themselves per-cpu, removing some specific allocation code.
Add a seperate flag (is_boot_cpu) to simplify the per-cpu boot
adjustments.
tj: * sprinkle some underbars around.
* irq_stack_ptr is not used till traps_init(), no reason to
initialize it early. On SMP, just leaving it NULL till proper
initialization in setup_per_cpu_areas() works. Dropped
is_boot_cpu and early irq_stack_ptr initialization.
* do DECLARE/DEFINE_PER_CPU(char[IRQ_STACK_SIZE], irq_stack)
instead of (char, irq_stack[IRQ_STACK_SIZE]).
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
It is an optimization and a cleanup, and adds the following new
generic percpu methods:
percpu_read()
percpu_write()
percpu_add()
percpu_sub()
percpu_and()
percpu_or()
percpu_xor()
and implements support for them on x86. (other architectures will fall
back to a default implementation)
The advantage is that for example to read a local percpu variable,
instead of this sequence:
return __get_cpu_var(var);
ffffffff8102ca2b: 48 8b 14 fd 80 09 74 mov -0x7e8bf680(,%rdi,8),%rdx
ffffffff8102ca32: 81
ffffffff8102ca33: 48 c7 c0 d8 59 00 00 mov $0x59d8,%rax
ffffffff8102ca3a: 48 8b 04 10 mov (%rax,%rdx,1),%rax
We can get a single instruction by using the optimized variants:
return percpu_read(var);
ffffffff8102ca3f: 65 48 8b 05 91 8f fd mov %gs:0x7efd8f91(%rip),%rax
I also cleaned up the x86-specific APIs and made the x86 code use
these new generic percpu primitives.
tj: * fixed generic percpu_sub() definition as Roel Kluin pointed out
* added percpu_and() for completeness's sake
* made generic percpu ops atomic against preemption
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Tejun Heo <tj@kernel.org>
Do the following cleanups:
* kill x86_64_init_pda() which now is equivalent to pda_init()
* use per_cpu_offset() instead of cpu_pda() when initializing
initial_gs
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
pda is now a percpu variable and there's no reason it can't use plain
x86 percpu accessors. Add x86_test_and_clear_bit_percpu() and replace
pda op implementations with wrappers around x86 percpu accessors.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
[ Based on original patch from Christoph Lameter and Mike Travis. ]
As pda is now allocated in percpu area, it can easily be made a proper
percpu variable. Make it so by defining per cpu symbol from linker
script and declaring it in C code for SMP and simply defining it for
UP. This change cleans up code and brings SMP and UP closer a bit.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Now that pda is allocated as part of percpu, percpu doesn't need to be
accessed through pda. Unify x86_64 SMP percpu access with x86_32 SMP
one. Other than the segment register, operand size and the base of
percpu symbols, they behave identical now.
This patch replaces now unnecessary pda->data_offset with a dummy
field which is necessary to keep stack_canary at its place. This
patch also moves per_cpu_offset initialization out of init_gdt() into
setup_per_cpu_areas(). Note that this change also necessitates
explicit per_cpu_offset initializations in voyager_smp.c.
With this change, x86_OP_percpu()'s are as efficient on x86_64 as on
x86_32 and also x86_64 can use assembly PER_CPU macros.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
[ Based on original patch from Christoph Lameter and Mike Travis. ]
Currently pdas and percpu areas are allocated separately. %gs points
to local pda and percpu area can be reached using pda->data_offset.
This patch folds pda into percpu area.
Due to strange gcc requirement, pda needs to be at the beginning of
the percpu area so that pda->stack_canary is at %gs:40. To achieve
this, a new percpu output section macro - PERCPU_VADDR_PREALLOC() - is
added and used to reserve pda sized chunk at the start of the percpu
area.
After this change, for boot cpu, %gs first points to pda in the
data.init area and later during setup_per_cpu_areas() gets updated to
point to the actual pda. This means that setup_per_cpu_areas() need
to reload %gs for CPU0 while clearing pda area for other cpus as cpu0
already has modified it when control reaches setup_per_cpu_areas().
This patch also removes now unnecessary get_local_pda() and its call
sites.
A lot of this patch is taken from Mike Travis' "x86_64: Fold pda into
per cpu area" patch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
_cpu_pda array first uses statically allocated storage in data.init
and then switches to allocated bootmem to conserve space. However,
after folding pda area into percpu area, _cpu_pda array will be
removed completely. Drop the reallocation part to simplify the code
for soon-to-follow changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
[ Based on original patch from Christoph Lameter and Mike Travis. ]
CPU startup code in head_64.S loaded address of a zero page into %gs
for temporary use till pda is loaded but address to the actual pda is
available at the point. Load the real address directly instead.
This will help unifying percpu and pda handling later on.
This patch is mostly taken from Mike Travis' "x86_64: Fold pda into
per cpu area" patch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Make early_per_cpu() a lvalue as per_cpu() is and use it where
applicable.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There's no instruction to move a 64bit immediate into memory location.
Drop "i".
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Debugging and original patch from Nick Piggin <npiggin@suse.de>
The early fixmap pmd entry inserted at the very top of the KVA is causing the
subsequent fixmap mapping code to not provide physically linear pte pages over
the kmap atomic portion of the fixmap (which relies on said property to
calculate pte addresses).
This has caused weird boot failures in kmap_atomic much later in the boot
process (initial userspace faults) on a 32-bit PAE system with a larger number
of CPUs (smaller CPU counts tend not to run over into the next page so don't
show up the problem).
Solve this by attempting to clear out the page table, and copy any of its
entries to the new one. Also, add a bug if a nonlinear condition is encountered
and can't be resolved, which might save some hours of debugging if this fragile
scheme ever breaks again...
Once we have such logic, we can also use it to eliminate the early ioremap
trickery around the page table setup for the fixmap area. This also fixes
potential issues with FIX_* entries sharing the leaf page table with the early
ioremap ones getting discarded by early_ioremap_clear() and not restored by
early_ioremap_reset(). It at once eliminates the temporary (and configuration,
namely NR_CPUS, dependent) unavailability of early fixed mappings during the
time the fixmap area page tables get constructed.
Finally, also replace the hard coded calculation of the initial table space
needed for the fixmap area with a proper one, allowing kernels configured for
large CPU counts to actually boot.
Based-on: Nick Piggin <npiggin@suse.de>
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix build warning
The macro cpu_to_node did not reference it's argument, and instead
simply returned a 0. This causes a "unused variable" warning if
it's the only reference in a function (show_cache_disable).
Replace it with the more correct inline function.
Signed-off-by: Mike Travis <travis@sgi.com>
Add swab.h to kbuild.asm and remove the individual entries from
each arch, mark as unifdef as some arches have some kernel-only
bits inside.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Impact: cleanup
'make headers_check' warn us about leaking of kernel private
(mostly compile time vars) data to userspace in headers. Fix it.
Guard this one by __KERNEL__.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Impact: cleanup
'make headers_check' warn us about lack of linux/types.h
here. Lets add it.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Impact: cleanup (internal kernel function exported)
'make headers_check' warn us about leaking of kernel private
(mostly compile time vars) data to userspace in headers. Fix it.
sys_arch_prctl is completely removed from
header since frankly I don't even understand why we
describe it here. It is described like
__SYSCALL(__NR_arch_prctl, sys_arch_prctl) in unistd_64.h
and implemented in process_64.c. User-mode linux involved?
So this one in fact is suspicious.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Impact: clean up to be same as 64bit
32-bit is using per-cpu vector too, so don't use default NR_IRQS.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Move the new memtype old memtype allowed check to header so that is can be
shared by other users. Subsequent patch uses this in pat.c in remap_pfn_range()
code path. No functionality change in this patch.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: reduce kernel image size
Hugh Dickins noticed that older gcc versions when the kernel
is built for code size didn't inline some of the bitops.
Mark all complex x86 bitops that have more than a single
asm statement or two as always inline to avoid this problem.
Probably should be done for other architectures too.
Ingo then found a better fix that only requires
a single line change, but it unfortunately only
works on gcc 4.3.
On older gccs the original patch still makes a ~0.3% defconfig
difference with CONFIG_OPTIMIZE_INLINING=y.
With gcc 4.1 and a defconfig like build:
6116998 1138540 883788 8139326 7c323e vmlinux-oi-with-patch
6137043 1138540 883788 8159371 7c808b vmlinux-optimize-inlining
~20k / 0.3% difference.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix this by reintroducing asm/smp.h include in apic.c - later on
I will fix this by removing non-smp data from smp.h
Also fix the __inquire_remote_apic() prototype/inline.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar wrote:
> All non-x86 architectures fail to build:
>
> In file included from /home/mingo/tip/include/linux/random.h:11,
> from /home/mingo/tip/include/linux/stackprotector.h:6,
> from /home/mingo/tip/init/main.c:17:
> /home/mingo/tip/include/linux/irqnr.h:26:63: error: asm/irq_vectors.h: No such file or directory
Do not include asm/irq_vectors.h in generic code - it's not available
on all architectures.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: Reduce memory usage.
This is the second half of the changes to make the irq_desc_ptrs be
variable sized based on nr_cpu_ids. This is done by adding a new
"max_nr_irqs" macro to irq_vectors.h (and a dummy in irqnr.h) to
return a max NR_IRQS value based on NR_CPUS or nr_cpu_ids.
This necessitated moving the define of MAX_IO_APICS to a separate
file (asm/apicnum.h) so it could be included without the baggage
of the other asm/apicdef.h declarations.
Signed-off-by: Mike Travis <travis@sgi.com>
Impact: reduce stack usage, use new cpumask API.
This is made a little more tricky by uv_flush_tlb_others which
actually alters its argument, for an IPI to be sent to the remaining
cpus in the mask.
I solve this by allocating a cpumask_var_t for this case and falling back
to IPI should this fail.
To eliminate temporaries in the caller, all flush_tlb_others implementations
now do the this-cpu-elimination step themselves.
Note also the curious "cpus_or(f->flush_cpumask, cpumask, f->flush_cpumask)"
which has been there since pre-git and yet f->flush_cpumask is always zero
at this point.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Impact: micro-optimization
The patch below removes an unnecessary locked instruction from
switch_to(). TIF_FORK is only ever set in copy_thread() on initial
process creation, and gets cleared during the first scheduling of the
process. As such, it is safe to use an unlocked test for the flag
within switch_to().
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (36 commits)
x86: fix section mismatch warnings in mcheck/mce_amd_64.c
x86: offer frame pointers in all build modes
x86: remove duplicated #include's
x86: k8 numa register active regions later
x86: update Alan Cox's email addresses
x86: rename all fields of mpc_table mpc_X to X
x86: rename all fields of mpc_oemtable oem_X to X
x86: rename all fields of mpc_bus mpc_X to X
x86: rename all fields of mpc_cpu mpc_X to X
x86: rename all fields of mpc_intsrc mpc_X to X
x86: rename all fields of mpc_lintsrc mpc_X to X
x86: rename all fields of mpc_iopic mpc_X to X
x86: irqinit_64.c init_ISA_irqs should be static
Documentation/x86/boot.txt: payload length was changed to payload_length
x86: setup_percpu.c fix style problems
x86: irqinit_64.c fix style problems
x86: irqinit_32.c fix style problems
x86: i8259.c fix style problems
x86: irq_32.c fix style problems
x86: ioport.c fix style problems
...
* 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
[IA64] fix typo in cpumask_of_pcibus()
x86: fix x86_32 builds for summit and es7000 arch's
cpumask: use work_on_cpu in acpi-cpufreq.c for read_measured_perf_ctrs
cpumask: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write
cpumask: use cpumask_var_t in acpi-cpufreq.c
cpumask: use work_on_cpu in acpi/cstate.c
cpumask: convert struct cpufreq_policy to cpumask_var_t
cpumask: replace CPUMASK_ALLOC etc with cpumask_var_t
x86: cleanup remaining cpumask_t ops in smpboot code
cpumask: update pci_bus_show_cpuaffinity to use new cpumask API
cpumask: update local_cpus_show to use new cpumask API
ia64: cpumask fix for is_affinity_mask_valid()
Ingo noticed that using signed arithmetic seems to confuse the gcc
inliner, and make it potentially decide that it's all too complicated.
(Yeah, yeah, it's a constant. It's always positive. Still..)
Based-on: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Impact: cleanup, solve 80 columns wrap problems
It would be cleaner to rename all the mpf->mpf_X fields to
mpf->X - that alone would give 4 characters per usage site.
(we already know that it's an 'mpf' entity -
no need to duplicate that in the field too)
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, solve 80 columns wrap problems
intel_mp_floating should be renamed to mpf_intel.
The reason: the 'f' in MPF already means 'floating'
which means MP Floating pointer structure -
no need to repeat that in the type name.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
rcu: fix rcutorture bug
rcu: eliminate synchronize_rcu_xxx macro
rcu: make treercu safe for suspend and resume
rcu: fix rcutree grace-period-latency bug on small systems
futex: catch certain assymetric (get|put)_futex_key calls
futex: make futex_(get|put)_key() calls symmetric
locking, percpu counters: introduce separate lock classes
swiotlb: clean up EXPORT_SYMBOL usage
swiotlb: remove unnecessary declaration
swiotlb: replace architecture-specific swiotlb.h with linux/swiotlb.h
swiotlb: add support for systems with highmem
swiotlb: store phys address in io_tlb_orig_addr array
swiotlb: add hwdev to swiotlb_phys_to_bus() / swiotlb_sg_to_bus()
The atomic_t type cannot currently be used in some header files because it
would create an include loop with asm/atomic.h. Move the type definition
to linux/types.h to break the loop.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix the following build errors reported by Yinghai Lu:
| In file included from arch/x86/mach-generic/summit.c:16:
| tip/linux-2.6/arch/x86/include/asm/summit/apic.h:
| In function 'cpu_mask_to_apicid_and':
| tip/linux-2.6/arch/x86/include/asm/summit/apic.h:179:
| error: 'GFP_ATOMIC' undeclared (first use in this function)
Reported-by: Yinghai Lu <Yinghai.Lu@Sun.COM>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, solve 80 columns wrap problems
It would be cleaner to rename all the mpc->mpc_X fields to
mpc->X - that alone would give 4 characters per usage site.
(we already know that it's an 'mpc' entity -
no need to duplicate that in the field too)
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, solve 80 columns wrap problems
It would be cleaner to rename all the mpc->oem_X fields to
mpc->X - that alone would give 4 characters per usage site.
(we already know that it's an 'oem' entity -
no need to duplicate that in the field too)
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, solve 80 columns wrap problems
It would be cleaner to rename all the mpc->mpc_X fields to
mpc->X - that alone would give 4 characters per usage site.
(we already know that it's an 'mpc' entity -
no need to duplicate that in the field too)
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, solve 80 columns wrap problems
It would be cleaner to rename all the mpc->mpc_X fields to
mpc->X - that alone would give 4 characters per usage site.
(we already know that it's an 'mpc' entity -
no need to duplicate that in the field too)
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, solve 80 columns wrap problems
It would be cleaner to rename all the mpc->mpc_X fields to
mpc->X - that alone would give 4 characters per usage site.
(we already know that it's an 'mpc' entity -
no need to duplicate that in the field too)
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, solve 80 columns wrap problems
It would be cleaner to rename all the mpc->mpc_X fields to
mpc->X - that alone would give 4 characters per usage site.
(we already know that it's an 'mpc' entity -
no need to duplicate that in the field too)
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, solve 80 columns wrap problems
It would be cleaner to rename all the mpc->mpc_X fields to
mpc->X - that alone would give 4 characters per usage site.
(we already know that it's an 'mpc' entity -
no need to duplicate that in the field too)
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: use new cpumask API to reduce memory and stack usage
Allocate the following local cpumasks based on the number of cpus that
are present. References will use new cpumask API. (Currently only
modified for x86_64, x86_32 continues to use the *_map variants.)
cpu_callin_mask
cpu_callout_mask
cpu_initialized_mask
cpu_sibling_setup_mask
Provide the following accessor functions:
struct cpumask *cpu_sibling_mask(int cpu)
struct cpumask *cpu_core_mask(int cpu)
Other changes are when setting or clearing the cpu online, possible
or present maps, use the accessor functions.
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, solve 80 columns wrap problems
mpc_config_oemtable should be renamed to mpc_oemtable.
The reason: the 'c' in MPC already means 'config' -
no need to repeat that in the type name.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, solve 80 columns wrap problems
mpc_config_lintsrc should be renamed to mpc_lintsrc.
The reason: the 'c' in MPC already means 'config' -
no need to repeat that in the type name.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, solve 80 columns wrap problems
mpc_config_intsrc should be renamed to mpc_intsrc.
The reason: the 'c' in MPC already means 'config' -
no need to repeat that in the type name.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, solve 80 columns wrap problems
mpc_config_ioapic should be renamed to mpc_ioapic.
The reason: the 'c' in MPC already means 'config' -
no need to repeat that in the type name.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, solve 80 columns wrap problems
mpc_config_processor should be renamed to mpc_cpu.
The reason: the 'c' in MPC already means 'config' -
no need to repeat that in the type name.
Plus 'processor' is a lot longer than 'cpu' - so we try to use 'cpu' in all
type names, as much as possible.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, solve 80 columns wrap problems
mpc_config_bus should be renamed to mpc_bus.
The reason: the 'c' in MPC already means 'config' -
no need to repeat that in the type name.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, solve 80 columns wrap problems
mp_config_table should be renamed to mpc_table.
The reason: the 'c' in MPC already means 'config' -
no need to repeat that in the type name.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'cpus4096-for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (77 commits)
x86: setup_per_cpu_areas() cleanup
cpumask: fix compile error when CONFIG_NR_CPUS is not defined
cpumask: use alloc_cpumask_var_node where appropriate
cpumask: convert shared_cpu_map in acpi_processor* structs to cpumask_var_t
x86: use cpumask_var_t in acpi/boot.c
x86: cleanup some remaining usages of NR_CPUS where s/b nr_cpu_ids
sched: put back some stack hog changes that were undone in kernel/sched.c
x86: enable cpus display of kernel_max and offlined cpus
ia64: cpumask fix for is_affinity_mask_valid()
cpumask: convert RCU implementations, fix
xtensa: define __fls
mn10300: define __fls
m32r: define __fls
h8300: define __fls
frv: define __fls
cris: define __fls
cpumask: CONFIG_DISABLE_OBSOLETE_CPUMASK_FUNCTIONS
cpumask: zero extra bits in alloc_cpumask_var_node
cpumask: replace for_each_cpu_mask_nr with for_each_cpu in kernel/time/
cpumask: convert mm/
...
Impact: Reduce future system panics due to cpumask operations using NR_CPUS
Insure that code does not look at bits >= nr_cpu_ids as when cpumasks are
allocated based on nr_cpu_ids, these extra bits will not be defined.
Also some other minor updates:
* change in to use cpu accessor function set_cpu_present() instead of
directly accessing cpu_present_map w/cpu_clear() [arch/x86/kernel/reboot.c]
* use cpumask_of() instead of &cpumask_of_cpu() [arch/x86/kernel/reboot.c]
* optimize some cpu_mask_to_apicid_and functions.
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Imapct: add a new struct member to 'struct protection_domain'
When using protection domains for dma_ops and KVM its better to know for
which subsystem it was allocated. Add a flags member to struct
protection domain for that purpose.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (66 commits)
x86: export vector_used_by_percpu_irq
x86: use logical apicid in x2apic_cluster's x2apic_cpu_mask_to_apicid_and()
sched: nominate preferred wakeup cpu, fix
x86: fix lguest used_vectors breakage, -v2
x86: fix warning in arch/x86/kernel/io_apic.c
sched: fix warning in kernel/sched.c
sched: move test_sd_parent() to an SMP section of sched.h
sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0
sched: activate active load balancing in new idle cpus
sched: bias task wakeups to preferred semi-idle packages
sched: nominate preferred wakeup cpu
sched: favour lower logical cpu number for sched_mc balance
sched: framework for sched_mc/smt_power_savings=N
sched: convert BALANCE_FOR_xx_POWER to inline functions
x86: use possible_cpus=NUM to extend the possible cpus allowed
x86: fix cpu_mask_to_apicid_and to include cpu_online_mask
x86: update io_apic.c to the new cpumask code
x86: Introduce topology_core_cpumask()/topology_thread_cpumask()
x86: xen: use smp_call_function_many()
x86: use work_on_cpu in x86/kernel/cpu/mcheck/mce_amd_64.c
...
Fixed up trivial conflict in kernel/time/tick-sched.c manually
If the guest executes invlpg, peek into the pagetable and attempt to
prepopulate the shadow entry.
Also stop dirty fault updates from interfering with the fork detector.
2% improvement on RHEL3/AIM7.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Skip syncing global pages on cr3 switch (but not on cr4/cr0). This is
important for Linux 32-bit guests with PAE, where the kmap page is
marked as global.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Instead of invoking the handler directly collect pages into
an array so the caller can work with it.
Simplifies TLB flush collapsing.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Instruction like shld has three operands, so we need to add a Src2
decode set. We start with Src2None, Src2CL, and Src2ImmByte, Src2One to
support shld/shrd and we will expand it later.
Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Signed-off-by: Avi Kivity <avi@redhat.com>
This function can be used by the reboot or kdump code to forcibly
disable SVM on the CPU.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Use a trick to keep the printk()s on has_svm() working as before. gcc
will take care of not generating code for the 'msg' stuff when the
function is called with a NULL msg argument.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Add cpu_emergency_vmxoff() and its friends: cpu_vmx_enabled() and
__cpu_emergency_vmxoff().
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Unfortunately we can't use exactly the same code from vmx
hardware_disable(), because the KVM function uses the
__kvm_handle_fault_on_reboot() tricks.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
It will be used by core code on kdump and reboot, to disable
vmx if needed.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Those definitions will be used by code outside KVM, so move it outside
of a KVM-specific source file.
Those definitions are used only on kvm/vmx.c, that already includes
asm/vmx.h, so they can be moved safely.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
svm.h will be used by core code that is independent of KVM, so I am
moving it outside the arch/x86/kvm directory.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
vmx.h will be used by core code that is independent of KVM, so I am
moving it outside the arch/x86/kvm directory.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Some areas of kvm x86 mmu are using gfn offset inside a slot without
unaliasing the gfn first. This patch makes sure that the gfn will be
unaliased and add gfn_to_memslot_unaliased() to save the calculating
of the gfn unaliasing in case we have it unaliased already.
Signed-off-by: Izik Eidus <ieidus@redhat.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
As suggested by Avi, this patch introduces a counter of VCPUs that have
LVT0 set to NMI mode. Only if the counter > 0, we push the PIT ticks via
all LAPIC LVT0 lines to enable NMI watchdog support.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Acked-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Otherwise set_bit() for private memory slot(above KVM_MEMORY_SLOTS) would
corrupted memory in 32bit host.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>