Commit Graph

12419 Commits

Author SHA1 Message Date
Tejun Heo
4e62445b90 x86: Fix build failure on X86_UP_APIC
Commit 4c321ff8 (x86: Replace cpu_2_logical_apicid[] with early
percpu variable) and following changes introduced and used
x86_cpu_to_logical_apicid percpu variable.  It was declared and
defined inside CONFIG_SMP && CONFIG_X86_32 but if
CONFIG_X86_UP_APIC is set UP configuration makes use of it and
build fails.

Fix it by declaring and defining it inside CONFIG_X86_LOCAL_APIC
&& CONFIG_X86_32.
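
The effect of the guard change can be sketched in a stand-alone file. This is only an illustration: the CONFIG_* macros are defined by hand to mimic a UP kernel with the local APIC enabled, BAD_APICID is given an illustrative value, a plain variable stands in for the early percpu variable, and logical_apicid_of_boot_cpu() is a hypothetical user.

```c
/* Hand-written stand-ins for a 32bit UP configuration with a local APIC. */
#define CONFIG_X86_32 1
#define CONFIG_X86_LOCAL_APIC 1
/* CONFIG_SMP is deliberately left undefined. */

#define BAD_APICID 0xFFu	/* illustrative value */

/*
 * A guard on CONFIG_SMP && CONFIG_X86_32 would hide this definition in
 * the configuration above; a guard on CONFIG_X86_LOCAL_APIC keeps it.
 */
#if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86_32)
static unsigned int x86_cpu_to_logical_apicid = BAD_APICID;
#endif

/* Hypothetical UP user of the variable; it must build without CONFIG_SMP. */
static unsigned int logical_apicid_of_boot_cpu(void)
{
	return x86_cpu_to_logical_apicid;
}
```

Switching the #if back to defined(CONFIG_SMP) makes the sketch fail to compile, which is the failure mode described above.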

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Ingo Molnar <mingo@elte.hu>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: penberg@kernel.org
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <20110128162248.GA25746@htj.dyndns.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-28 17:24:49 +01:00
Tejun Heo
8db78cc4b4 x86: Unify NUMA initialization between 32 and 64bit
Now that everything else is unified, NUMA initialization can be
unified too.

* numa_init_array() and init_cpu_to_node() are moved from
  numa_64 to numa.

* numa_32::initmem_init() is updated to call numa_init_array()
  and setup_arch() to call init_cpu_to_node() on 32bit too.

* x86_cpu_to_node_map is now initialized to NUMA_NO_NODE on
  32bit too. This is safe now as numa_init_array() will initialize
  it early during boot.

This fully initializes the NUMA mapping before
setup_per_cpu_areas() on 32bit too, so the first percpu chunk,
which contains all the static variables and part of the dynamic
area, is now allocated with NUMA affinity correctly considered.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-17-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
2011-01-28 14:54:10 +01:00
Tejun Heo
de2d9445f1 x86: Unify node_to_cpumask_map handling between 32 and 64bit
x86_32 has been managing node_to_cpumask_map explicitly from
map_cpu_to_node() and friends in a rather ugly way.  With
previous changes, it's now possible to share the code with
64bit.

* When CONFIG_NUMA_EMU is disabled, numa_add/remove_cpu() are
  implemented in numa.c and shared by 32 and 64bit.  CONFIG_NUMA_EMU
  versions still live in numa_64.c.

  NUMA_EMU's dependency on 64bit is planned to be removed and the
  above should go away together.

* identify_cpu() now calls numa_add_cpu() for 32bit too.  This
  makes the explicit mask management from map_cpu_to_node() unnecessary.

* The whole x86_32 specific map_cpu_to_node() chunk is no longer
  necessary.  Dropped.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-16-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: David Rientjes <rientjes@google.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
2011-01-28 14:54:10 +01:00
Tejun Heo
645a79195f x86: Unify CPU -> NUMA node mapping between 32 and 64bit
Unlike 64bit, 32bit has been using its own cpu_to_node_map[] for
CPU -> NUMA node mapping.  Replace it with early_percpu variable
x86_cpu_to_node_map and share the mapping code with 64bit.

* USE_PERCPU_NUMA_NODE_ID is now enabled for 32bit too.

* x86_cpu_to_node_map and numa_set/clear_node() are moved from
  numa_64 to numa.  For now, on 32bit, x86_cpu_to_node_map is initialized
  with 0 instead of NUMA_NO_NODE.  This is to avoid introducing unexpected
  behavior change and will be updated once init path is unified.

* srat_detect_node() is now enabled for x86_32 too.  It calls
  numa_set_node() and initializes the mapping making explicit
  cpu_to_node_map[] updates from map/unmap_cpu_to_node() unnecessary.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: penberg@kernel.org
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-15-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: David Rientjes <rientjes@google.com>
2011-01-28 14:54:09 +01:00
Tejun Heo
bbc9e2f452 x86: Unify cpu/apicid <-> NUMA node mapping between 32 and 64bit
The mapping between cpu/apicid and node is done via
apicid_to_node[] on 64bit and apicid_2_node[] +
apic->x86_32_numa_cpu_node() on 32bit. This difference makes it
difficult to further unify 32 and 64bit NUMA handling.

This patch unifies it by replacing both apicid_to_node[] and
apicid_2_node[] with __apicid_to_node[] array, which is accessed
by two accessors - set_apicid_to_node() and numa_cpu_node().  On
64bit, numa_cpu_node() always consults __apicid_to_node[]
directly while 32bit goes through apic->numa_cpu_node() method
to allow apic implementations to override it.
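
A minimal sketch of the table and its two accessors, assuming the simple 64bit-style lookup. The table sizes, the cpu_to_apicid[] array and apicid_to_node_init() are hypothetical stand-ins, not the kernel's definitions.

```c
#define MAX_LOCAL_APIC	256	/* illustrative size */
#define NR_CPUS		8	/* illustrative size */
#define NUMA_NO_NODE	(-1)

/* Single table replacing apicid_to_node[] and apicid_2_node[]. */
static int __apicid_to_node[MAX_LOCAL_APIC];

/* Hypothetical cpu -> apicid table standing in for the percpu mapping. */
static int cpu_to_apicid[NR_CPUS];

/* Hypothetical helper: start with no apicid mapped to any node. */
static void apicid_to_node_init(void)
{
	for (int i = 0; i < MAX_LOCAL_APIC; i++)
		__apicid_to_node[i] = NUMA_NO_NODE;
}

static void set_apicid_to_node(int apicid, int node)
{
	__apicid_to_node[apicid] = node;
}

/* 64bit-style lookup: consult the table directly. */
static int numa_cpu_node(int cpu)
{
	int apicid = cpu_to_apicid[cpu];

	if (apicid < 0 || apicid >= MAX_LOCAL_APIC)
		return NUMA_NO_NODE;
	return __apicid_to_node[apicid];
}
```

On 32bit the same query would go through the apic method first, so that an apic implementation can override the table lookup.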

srat_detect_node() for AMD CPUs contains a workaround for broken
NUMA configuration which assumes a relationship between APIC ID,
HT node ID and NUMA topology.  Leave it to access
__apicid_to_node[] directly as mapping through CPU might result
in undesirable behavior change.  The comment is reformatted and
updated to note the ugliness.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-14-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: David Rientjes <rientjes@google.com>
2011-01-28 14:54:09 +01:00
Tejun Heo
89e5dc218e x86: Replace apic->apicid_to_node() with ->x86_32_numa_cpu_node()
apic->apicid_to_node() is 32bit specific apic operation which
determines NUMA node for a CPU.  Depending on the APIC
implementation, it can be easier to determine NUMA node from
either physical or logical apicid.  Currently,
->apicid_to_node() takes @logical_apicid and calls
hard_smp_processor_id() if the physical apicid is needed.

This prevents NUMA mapping from being queried from a different
CPU, which in turn makes it impossible to initialize NUMA
mapping before SMP bringup.

This patch replaces apic->apicid_to_node() with
->x86_32_numa_cpu_node() which takes @cpu, from which both
logical and physical apicids can easily be determined.  While at
it, drop duplicate implementations from bigsmp_32 and summit_32,
and use the default one.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-13-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-28 14:54:08 +01:00
Tejun Heo
df04cf011b x86: Implement x86_32_early_logical_apicid() for numaq_32
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: penberg@kernel.org
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-12-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-28 14:54:08 +01:00
Tejun Heo
3b39d93784 x86: Implement x86_32_early_logical_apicid() for summit_32
Factor out logical apic id calculation from
summit_init_apic_ldr() and use it for the
x86_32_early_logical_apicid() callback.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: penberg@kernel.org
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-11-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-28 14:54:07 +01:00
Tejun Heo
12bf24a47c x86: Implement x86_32_early_logical_apicid() for bigsmp_32
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: penberg@kernel.org
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-10-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-28 14:54:07 +01:00
Tejun Heo
3f6f679888 x86: Implement the default x86_32_early_logical_apicid()
Implement x86_32_early_logical_apicid() for the default apic
flat routing.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: penberg@kernel.org
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-9-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-28 14:54:07 +01:00
Tejun Heo
acb8bc09c6 x86: Add apic->x86_32_early_logical_apicid()
On x86_32, the mapping between cpu and logical apic ID differs
depending on the specific apic implementation in use.  The
mapping is initialized while bringing up CPUs; however, this
makes early inits ignore memory topology.

Add an x86_32 specific apic->x86_32_early_logical_apicid() which
is called early during boot to query the mapping.  The mapping
is later verified against the result of init_apic_ldr().  The
method is allowed to return BAD_APICID if it can't be determined
early.

A noop variant which always returns BAD_APICID is implemented and
added to all x86_32 apic implementations.
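
The callback contract can be sketched as follows. The struct layout, the BAD_APICID value, the one-bit-per-CPU flat variant and early_apicid_consistent() are assumptions made for illustration, not the kernel's actual code.

```c
#define BAD_APICID 0xFFu	/* illustrative value */

struct apic {
	const char *name;
	/* Queried early during boot; may return BAD_APICID. */
	unsigned int (*x86_32_early_logical_apicid)(int cpu);
};

/* Noop variant: the mapping cannot be determined early. */
static unsigned int noop_x86_32_early_logical_apicid(int cpu)
{
	(void)cpu;
	return BAD_APICID;
}

/* Assumed flat-routing variant: one bit per CPU. */
static unsigned int flat_x86_32_early_logical_apicid(int cpu)
{
	return 1u << cpu;
}

/*
 * Later verification against the value produced by init_apic_ldr():
 * an early BAD_APICID answer is always acceptable, anything else
 * has to match.
 */
static int early_apicid_consistent(unsigned int early, unsigned int ldr)
{
	return early == BAD_APICID || early == ldr;
}
```

Returning BAD_APICID keeps the old behaviour for apic types that cannot answer early, so converting them one at a time is safe.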

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: penberg@kernel.org
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-8-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-28 14:54:06 +01:00
Tejun Heo
7632611f53 x86: Kill apic->cpu_to_logical_apicid()
After the previous patch, apic->cpu_to_logical_apicid() is no
longer used.  Kill it.

For apic types with custom cpu_to_logical_apicid() which is also
used for other purposes, remove the function and modify its
users to do the mapping directly.

#ifdef's on CONFIG_SMP in es7000_32 and summit_32 are ignored
during conversion as they are not used for UP kernels.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: penberg@kernel.org
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-7-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-28 14:54:06 +01:00
Tejun Heo
6f802c4bfa x86: Always use x86_cpu_to_logical_apicid for cpu -> logical apic id
Currently, cpu -> logical apic id translation is done by
apic->cpu_to_logical_apicid() callback which may or may not use
x86_cpu_to_logical_apicid.  This is unnecessary as it should
always equal logical_smp_processor_id() which is known early
during CPU bring up.

Initialize x86_cpu_to_logical_apicid after apic->init_apic_ldr()
in setup_local_APIC() and always use x86_cpu_to_logical_apicid
for cpu -> logical apic id mapping.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: penberg@kernel.org
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-6-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-28 14:54:05 +01:00
Tejun Heo
4c321ff8a0 x86: Replace cpu_2_logical_apicid[] with early percpu variable
Unlike x86_64, on x86_32, the mapping from cpu to logical apicid
may vary depending on apic in use.  cpu_2_logical_apicid[] array
is used for this mapping.  Replace it with early percpu variable
x86_cpu_to_logical_apicid to make it better aligned with other
mappings.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: penberg@kernel.org
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-5-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-28 14:54:05 +01:00
Tejun Heo
1245e1668c x86: Make default_send_IPI_mask_sequence/allbutself_logical() 32bit only
Both functions are used only in 32bit.  Put them inside
CONFIG_X86_32. This is to prepare for logical apicid handling
update.

- Cyrill Gorcunov spotted that I forgot to move declarations in
  ipi.h under CONFIG_X86_32.  Fixed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: brgerst@gmail.com
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-4-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-28 14:54:05 +01:00
Tejun Heo
b78aa66b1f x86: Drop x86_32 MAX_APICID
Commit 56d91f13 (x86, acpi: Add MAX_LOCAL_APIC for 32bit) added
MAX_LOCAL_APIC for x86_32 but didn't replace MAX_APICID users
with it. Convert MAX_APICID users to MAX_LOCAL_APIC and drop
MAX_APICID.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-3-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-28 14:54:04 +01:00
Tejun Heo
bd22a2f198 x86: Kill unused static boot_cpu_logical_apicid in smpboot.c
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-2-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-28 14:54:04 +01:00
Linus Torvalds
1f0324caef Merge branch 'stable/bug-fixes-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/bug-fixes-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/setup: Route halt operations to safe_halt pvop.
  xen/e820: Guard against E820_RAM not having page-aligned size or start.
  xen/p2m: Mark INVALID_P2M_ENTRY the mfn_list past max_pfn.
2011-01-28 12:24:34 +10:00
Linus Torvalds
f7b548fa3d Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  percpu, x86: Fix percpu_xchg_op()
  x86: Remove left over system_64.h
  x86-64: Don't use pointer to out-of-scope variable in dump_trace()
2011-01-28 06:43:41 +10:00
Stefano Stabellini
23febeddbe xen/setup: Route halt operations to safe_halt pvop.
With this patch, the cpuidle driver does not load and
does not issue the mwait operations. Instead the hypervisor
is doing them (b/c we call the safe_halt pvops call).

This fixes quite a lot of bootup issues wherein the user had
to force interrupts for the continuation of the bootup.

Details are discussed in:

http://lists.xensource.com/archives/html/xen-devel/2011-01/msg00535.html

[v2: Wrote the commit description]

Reported-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Tested-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-01-27 12:00:24 -05:00
Stefano Stabellini
7cb31b752c xen/e820: Guard against E820_RAM not having page-aligned size or start.
On the Dell Inspiron 1525 and Intel SandyBridge SDPs, the
BIOS e820 RAM is not page-aligned:

[   0.000000]  Xen: 0000000000100000 - 00000000df66d800 (usable)

We were not handling that and ended up setting up a pagetable
that included up to df66e000 with the disastrous effect that when

        memset(NODE_DATA(nodeid), 0, sizeof(pg_data_t));

tried to clear the page it would crash at the 2K mark.

Initially reported by Michael Young @
http://lists.xensource.com/archives/html/xen-devel/2011-01/msg00108.html

The fix is to page-align the size and also take into consideration
the start of the E820 (in case that is not page-aligned either). This
fixes the bootup failure on those affected machines.
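
The alignment rule can be sketched in plain C. This is not the patch itself: it assumes 4 KiB pages, and struct region and page_align_region() are hypothetical names.

```c
#include <stdint.h>

#define PAGE_SIZE 4096ULL	/* assumption: 4 KiB pages */
#define PAGE_MASK (~(PAGE_SIZE - 1))

struct region {
	uint64_t start;
	uint64_t size;
};

/* Shrink a RAM region so that both ends sit on page boundaries. */
static struct region page_align_region(uint64_t start, uint64_t size)
{
	uint64_t end = start + size;
	struct region r;

	r.start = (start + PAGE_SIZE - 1) & PAGE_MASK;	/* round start up */
	end &= PAGE_MASK;				/* round end down */
	r.size = end > r.start ? end - r.start : 0;
	return r;
}
```

For the range quoted above, 0x100000 - 0xdf66d800, the end is pulled back to 0xdf66d000, so the partial page at the end is no longer mapped.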

This patch is a rework of the Michael A Young initial patch and
considers the case if the start is not page-aligned.

Reported-by: Michael A Young <m.a.young@durham.ac.uk>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Michael A Young <m.a.young@durham.ac.uk>
2011-01-27 10:49:35 -05:00
Stefan Bader
cf04d120d9 xen/p2m: Mark INVALID_P2M_ENTRY the mfn_list past max_pfn.
In case the mfn_list does not have enough entries to fill
a p2m page we do not want the entries from max_pfn up to
the boundary to be filled with unknown values. Hence
set them to INVALID_P2M_ENTRY.
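
A minimal sketch of the tail fill, assuming 512 entries per p2m page and a buffer that covers whole pages. p2m_invalidate_tail() is a hypothetical helper, not the function touched by this patch.

```c
#include <stddef.h>

#define P2M_PER_PAGE		512UL	/* illustrative: entries per p2m page */
#define INVALID_P2M_ENTRY	(~0UL)

/*
 * Mark every slot from max_pfn up to the next P2M_PER_PAGE boundary
 * as invalid.  The buffer must be large enough to hold whole pages.
 * Returns the number of slots that were filled.
 */
static size_t p2m_invalidate_tail(unsigned long *mfn_list, unsigned long max_pfn)
{
	unsigned long end = (max_pfn + P2M_PER_PAGE - 1) / P2M_PER_PAGE * P2M_PER_PAGE;
	size_t filled = 0;

	for (unsigned long pfn = max_pfn; pfn < end; pfn++, filled++)
		mfn_list[pfn] = INVALID_P2M_ENTRY;
	return filled;
}
```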

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-01-27 10:49:34 -05:00
Eric Dumazet
889a7a6a5d percpu, x86: Fix percpu_xchg_op()
These recent percpu commits:

  2485b6464c: x86,percpu: Move out of place 64 bit ops into X86_64 section
  8270137a0d: cpuops: Use cmpxchg for xchg to avoid lock semantics

Caused this 'perf top' crash:

 Kernel panic - not syncing: Fatal exception in interrupt
 Pid: 0, comm: swapper Tainted: G     D
 2.6.38-rc2-00181-gef71723 #413 Call Trace: <IRQ> [<ffffffff810465b5>]
    ? panic
    ? kmsg_dump
    ? kmsg_dump
    ? oops_end
    ? no_context
    ? __bad_area_nosemaphore
    ? perf_output_begin
    ? bad_area_nosemaphore
    ? do_page_fault
    ? __task_pid_nr_ns
    ? perf_event_tid
    ? __perf_event_header__init_id
    ? validate_chain
    ? perf_output_sample
    ? trace_hardirqs_off
    ? page_fault
    ? irq_work_run
    ? update_process_times
    ? tick_sched_timer
    ? tick_sched_timer
    ? __run_hrtimer
    ? hrtimer_interrupt
    ? account_system_vtime
    ? smp_apic_timer_interrupt
    ? apic_timer_interrupt
 ...

Looking at assembly code, I found:

list = this_cpu_xchg(irq_work_list, NULL);

gives this wrong code : (gcc-4.1.2 cross compiler)

ffffffff810bc45e:
	mov    %gs:0xead0,%rax
	cmpxchg %rax,%gs:0xead0
	jne    ffffffff810bc45e <irq_work_run+0x3e>
	test   %rax,%rax
	je     ffffffff810bc4aa <irq_work_run+0x8a>

Tell gcc we dirty eax/rax register in percpu_xchg_op()

Compiler must use another register to store pxo_new__

We also don't need to reload percpu value after a jump,
since a 'failed' cmpxchg already updated eax/rax

Wrong generated code was :
	xor     %rax,%rax   /* load 0 into %rax */
1:	mov     %gs:0xead0,%rax
	cmpxchg %rax,%gs:0xead0
	jne     1b
	test    %rax,%rax

After patch :

	xor     %rdx,%rdx   /* load 0 into %rdx */
	mov     %gs:0xead0,%rax
1:	cmpxchg %rdx,%gs:0xead0
	jne     1b
	test    %rax,%rax
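
The same loop shape can be written in portable C. This sketch swaps in C11 atomics for the kernel's per-CPU cmpxchg, so it is stronger than what percpu_xchg_op() needs; xchg_via_cmpxchg() is a hypothetical name.

```c
#include <stdatomic.h>

/*
 * Exchange implemented with a compare-and-swap loop.
 * 'old' plays the role of %rax and 'new_val' the role of %rdx:
 * they must live in different places for the swap to work.
 */
static unsigned long xchg_via_cmpxchg(_Atomic unsigned long *ptr,
				      unsigned long new_val)
{
	unsigned long old = atomic_load(ptr);	/* loaded once, before the loop */

	/* On failure, 'old' is refreshed with the current value of *ptr. */
	while (!atomic_compare_exchange_weak(ptr, &old, new_val))
		;
	return old;
}
```

As in the fixed assembly, the initial load sits outside the loop because a failed compare-exchange already hands back the current value.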

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Tejun Heo <tj@kernel.org>
LKML-Reference: <1295973114.3588.312.camel@edumazet-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-26 08:10:49 +01:00
Yinghai Lu
9a57c3e487 x86: Remove left over system_64.h
Left-over from the x86 merge ...

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4D3E23D1.7010405@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-26 08:05:58 +01:00
Andrea Arcangeli
cacf061c5e thp: fix PARAVIRT x86 32bit noPAE
This fixes TRANSPARENT_HUGEPAGE=y with PARAVIRT=y and HIGHMEM64=n.

The #ifdef that this patch removes was erroneously introduced to fix a
build error for noPAE (where pmd.pmd doesn't exist).  So then the kernel
built but it failed at runtime because set_pmd_at was a noop.  This will
correct it by enabling set_pmd_at for noPAE mode too.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: werner <w.landgraf@ru.ru>
Reported-by: Minchan Kim <minchan.kim@gmail.com>
Tested-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-26 10:49:57 +10:00
Jesper Juhl
2e5aa6824d x86-64: Don't use pointer to out-of-scope variable in dump_trace()
In arch/x86/kernel/dumpstack_64.c::dump_trace() we have this code:

...
  		if (!stack) {
  			unsigned long dummy;
  			stack = &dummy;
  			if (task && task != current)
  				stack = (unsigned long *)task->thread.sp;
  		}

  		bp = stack_frame(task, regs);
  		/*
  		 * Print function call entries in all stacks, starting at the
  		 * current stack address. If the stacks consist of nested
  		 * exceptions
  		 */
  		tinfo = task_thread_info(task);

  		for (;;) {
  			char *id;
  			unsigned long *estack_end;
  			estack_end = in_exception_stack(cpu, (unsigned long)stack,
  							&used, &id);
...

You'll notice that we assign to 'stack' the address of the variable
'dummy' which is only in-scope inside the 'if (!stack)'. So when we later
access stack (at the end of the above, and assuming we did not take the
'if (task && task != current)' branch) we'll be using the address of a
variable that is no longer in scope. I believe this patch is the proper
fix, but I freely admit that I'm not 100% certain.
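
One way to avoid the problem is to give the variable function scope, sketched below. read_top_of_stack() is a hypothetical, much simplified stand-in for dump_trace(), and the sketch does not claim to match the patch line for line.

```c
/* Hypothetical helper in the spirit of dump_trace(). */
static unsigned long read_top_of_stack(unsigned long *stack)
{
	/*
	 * Declared at function scope: its address stays valid for every
	 * later use of 'stack', not only inside the if block.
	 */
	unsigned long dummy = 0;

	if (!stack)
		stack = &dummy;

	return *stack;
}
```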

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
LKML-Reference: <alpine.LNX.2.00.1101242232590.10252@swampdragon.chaosbits.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-01-24 13:46:15 -08:00
Linus Torvalds
4398f31ca7 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Fix jump label with RO/NX module protection crash
  x86, hotplug: Fix powersavings with offlined cores on AMD
  x86, mcheck, therm_throt.c: Export symbol platform_thermal_notify to allow coretemp to handler intr
  x86: Use asm-generic/cacheflush.h
  x86: Update CPU cache attributes table descriptors
2011-01-25 05:24:12 +10:00
matthieu castet
8969691343 x86: Fix jump label with RO/NX module protection crash
If we use a jump table in module init, its entries are marked
as removed in the __jump_table section after init is done.

But we already applied ro permissions on the module, so
we can't modify a read only section (crash in
remove_jump_label_module_init).

Make the __jump_table section rw.

Signed-off-by: Matthieu CASTET <castet.matthieu@free.fr>
Cc: Xiaotian Feng <xtfeng@gmail.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Siarhei Liakh <sliakh.lkml@gmail.com>
Cc: Xuxian Jiang <jiang@cs.ncsu.edu>
Cc: James Morris <jmorris@namei.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dave Jones <davej@redhat.com>
Cc: Kees Cook <kees.cook@canonical.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <4D3C3F20.7030203@free.fr>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-23 16:12:45 +01:00
Borislav Petkov
93789b32db x86, hotplug: Fix powersavings with offlined cores on AMD
ea53069231 made a CPU use monitor/mwait when offline. This is not the
optimal choice for AMD wrt powersavings
and we'd prefer our cores to halt (i.e. enter C1) instead. For this, the
same selection whether to use monitor/mwait has to be used as when we
select the idle routine for the machine.

With this patch, offlining cores 1-5 on a X6 machine allows core0 to
boost again.

[ hpa: putting this in urgent since it is a (power) regression fix ]

Reported-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: stable@kernel.org # 37.x
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Venkatesh Pallipadi <venki@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1295534572-10730-1-git-send-email-bp@amd64.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-01-21 18:14:54 -08:00
Linus Torvalds
ebe0d80507 Merge branch 'fixes-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'fixes-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  x86,percpu: Move out of place 64 bit ops into X86_64 section
2011-01-21 13:43:21 -08:00
Linus Torvalds
cfd74486ea Merge branch 'stable/bug-fixes-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/bug-fixes-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: p2m: correctly initialize partial p2m leaf
  xen: fix non-ANSI function warning in irq.c
2011-01-21 13:35:10 -08:00
Stefan Bader
8e1b4cf210 xen: p2m: correctly initialize partial p2m leaf
After changing the p2m mapping to a tree by

  commit 58e05027b5
    xen: convert p2m to a 3 level tree

and trying to boot a DomU with 615MB of memory, the following crash was
observed in the dump:

kernel direct mapping tables up to 26f00000 @ 1ec4000-1fff000
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<c0107397>] xen_set_pte+0x27/0x60
*pdpt = 0000000000000000 *pde = 0000000000000000

Adding further debug statements showed that when trying to set up
pfn=0x26700 the returned mapping was invalid.

pfn=0x266ff calling set_pte(0xc1fe77f8, 0x6b3003)
pfn=0x26700 calling set_pte(0xc1fe7800, 0x3)

Although the last_pfn obtained from the startup info is 0x26700, which
should in turn not be hit, the additional 8MB which are added as extra
memory normally seem to be ok. This led to looking into the initial
p2m tree construction, which uses the smaller value and assuming that
there is other code handling the extra memory.

When the p2m tree is set up, the leaves are directly pointed to the
array which the domain builder set up. But if the mapping is not on a
boundary that fits into one p2m page, this will result in the last leaf
being only partially valid. And as the invalid entries are not
initialized in that case, things go badly wrong.

I am trying to fix that by checking whether the current leaf is a
complete map and if not, allocate a completely new page and copy only
the valid pointers there. This may not be the most efficient or elegant
solution, but at least it seems to allow me booting DomUs with memory
assignments all over the range.
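
The approach described above can be sketched in user-space C. It assumes 512 entries per leaf and uses malloc() where the kernel would use its own early allocator; p2m_copy_partial_leaf() is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

#define P2M_PER_PAGE		512UL	/* illustrative: entries per leaf */
#define INVALID_P2M_ENTRY	(~0UL)

/*
 * Build a full leaf from a partially valid one: copy the first 'valid'
 * entries and mark the rest invalid.  Returns NULL on allocation failure
 * or if 'valid' exceeds the leaf size; the caller frees the result.
 */
static unsigned long *p2m_copy_partial_leaf(const unsigned long *src,
					    unsigned long valid)
{
	unsigned long *leaf;

	if (valid > P2M_PER_PAGE)
		return NULL;
	leaf = malloc(P2M_PER_PAGE * sizeof(*leaf));
	if (!leaf)
		return NULL;

	memcpy(leaf, src, valid * sizeof(*leaf));
	for (unsigned long i = valid; i < P2M_PER_PAGE; i++)
		leaf[i] = INVALID_P2M_ENTRY;
	return leaf;
}
```

A complete leaf can keep pointing at the domain builder's array; only the last, partial one needs the copy.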

BugLink: http://bugs.launchpad.net/bugs/686692
[v2: Redid a bit of commit wording and fixed a compile warning]

Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-01-21 11:24:14 -05:00
Fenghua Yu
f21bbec9ff x86, mcheck, therm_throt.c: Export symbol platform_thermal_notify to allow coretemp to handler intr
In therm_throt.c, commit 9e76a97efd doesn't export the symbol
platform_thermal_notify.

Other drivers (e.g. drivers/hwmon/coretemp.c) cannot find the
symbol platform_thermal_notify when defining a threshold
interrupt handler.

Please apply this patch to allow threshold interrupt handler in
coretemp.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: R Durgadoss <durgadoss.r@intel.com>
Cc: khali@linux-fr.org <khali@linux-fr.org>
Cc: lm-sensors@lm-sensors.org <lm-sensors@lm-sensors.org>
Cc: Guenter Roeck <guenter.roeck@ericsson.com>
LKML-Reference: <20110121041239.GB26954@linux-os.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-21 14:11:12 +01:00
Akinobu Mita
cc67ba6352 x86: Use asm-generic/cacheflush.h
The implementation of the cache flushing interfaces on the x86
is identical with the default implementation in asm-generic.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: arnd@arndb.de
LKML-Reference: <1295523136-4277-2-git-send-email-akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-21 14:11:12 +01:00
Linus Torvalds
2b1caf6ed7 Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  smp: Allow on_each_cpu() to be called while early_boot_irqs_disabled
  lockdep: Move early boot local IRQ enable/disable status to init/main.c
2011-01-20 18:30:37 -08:00
Linus Torvalds
8d99641f6c Merge branch 'akpm'
* akpm:
  kernel/smp.c: consolidate writes in smp_call_function_interrupt()
  kernel/smp.c: fix smp_call_function_many() SMP race
  memcg: correctly order reading PCG_USED and pc->mem_cgroup
  backlight: fix 88pm860x_bl macro collision
  drivers/leds/ledtrig-gpio.c: make output match input, tighten input checking
  MAINTAINERS: update Atmel AT91 entry
  mm: fix truncate_setsize() comment
  memcg: fix rmdir, force_empty with THP
  memcg: fix LRU accounting with THP
  memcg: fix USED bit handling at uncharge in THP
  memcg: modify accounting function for supporting THP better
  fs/direct-io.c: don't try to allocate more than BIO_MAX_PAGES in a bio
  mm: compaction: prevent division-by-zero during user-requested compaction
  mm/vmscan.c: remove duplicate include of compaction.h
  memblock: fix memblock_is_region_memory()
  thp: keep highpte mapped until it is no longer needed
  kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT
2011-01-20 17:02:14 -08:00
David Rientjes
6a108a14fa kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT
The meaning of CONFIG_EMBEDDED has long since been obsoleted; the option
is used to configure any non-standard kernel with a much larger scope than
only small devices.

This patch renames the option to CONFIG_EXPERT in init/Kconfig and fixes
references to the option throughout the kernel.  A new CONFIG_EMBEDDED
option is added that automatically selects CONFIG_EXPERT when enabled and
can be used in the future to isolate options that should only be
considered for embedded systems (RISC architectures, SLOB, etc).

Calling the option "EXPERT" more accurately represents its intention: only
expert users who understand the impact of the configuration changes they
are making should enable it.

Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: David Woodhouse <david.woodhouse@intel.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Greg KH <gregkh@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Robin Holt <holt@sgi.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20 17:02:05 -08:00
Linus Torvalds
e55fdbd741 Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  virtio: remove virtio-pci root device
  LGUEST_GUEST: fix unmet direct dependencies (VIRTUALIZATION && VIRTIO)
  lguest: compile fixes
  lguest: Use this_cpu_ops
  lguest: document --rng in example Launcher
  lguest: example launcher to use guard pages, drop PROT_EXEC, fix limit logic
  lguest: --username and --chroot options
2011-01-20 16:31:20 -08:00
Randy Dunlap
7d81c3b9e2 xen: fix non-ANSI function warning in irq.c
Fix sparse warning for non-ANSI function declaration:

arch/x86/xen/irq.c:129:30: warning: non-ANSI function declaration of function 'xen_init_irq_ops'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-01-20 14:52:13 -05:00
Tejun Heo
2ce802f62b lockdep: Move early boot local IRQ enable/disable status to init/main.c
During early boot, local IRQs are disabled until the IRQ subsystem is
properly initialized.  During this time, no one should enable
local IRQs, and some operations which usually are not allowed with
IRQs disabled, e.g. operations which might sleep or require
communications with other processors, are allowed.

lockdep tracked this with early_boot_irqs_off/on() callbacks.
As other subsystems need this information too, move it to
init/main.c and make it generally available.  While at it,
toggle the boolean to early_boot_irqs_disabled instead of
enabled so that it can be initialized with %false and %true
indicates the exceptional condition.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <20110120110635.GB6036@htj.dyndns.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-20 13:32:33 +01:00
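The flag polarity described in 2ce802f62b can be modelled in a few lines: a boolean that is zero-initialised to false and is true only during the exceptional early-boot window. This is a user-space sketch under stated assumptions, not the code in init/main.c; only the name early_boot_irqs_disabled comes from the commit message, and all model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Zero-initialised, so it starts out false; true marks the exceptional phase. */
bool early_boot_irqs_disabled;

/* Hypothetical stand-in for the CPU's local IRQ state. */
bool model_local_irqs_on = true;

/* Model of entering early boot: IRQs go off and the flag is raised. */
void model_early_boot_begin(void)
{
	model_local_irqs_on = false;
	early_boot_irqs_disabled = true;
}

/* Model of the point where the IRQ subsystem is ready and IRQs come back on. */
void model_early_boot_end(void)
{
	assert(early_boot_irqs_disabled);
	early_boot_irqs_disabled = false;
	model_local_irqs_on = true;
}

/*
 * Hypothetical consumer: returns true when IRQs are off only because we are
 * still in early boot, so an "IRQs disabled" check could be relaxed.
 */
bool model_irq_off_is_expected(void)
{
	return !model_local_irqs_on && early_boot_irqs_disabled;
}
```

Naming the flag for the disabled state means the default zero value describes normal operation, so no explicit initialiser is needed.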
Dave Jones
fb87ec382f x86: Update CPU cache attributes table descriptors
Update to latest definitions in:

   http://www.intel.com/Assets/PDF/appnote/241618.pdf

[ Note, this update of the doc has removed some old values which
  we have listed.  I think until we have clarification that they
  were never used in production, they should be left there. ]

Signed-off-by: Dave Jones <davej@redhat.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
LKML-Reference: <20110120012055.GA15985@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-20 12:13:20 +01:00
Randy Dunlap
2b8216e635 LGUEST_GUEST: fix unmet direct dependencies (VIRTUALIZATION && VIRTIO)
Honor the kconfig menu hierarchy to remove kconfig dependency warnings:
VIRTIO and VIRTIO_RING are subordinate to VIRTUALIZATION.

warning: (LGUEST_GUEST) selects VIRTIO which has unmet direct dependencies (VIRTUALIZATION)
warning: (LGUEST_GUEST && VIRTIO_PCI && VIRTIO_BALLOON) selects VIRTIO_RING which has unmet direct dependencies (VIRTUALIZATION && VIRTIO)

Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-01-20 21:37:30 +10:30
Rusty Russell
ced05dd741 lguest: compile fixes
arch/x86/lguest/boot.c: In function ‘lguest_init_IRQ’:
arch/x86/lguest/boot.c:824: error: macro "__this_cpu_write" requires 2 arguments, but only 1 given
arch/x86/lguest/boot.c:824: error: ‘__this_cpu_write’ undeclared (first use in this function)
arch/x86/lguest/boot.c:824: error: (Each undeclared identifier is reported only once
arch/x86/lguest/boot.c:824: error: for each function it appears in.)

drivers/lguest/x86/core.c: In function ‘copy_in_guest_info’:
drivers/lguest/x86/core.c:94: error: lvalue required as left operand of assignment

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-01-20 21:37:29 +10:30
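The two kinds of error quoted in ced05dd741 can be reproduced in miniature: a two-argument write macro invoked with a single argument, and an assignment whose left-hand side is the result of a read-style macro and therefore not an lvalue. The sketch below is a user-space illustration only; model_cpu_read and model_cpu_write are hypothetical stand-ins and do not reflect how the kernel's __this_cpu_* operations are implemented.

```c
#include <assert.h>

/* Read macro: the "+ 0" makes the result an rvalue, so it cannot be assigned to. */
#define model_cpu_read(var)        ((var) + 0)

/* Write macro: takes the variable and the new value, like a two-argument setter. */
#define model_cpu_write(var, val)  ((var) = (val))

int model_counter;

/*
 * Neither of these would compile:
 *
 *     model_cpu_write(model_counter);      macro requires 2 arguments
 *     model_cpu_read(model_counter) = 1;   lvalue required as left operand
 *
 * The working form passes both the variable and the value to the write macro.
 */
int model_set_counter(int val)
{
	model_cpu_write(model_counter, val);
	assert(model_cpu_read(model_counter) == val);
	return model_cpu_read(model_counter);
}
```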
Christoph Lameter
c9f2954964 lguest: Use this_cpu_ops
Use this_cpu_ops in a couple of places in lguest.

Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-01-20 21:37:29 +10:30
Jan Beulich
9032160275 x86: Unify "numa=" command line option handling
In order to be able to suppress the use of SRAT tables that
32-bit Linux can't deal with (in one case known to lead to a
non-bootable system, unless disabling ACPI altogether), move the
"numa=" option handling to common code.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Reviewed-by: Thomas Renninger <trenn@suse.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Renninger <trenn@suse.de>
LKML-Reference: <4D36B581020000780002D0FF@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-19 10:25:18 +01:00
Ingo Molnar
6b35eb9ddc Revert "x86: Make relocatable kernel work with new binutils"
This reverts commit 86b1e8dd83 ("x86: Make relocatable kernel work with
new binutils").

Markus Trippelsdorf reported a boot failure caused by this patch.

The real fix for the problem the original patch addressed will likely
involve an arch-generic way to define overlaid jiffies_64 and jiffies
variables.

Until that's done and tested on all architectures, revert this commit to
solve the regression.

Reported-and-bisected-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: "Lu, Hongjiu" <hongjiu.lu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
LKML-Reference: <4D36A759.60704@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-19 10:09:42 +01:00
Linus Torvalds
404cbbd52f Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Clear irqstack thread_info
  x86: Make relocatable kernel work with new binutils
2011-01-18 14:29:21 -08:00
Brian Gerst
7b698ea377 x86: Clear irqstack thread_info
Matthias Merz reported that v2.6.37 failed to boot on his
system.

Make sure that the thread_info part of the irqstack is
initialized to zeroes.

Reported-and-Tested-by: Matthias Merz <linux@merz-ka.de>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <AANLkTimyKXfJ1x8tgwrr1hYnNLrPfgE1NTe4z7L6tUDm@mail.gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-18 14:58:37 +01:00
Shaohua Li
86b1e8dd83 x86: Make relocatable kernel work with new binutils
The CONFIG_RELOCATABLE=y option is broken with new binutils, which causes a
panic at boot.

According to Lu Hongjiu, the affected binutils are from 2.20.51.0.12 to
2.21.51.0.3, which have been released since Oct 22 this year. At least
Ubuntu 10.10 is using such binutils. See:

    http://sourceware.org/bugzilla/show_bug.cgi?id=12327

The reason for the boot panic is that we have 'jiffies = jiffies_64;' in
vmlinux.lds.S. The jiffies symbol isn't in any section. During the kernel
build, there is a warning saying jiffies is an absolute address and can't be
relocated. At runtime, jiffies will have virtual address 0.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Lu Hongjiu <hongjiu.lu@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
LKML-Reference: <1295312269.1949.725.camel@sli10-conroe>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-18 09:05:33 +01:00
Linus Torvalds
f9ee7f60d6 Merge branches 'core-fixes-for-linus', 'x86-fixes-for-linus', 'timers-fixes-for-linus' and 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  rcu: avoid pointless blocked-task warnings
  rcu: demote SRCU_SYNCHRONIZE_DELAY from kernel-parameter status
  rtmutex: Fix comment about why new_owner can be NULL in wake_futex_pi()

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, olpc: Add missing Kconfig dependencies
  x86, mrst: Set correct APB timer IRQ affinity for secondary cpu
  x86: tsc: Fix calibration refinement conditionals to avoid divide by zero
  x86, ia64, acpi: Clean up x86-ism in drivers/acpi/numa.c

* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  timekeeping: Make local variables static
  time: Rename misnamed minsec argument of clocks_calc_mult_shift()

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tracing: Remove syscall_exit_fields
  tracing: Only process module tracepoints once
  perf record: Add "nodelay" mode, disabled by default
  perf sched: Fix list of events, dropping unsupported ':r' modifier
  Revert "perf tools: Emit clearer message for sys_perf_event_open ENOENT return"
  perf top: Fix annotate segv
  perf evsel: Fix order of event list deletion
2011-01-15 12:45:00 -08:00