Commit Graph

15450 Commits

Author SHA1 Message Date
zhenzhong.duan
c3d93f8801 xen: populate correct number of pages when across mem boundary (v2)
When populate pages across a mem boundary at bootup, the page count
populated isn't correct. This is due to mem populated to non-mem
region and ignored.

Pfn range is also wrongly aligned when mem boundary isn't page aligned.

For a dom0 booted with dom_mem=3368952K(0xcd9ff000-4k) dmesg diff is:
 [    0.000000] Freeing 9e-100 pfn range: 98 pages freed
 [    0.000000] 1-1 mapping on 9e->100
 [    0.000000] 1-1 mapping on cd9ff->100000
 [    0.000000] Released 98 pages of unused memory
 [    0.000000] Set 206435 page(s) to 1-1 mapping
-[    0.000000] Populating cd9fe-cda00 pfn range: 1 pages added
+[    0.000000] Populating cd9fe-cd9ff pfn range: 1 pages added
+[    0.000000] Populating 100000-100061 pfn range: 97 pages added
 [    0.000000] BIOS-provided physical RAM map:
 [    0.000000] Xen: 0000000000000000 - 000000000009e000 (usable)
 [    0.000000] Xen: 00000000000a0000 - 0000000000100000 (reserved)
 [    0.000000] Xen: 0000000000100000 - 00000000cd9ff000 (usable)
 [    0.000000] Xen: 00000000cd9ffc00 - 00000000cda53c00 (ACPI NVS)
...
 [    0.000000] Xen: 0000000100000000 - 0000000100061000 (usable)
 [    0.000000] Xen: 0000000100061000 - 000000012c000000 (unusable)
...
 [    0.000000] MEMBLOCK configuration:
...
-[    0.000000]  reserved[0x4]       [0x000000cd9ff000-0x000000cd9ffbff], 0xc00 bytes
-[    0.000000]  reserved[0x5]       [0x00000100000000-0x00000100060fff], 0x61000 bytes

Related xen memory layout:
(XEN) Xen-e820 RAM map:
(XEN)  0000000000000000 - 000000000009ec00 (usable)
(XEN)  00000000000f0000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 00000000cd9ffc00 (usable)

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
[v2: If xen_do_chunk fail(populate), abort this chunk and any others]
Suggested by David, thanks.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19 15:52:06 -04:00
Olaf Hering
00e37bdb01 xen PVonHVM: move shared_info to MMIO before kexec
Currently kexec in a PVonHVM guest fails with a triple fault because the
new kernel overwrites the shared info page. The exact failure depends on
the size of the kernel image. This patch moves the pfn from RAM into
MMIO space before the kexec boot.

The pfn containing the shared_info is located somewhere in RAM. This
will cause trouble if the current kernel is doing a kexec boot into a
new kernel. The new kernel (and its startup code) can not know where the
pfn is, so it can not reserve the page. The hypervisor will continue to
update the pfn, and as a result memory corruption occours in the new
kernel.

One way to work around this issue is to allocate a page in the
xen-platform pci device's BAR memory range. But pci init is done very
late and the shared_info page is already in use very early to read the
pvclock. So moving the pfn from RAM to MMIO is racy because some code
paths on other vcpus could access the pfn during the small   window when
the old pfn is moved to the new pfn. There is even a  small window were
the old pfn is not backed by a mfn, and during that time all reads
return -1.

Because it is not known upfront where the MMIO region is located it can
not be used right from the start in xen_hvm_init_shared_info.

To minimise trouble the move of the pfn is done shortly before kexec.
This does not eliminate the race because all vcpus are still online when
the syscore_ops will be called. But hopefully there is no work pending
at this point in time. Also the syscore_op is run last which reduces the
risk further.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19 15:52:05 -04:00
Olaf Hering
4ff2d06255 xen: simplify init_hvm_pv_info
init_hvm_pv_info is called only in PVonHVM context, move it into ifdef.
init_hvm_pv_info does not fail, make it a void function.
remove arguments from init_hvm_pv_info because they are not used by the
caller.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19 15:52:04 -04:00
Olaf Hering
4648da7cb4 xen: remove cast from HYPERVISOR_shared_info assignment
Both have type struct shared_info so no cast is needed.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19 15:52:03 -04:00
David Vrabel
1c32cdc633 xen/x86: avoid updating TLS descriptors if they haven't changed
When switching tasks in a Xen PV guest, avoid updating the TLS
descriptors if they haven't changed.  This improves the speed of
context switches by almost 10% as much of the time the descriptors are
the same or only one is different.

The descriptors written into the GDT by Xen are modified from the
values passed in the update_descriptor hypercall so we keep shadow
copies of the three TLS descriptors to compare against.

lmbench3 test     Before  After  Improvement
--------------------------------------------
lat_ctx -s 32 24   7.19    6.52  9%
lat_pipe          12.56   11.66  7%

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19 15:51:57 -04:00
David Vrabel
59290362da xen/x86: add desc_equal() to compare GDT descriptors
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
[v1: Moving it to the Xen file]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19 15:51:45 -04:00
David Vrabel
66a27dde9a xen/mm: zero PTEs for non-present MFNs in the initial page table
When constructing the initial page tables, if the MFN for a usable PFN
is missing in the p2m then that frame is initially ballooned out.  In
this case, zero the PTE (as in decrease_reservation() in
drivers/xen/balloon.c).

This is obviously safe instead of having an valid PTE with an MFN of
INVALID_P2M_ENTRY (~0).

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19 15:51:44 -04:00
David Vrabel
d095d43e78 xen/mm: do direct hypercall in xen_set_pte() if batching is unavailable
In xen_set_pte() if batching is unavailable (because the caller is in
an interrupt context such as handling a page fault) it would fall back
to using native_set_pte() and trapping and emulating the PTE write.

On 32-bit guests this requires two traps for each PTE write (one for
each dword of the PTE).  Instead, do one mmu_update hypercall
directly.

During construction of the initial page tables, continue to use
native_set_pte() because most of the PTEs being set are in writable
and unpinned pages (see phys_pmd_init() in arch/x86/mm/init_64.c) and
using a hypercall for this is very expensive.

This significantly improves page fault performance in 32-bit PV
guests.

lmbench3 test  Before    After     Improvement
----------------------------------------------
lat_pagefault  3.18 us   2.32 us   27%
lat_proc fork  356 us    313.3 us  11%

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19 15:51:43 -04:00
Liu, Jinsong
05e36006ad xen/mce: Register native mce handler as vMCE bounce back point
When Xen hypervisor inject vMCE to guest, use native mce handler
to handle it

Signed-off-by: Ke, Liping <liping.ke@intel.com>
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19 15:51:38 -04:00
Liu, Jinsong
a8fccdb061 x86, MCE, AMD: Adjust initcall sequence for xen
there are 3 funcs which need to be _initcalled in a logic sequence:
1. xen_late_init_mcelog
2. mcheck_init_device
3. threshold_init_device

xen_late_init_mcelog must register xen_mce_chrdev_device before
native mce_chrdev_device registration if running under xen platform;

mcheck_init_device should be inited before threshold_init_device to
initialize mce_device, otherwise a a NULL ptr dereference will cause panic.

so we use following _initcalls
1. device_initcall(xen_late_init_mcelog);
2. device_initcall_sync(mcheck_init_device);
3. late_initcall(threshold_init_device);

when running under xen, the initcall order is 1,2,3;
on baremetal, we skip 1 and we do only 2 and 3.

Acked-and-tested-by: Borislav Petkov <bp@amd64.org>
Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19 15:51:37 -04:00
Liu, Jinsong
cef12ee52b xen/mce: Add mcelog support for Xen platform
When MCA error occurs, it would be handled by Xen hypervisor first,
and then the error information would be sent to initial domain for logging.

This patch gets error information from Xen hypervisor and convert
Xen format error into Linux format mcelog. This logic is basically
self-contained, not touching other kernel components.

By using tools like mcelog tool users could read specific error information,
like what they did under native Linux.

To test follow directions outlined in Documentation/acpi/apei/einj.txt

Acked-and-tested-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Ke, Liping <liping.ke@intel.com>
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19 15:51:36 -04:00
Linus Torvalds
56b880e2e3 Merge branch 'fixes-for-linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping
Pull DMA-mapping fixes from Marek Szyprowski:
 "A set of minor fixes for dma-mapping code (ARM and x86) required for
  Contiguous Memory Allocator (CMA) patches merged in v3.5-rc1."

* 'fixes-for-linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
  x86: dma-mapping: fix broken allocation when dma_mask has been provided
  ARM: dma-mapping: fix debug messages in dmabounce code
  ARM: mm: fix type of the arm_dma_limit global variable
  ARM: dma-mapping: Add missing static storage class specifier
2012-06-15 17:35:01 -07:00
Linus Torvalds
069915b946 Five bug-fixes:
- When booting as PVHVM we would try to use PV console - but would not validate
    the parameters causing us to crash during restore b/c we re-use the wrong event
    channel.
  - When booting on machines with SR-IOV PCI bridge we didn't check for the bridge
    and tried to use it.
  - Under AMD machines would advertise the APERFMPERF resulting in needless amount
    of MSRs from the guest.
  - A global value (xen_released_pages) was not subtracted at bootup when pages
    were added back in. This resulted in the balloon worker having the wrong
    account of how many pages were truly released.
  - Fix dead-lock when xen-blkfront is run in the same domain as xen-blkback.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJP2kcmAAoJEFjIrFwIi8fJXLcH/0a2m6KMcyjc4WaCHspAEFDL
 9B055QUFDEOlH55wE2QeED/8D+0HUbTYnQBycH126XLKzLfRv1fsrKFKDSA/SWW2
 Mh8N316UrY5Wc3KMdxXdCXJCDqDs7VhARTv6JdlUqUlH9oLRYE6CMRO8MujT0iwd
 r+uEnNuW0udMFt8x9SnJW7pEaq7u2N5koEGdWEzZhfoumDaCRxm5OKAKXZ0DZlEZ
 /BPjTW/N+Pf4u+bJZY+wQq41y4zGMqu7TDo/hOpuGZxeqtVnCE9trBbuGLnp4K+W
 n4TfZZs9Y1kovSMj6qTeB0aP0F77tqHyXPb1oPKxm2kWfqT2dFtIRpuLtXYSC+o=
 =cQl2
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.5-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull five Xen bug-fixes from Konrad Rzeszutek Wilk:

 - When booting as PVHVM we would try to use PV console - but would not validate
   the parameters causing us to crash during restore b/c we re-use the wrong event
   channel.
 - When booting on machines with SR-IOV PCI bridge we didn't check for the bridge
   and tried to use it.
 - Under AMD machines would advertise the APERFMPERF resulting in needless amount
   of MSRs from the guest.
 - A global value (xen_released_pages) was not subtracted at bootup when pages
   were added back in. This resulted in the balloon worker having the wrong
   account of how many pages were truly released.
 - Fix dead-lock when xen-blkfront is run in the same domain as xen-blkback.

* tag 'stable/for-linus-3.5-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: mark local pages as FOREIGN in the m2p_override
  xen/setup: filter APERFMPERF cpuid feature out
  xen/balloon: Subtract from xen_released_pages the count that is populated.
  xen/pci: Check for PCI bridge before using it.
  xen/events: Add WARN_ON when quick lookup found invalid type.
  xen/hvc: Check HVM_PARAM_CONSOLE_[EVTCHN|PFN] for correctness.
  xen/hvc: Fix error cases around HVM_PARAM_CONSOLE_PFN
  xen/hvc: Collapse error logic.
2012-06-15 17:17:15 -07:00
Linus Torvalds
c83119a980 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar.

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/smp: Fix topology checks on AMD MCM CPUs
  x86/mm: Fix some kernel-doc warnings
  x86, um: Correct syscall table type attributes breaking gcc 4.8
2012-06-15 16:59:19 -07:00
Linus Torvalds
ed21a66c18 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar.

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  watchdog: Quiet down the boot messages
  perf/x86: Fix broken LBR fixup code
  tracing: Have tracing_off() actually turn tracing off
2012-06-15 16:58:10 -07:00
Stefano Stabellini
b9e0d95c04 xen: mark local pages as FOREIGN in the m2p_override
When the frontend and the backend reside on the same domain, even if we
add pages to the m2p_override, these pages will never be returned by
mfn_to_pfn because the check "get_phys_to_machine(pfn) != mfn" will
always fail, so the pfn of the frontend will be returned instead
(resulting in a deadlock because the frontend pages are already locked).

INFO: task qemu-system-i38:1085 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
qemu-system-i38 D ffff8800cfc137c0     0  1085      1 0x00000000
 ffff8800c47ed898 0000000000000282 ffff8800be4596b0 00000000000137c0
 ffff8800c47edfd8 ffff8800c47ec010 00000000000137c0 00000000000137c0
 ffff8800c47edfd8 00000000000137c0 ffffffff82213020 ffff8800be4596b0
Call Trace:
 [<ffffffff81101ee0>] ? __lock_page+0x70/0x70
 [<ffffffff81a0fdd9>] schedule+0x29/0x70
 [<ffffffff81a0fe80>] io_schedule+0x60/0x80
 [<ffffffff81101eee>] sleep_on_page+0xe/0x20
 [<ffffffff81a0e1ca>] __wait_on_bit_lock+0x5a/0xc0
 [<ffffffff81101ed7>] __lock_page+0x67/0x70
 [<ffffffff8106f750>] ? autoremove_wake_function+0x40/0x40
 [<ffffffff811867e6>] ? bio_add_page+0x36/0x40
 [<ffffffff8110b692>] set_page_dirty_lock+0x52/0x60
 [<ffffffff81186021>] bio_set_pages_dirty+0x51/0x70
 [<ffffffff8118c6b4>] do_blockdev_direct_IO+0xb24/0xeb0
 [<ffffffff811e71a0>] ? ext3_get_blocks_handle+0xe00/0xe00
 [<ffffffff8118ca95>] __blockdev_direct_IO+0x55/0x60
 [<ffffffff811e71a0>] ? ext3_get_blocks_handle+0xe00/0xe00
 [<ffffffff811e91c8>] ext3_direct_IO+0xf8/0x390
 [<ffffffff811e71a0>] ? ext3_get_blocks_handle+0xe00/0xe00
 [<ffffffff81004b60>] ? xen_mc_flush+0xb0/0x1b0
 [<ffffffff81104027>] generic_file_aio_read+0x737/0x780
 [<ffffffff813bedeb>] ? gnttab_map_refs+0x15b/0x1e0
 [<ffffffff811038f0>] ? find_get_pages+0x150/0x150
 [<ffffffff8119736c>] aio_rw_vect_retry+0x7c/0x1d0
 [<ffffffff811972f0>] ? lookup_ioctx+0x90/0x90
 [<ffffffff81198856>] aio_run_iocb+0x66/0x1a0
 [<ffffffff811998b8>] do_io_submit+0x708/0xb90
 [<ffffffff81199d50>] sys_io_submit+0x10/0x20
 [<ffffffff81a18d69>] system_call_fastpath+0x16/0x1b

The explanation is in the comment within the code:

We need to do this because the pages shared by the frontend
(xen-blkfront) can be already locked (lock_page, called by
do_read_cache_page); when the userspace backend tries to use them
with direct_IO, mfn_to_pfn returns the pfn of the frontend, so
do_blockdev_direct_IO is going to try to lock the same pages
again resulting in a deadlock.

A simplified call graph looks like this:

pygrub                          QEMU
-----------------------------------------------
do_read_cache_page              io_submit
  |                              |
lock_page                       ext3_direct_IO
                                 |
                                bio_add_page
                                 |
                                lock_page

Internally the xen-blkback uses m2p_add_override to swizzle (temporarily)
a 'struct page' to have a different MFN (so that it can point to another
guest). It also can easily find out whether another pfn corresponding
to the mfn exists in the m2p, and can set the FOREIGN bit
in the p2m, making sure that mfn_to_pfn returns the pfn of the backend.

This allows the backend to perform direct_IO on these pages, but as a
side effect prevents the frontend from using get_user_pages_fast on
them while they are being shared with the backend.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-06-14 14:03:41 -04:00
Marek Szyprowski
c080e26edc x86: dma-mapping: fix broken allocation when dma_mask has been provided
Commit 0a2b9a6ea9 ("X86: integrate CMA with DMA-mapping subsystem")
broke memory allocation with dma_mask. This patch fixes possible kernel
ops caused by lack of resetting page variable when jumping to 'again' label.

Reported-by: Konrad Rzeszutek Wilk <konrad@darnok.org>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
2012-06-14 14:01:30 +02:00
Stephane Eranian
25f4298582 perf/x86: Fix broken LBR fixup code
I noticed that the LBR fixups were not working anymore
on programs where they used to. I tracked this down to
a recent change to copy_from_user_nmi():

 db0dc75d64 ("perf/x86: Check user address explicitly in copy_from_user_nmi()")

This commit added a call to __range_not_ok() to the
copy_from_user_nmi() routine. The problem is that the logic
of the test must be reversed. __range_not_ok() returns 0 if the
range is VALID. We want to return early from copy_from_user_nmi()
if the range is NOT valid.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Arun Sharma <asharma@fb.com>
Link: http://lkml.kernel.org/r/20120611134426.GA7542@quad
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-13 15:00:28 +02:00
Borislav Petkov
161270fc1f x86/smp: Fix topology checks on AMD MCM CPUs
The warning below triggers on AMD MCM packages because physical package
IDs on the cores of a _physical_ socket are the same. I.e., this field
says which CPUs belong to the same physical package.

However, the same two CPUs belong to two different internal, i.e.
"logical" nodes in the same physical socket which is reflected in the
CPU-to-node map on x86 with NUMA.

Which makes this check wrong on the above topologies so circumvent it.

[    0.444413] Booting Node   0, Processors  #1 #2 #3 #4 #5 Ok.
[    0.461388] ------------[ cut here ]------------
[    0.465997] WARNING: at arch/x86/kernel/smpboot.c:310 topology_sane.clone.1+0x6e/0x81()
[    0.473960] Hardware name: Dinar
[    0.477170] sched: CPU #6's mc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
[    0.486860] Booting Node   1, Processors  #6
[    0.491104] Modules linked in:
[    0.494141] Pid: 0, comm: swapper/6 Not tainted 3.4.0+ #1
[    0.499510] Call Trace:
[    0.501946]  [<ffffffff8144bf92>] ? topology_sane.clone.1+0x6e/0x81
[    0.508185]  [<ffffffff8102f1fc>] warn_slowpath_common+0x85/0x9d
[    0.514163]  [<ffffffff8102f2b7>] warn_slowpath_fmt+0x46/0x48
[    0.519881]  [<ffffffff8144bf92>] topology_sane.clone.1+0x6e/0x81
[    0.525943]  [<ffffffff8144c234>] set_cpu_sibling_map+0x251/0x371
[    0.532004]  [<ffffffff8144c4ee>] start_secondary+0x19a/0x218
[    0.537729] ---[ end trace 4eaa2a86a8e2da22 ]---
[    0.628197]  #7 #8 #9 #10 #11 Ok.
[    0.807108] Booting Node   3, Processors  #12 #13 #14 #15 #16 #17 Ok.
[    0.897587] Booting Node   2, Processors  #18 #19 #20 #21 #22 #23 Ok.
[    0.917443] Brought up 24 CPUs

We ran a topology sanity check test we have here on it and
it all looks ok... hopefully :).

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120529135442.GE29157@aftab.osrc.amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-13 14:56:12 +02:00
Marcelo Tosatti
e32025a564 x86: kvmclock: remove check_and_clear_guest_paused warning
CPU offline path calls the hrtimer interrupt handler with interrupts
disabled, without touching preempt_count, triggering this warning.

Remove the warning since it is supposed to be used from hrtimer
interrupt context only.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-06-11 23:18:33 -03:00
Linus Torvalds
4e3c8a1b1c Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
 "This push fixes an unaligned fault on x86-32 with aesni-intel and an
  RNG failure with atmel-rng (repeated bits)."

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: aesni-intel - fix unaligned cbc decrypt for x86-32
  hwrng: atmel-rng - fix race condition leading to repeated bits
2012-06-11 16:31:52 +03:00
Wanpeng Li
9efc31b81d x86/mm: Fix some kernel-doc warnings
Fix kernel-doc warnings in arch/x86/mm/ioremap.c and
arch/x86/mm/pageattr.c, just like this one:

  Warning(arch/x86/mm/ioremap.c:204):
     No description found for parameter 'phys_addr'
  Warning(arch/x86/mm/ioremap.c:204):
     Excess function parameter 'offset' description in 'ioremap_nocache'

Signed-off-by: Wanpeng Li <liwp@linux.vnet.ibm.com>
Cc: Gavin Shan <shangw@linux.vnet.ibm.com>
Cc: Wanpeng Li <liwp.linux@gmail.com>
Link: http://lkml.kernel.org/r/1339296652-2935-1-git-send-email-liwp.linux@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-11 10:54:45 +02:00
Martin Pelikan
9271b0b4b2 x86, um: Correct syscall table type attributes breaking gcc 4.8
The latest GCC 4.8 does some more checking on type attributes that
break the build for ARCH=um -> fill them in.  Specifically, the
"asmlinkage" attributes is now tested for consistency.

Signed-off-by: Martin Pelikan <pelikan@storkhole.cz>
Link: http://lkml.kernel.org/r/1339269731-10772-1-git-send-email-pelikan@storkhole.cz
Acked-by: Richard Weinberger <richard@nod.at>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-06-09 12:51:09 -07:00
Linus Torvalds
7249450449 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar.

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Fix the relax_domain_level boot parameter
  sched: Validate assumptions in sched_init_numa()
  sched: Always initialize cpu-power
  sched: Fix domain iteration
  sched/rt: Fix lockdep annotation within find_lock_lowest_rq()
  sched/numa: Load balance between remote nodes
  sched/x86: Calculate booted cores after construction of sibling_mask
2012-06-08 14:59:29 -07:00
Linus Torvalds
0b35d326f8 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar.

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/nmi: Fix section mismatch warnings on 32-bit
  x86/uv: Fix UV2 BAU legacy mode
  x86/mm: Only add extra pages count for the first memory range during pre-allocation early page table space
  x86, efi stub: Add .reloc section back into image
  x86/ioapic: Fix NULL pointer dereference on CPU hotplug after disabling irqs
  x86/reboot: Fix a warning message triggered by stop_other_cpus()
  x86/intel/moorestown: Change intel_scu_devices_create() to __devinit
  x86/numa: Set numa_nodes_parsed at acpi_numa_memory_affinity_init()
  x86/gart: Fix kmemleak warning
  x86: mce: Add the dropped timer interval init back
  x86/mce: Fix the MCE poll timer logic
2012-06-08 09:26:55 -07:00
Linus Torvalds
106544d81d Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "A bit larger than what I'd wish for - half of it is due to hw driver
  updates to Intel Ivy-Bridge which info got recently released,
  cycles:pp should work there now too, amongst other things.  (but we
  are generally making exceptions for hardware enablement of this type.)

  There are also callchain fixes in it - responding to mostly
  theoretical (but valid) concerns.  The tooling side sports perf.data
  endianness/portability fixes which did not make it for the merge
  window - and various other fixes as well."

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
  perf/x86: Check user address explicitly in copy_from_user_nmi()
  perf/x86: Check if user fp is valid
  perf: Limit callchains to 127
  perf/x86: Allow multiple stacks
  perf/x86: Update SNB PEBS constraints
  perf/x86: Enable/Add IvyBridge hardware support
  perf/x86: Implement cycles:p for SNB/IVB
  perf/x86: Fix Intel shared extra MSR allocation
  x86/decoder: Fix bsr/bsf/jmpe decoding with operand-size prefix
  perf: Remove duplicate invocation on perf_event_for_each
  perf uprobes: Remove unnecessary check before strlist__delete
  perf symbols: Check for valid dso before creating map
  perf evsel: Fix 32 bit values endianity swap for sample_id_all header
  perf session: Handle endianity swap on sample_id_all header data
  perf symbols: Handle different endians properly during symbol load
  perf evlist: Pass third argument to ioctl explicitly
  perf tools: Update ioctl documentation for PERF_IOC_FLAG_GROUP
  perf tools: Make --version show kernel version instead of pull req tag
  perf tools: Check if callchain is corrupted
  perf callchain: Make callchain cursors TLS
  ...
2012-06-08 09:14:46 -07:00
Don Zickus
eeaaa96a3a x86/nmi: Fix section mismatch warnings on 32-bit
It was reported that compiling for 32-bit caused a bunch of
section mismatch warnings:

 VDSOSYM arch/x86/vdso/vdso32-syms.lds
  LD      arch/x86/vdso/built-in.o
  LD      arch/x86/built-in.o

 WARNING: arch/x86/built-in.o(.data+0x5af0): Section mismatch in
 reference from the variable test_nmi_ipi_callback_na.10451 to
 the function .init.text:test_nmi_ipi_callback() [...]

 WARNING: arch/x86/built-in.o(.data+0x5b04): Section mismatch in
 reference from the variable nmi_unk_cb_na.10399 to the function
 .init.text:nmi_unk_cb() The variable nmi_unk_cb_na.10399
 references the function __init nmi_unk_cb() [...]

Both of these are attributed to the internal representation of
the nmiaction struct created during register_nmi_handler.  The
reason for this is that those structs are not defined in the
init section whereas the rest of the code in nmi_selftest.c is.

To resolve this, I created a new #define,
register_nmi_handler_initonly, that tags the struct as
__initdata to resolve the mismatch.  This #define should only be
used in rare situations where the register/unregister is called
during init of the kernel.

Big thanks to Jan Beulich for decoding this for me as I didn't
have a clue what was going on.

Reported-by: Witold Baryluk <baryluk@smp.if.uj.edu.pl>
Tested-by: Witold Baryluk <baryluk@smp.if.uj.edu.pl>
Cc: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Link: http://lkml.kernel.org/r/1338991542-23000-1-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-08 12:19:27 +02:00
Cliff Wickman
d5d2d2eea8 x86/uv: Fix UV2 BAU legacy mode
The SGI Altix UV2 BAU (Broadcast Assist Unit) as used for
tlb-shootdown (selective broadcast mode) always uses UV2
broadcast descriptor format. There is no need to clear the
'legacy' (UV1) mode, because the hardware always uses UV2 mode
for selective broadcast.

But the BIOS uses general broadcast and legacy mode, and the
hardware pays attention to the legacy mode bit for general
broadcast. So the kernel must not clear that mode bit.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/E1SccoO-0002Lh-Cb@eag09.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-08 11:48:28 +02:00
Yinghai Lu
bd2753b2dd x86/mm: Only add extra pages count for the first memory range during pre-allocation early page table space
Robin found this regression:

| I just tried to boot an 8TB system.  It fails very early in boot with:
| Kernel panic - not syncing: Cannot find space for the kernel page tables

git bisect commit 722bc6b167.

A git revert of that commit does boot past that point on the 8TB
configuration.

That commit will add up extra pages for all memory range even
above 4g.

Try to limit that extra page count adding to first entry only.

Bisected-by: Robin Holt <holt@sgi.com>
Tested-by: Robin Holt <holt@sgi.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/CAE9FiQUj3wyzQxtq9yzBNc9u220p8JZ1FYHG7t%3DMOzJ%3D9BZMYA@mail.gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-08 11:40:50 +02:00
Jordan Justen
743628e868 x86, efi stub: Add .reloc section back into image
Some UEFI firmware will not load a .efi with a .reloc section
with a size of 0.

Therefore, we create a .efi image with 4 main areas and 3 sections.
1. PE/COFF file header
2. .setup section (covers all setup code following the first sector)
3. .reloc section (contains 1 dummy reloc entry, created in build.c)
4. .text section (covers the remaining kernel image)

To make room for the new .setup section data, the header
bugger_off_msg had to be shortened.

Reported-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Link: http://lkml.kernel.org/r/1339085121-12760-1-git-send-email-jordan.l.justen@intel.com
Tested-by: Lee G Rosenbaum <lee.g.rosenbaum@intel.com>
Tested-by: Henrik Rydberg <rydberg@euromail.se>
Cc: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-06-07 09:52:33 -07:00
Arun Sharma
db0dc75d64 perf/x86: Check user address explicitly in copy_from_user_nmi()
Signed-off-by: Arun Sharma <asharma@fb.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1334961696-19580-5-git-send-email-asharma@fb.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 17:08:04 +02:00
Arun Sharma
bc6ca7b342 perf/x86: Check if user fp is valid
Signed-off-by: Arun Sharma <asharma@fb.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1334961696-19580-4-git-send-email-asharma@fb.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 17:08:01 +02:00
Arun Sharma
302fa4b58a perf/x86: Allow multiple stacks
Without this patch, applications with two different stack
regions (eg: native stack vs JIT stack) get truncated
callchains even when RBP chaining is present. GDB shows proper
stack traces and the frame pointer chaining is intact.

This patch disables the (fp < RSP) check, hoping that other checks
in the code save the day for us. In our limited testing, this
didn't seem to break anything.

In the long term, we could potentially have userspace advise
the kernel on the range of valid stack addresses, so we don't
spend a lot of time unwinding from bogus addresses.

Signed-off-by: Arun Sharma <asharma@fb.com>
CC: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1334961696-19580-2-git-send-email-asharma@fb.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 17:07:58 +02:00
Peter Zijlstra
8440ccb43f perf/x86: Update SNB PEBS constraints
Afaict there's no need to (incompletely) iterate the
MEM_UOPS_RETIRED.* umask state.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1338884803.28282.153.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 16:59:52 +02:00
Peter Zijlstra
b6db437ba8 perf/x86: Enable/Add IvyBridge hardware support
Implement rudimentary IVB perf support. The SDM states its identical
to SNB with exception of the exact event tables, but a quick look
suggests they're similar enough.

Also mark SNB-EP as broken for now.

Requested-and-tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1338884803.28282.153.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 16:59:49 +02:00
Peter Zijlstra
cccb9ba9e4 perf/x86: Implement cycles:p for SNB/IVB
Now that there's finally a chip with working PEBS (IvyBridge), we can
enable the hardware and implement cycles:p for SNB/IVB.

Cc: Stephane Eranian <eranian@google.com>
Requested-and-tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1338884803.28282.153.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 16:59:47 +02:00
Peter Zijlstra
b430f7c470 perf/x86: Fix Intel shared extra MSR allocation
Zheng Yan reported that event group validation can wreck event state
when Intel extra_reg allocation changes event state.

Validation shouldn't change any persistent state. Cloning events in
validate_{event,group}() isn't really pretty either, so add a few
special cases to avoid modifying the event state.

The code is restructured to minimize the special case impact.

Reported-by: Zheng Yan <zheng.z.yan@linux.intel.com>
Acked-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1338903031.28282.175.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 16:59:44 +02:00
Kamalesh Babulal
ceb1cbac8e sched/x86: Calculate booted cores after construction of sibling_mask
Commit 316ad24830 ("sched/x86: Rewrite set_cpu_sibling_map()")
broke the booted_cores accounting.

The problem is that the booted_cores accounting needs all the
sibling links set up. So restore the second loop and add a comment as
to why its needed.

On qemu booted with -smp sockets=1,cores=2,threads=2;
Before:
 $ grep cores /proc/cpuinfo
 cpu cores       : 2
 cpu cores       : 1
 cpu cores       : 4
 cpu cores       : 3

With the patch:
 $ grep cores /proc/cpuinfo
 cpu cores       : 2
 cpu cores       : 2
 cpu cores       : 2
 cpu cores       : 2

Reported-by: Prarit Bhargava <prarit@redhat.com>
Reported-by: Borislav Petkov <bp@amd64.org>
Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120531073738.GH7511@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 16:37:59 +02:00
Tomoki Sekiyama
f6175f5bfb x86/ioapic: Fix NULL pointer dereference on CPU hotplug after disabling irqs
In current Linux, percpu variable `vector_irq' is not cleared on
offlined cpus while disabling devices' irqs. If the cpu that has
the disabled irqs in vector_irq is hotplugged,
__setup_vector_irq() hits invalid irq vector and may crash.

This bug can be reproduced as following;

  # echo 0 > /sys/devices/system/cpu/cpu7/online
  # modprobe -r some_driver_using_interrupts      # vector_irq@cpu7 uncleared
  # echo 1 > /sys/devices/system/cpu/cpu7/online  # kernel may crash

This patch fixes this bug by clearing vector_irq in
__clear_irq_vector() even if the cpu is offlined.

Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: yrl.pp-manager.tt@hitachi.com
Cc: ltc-kernel@ml.yrl.intra.hitachi.co.jp
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Alexander Gordeev <agordeev@redhat.com>
Link: http://lkml.kernel.org/r/4FC340BE.7080101@hitachi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 12:03:25 +02:00
Feng Tang
55c844a4dd x86/reboot: Fix a warning message triggered by stop_other_cpus()
When rebooting our 24 CPU Westmere servers with 3.4-rc6, we
always see this warning msg:

Restarting system.
machine restart
------------[ cut here ]------------
WARNING: at arch/x86/kernel/smp.c:125
native_smp_send_reschedule+0x74/0xa7() Hardware name: X8DTN
Modules linked in: igb [last unloaded: scsi_wait_scan]
Pid: 1, comm: systemd-shutdow Not tainted 3.4.0-rc6+ #22
Call Trace:
 <IRQ>  [<ffffffff8102a41f>] warn_slowpath_common+0x7e/0x96
 [<ffffffff8102a44c>] warn_slowpath_null+0x15/0x17
 [<ffffffff81018cf7>] native_smp_send_reschedule+0x74/0xa7
 [<ffffffff810561c1>] trigger_load_balance+0x279/0x2a6
 [<ffffffff81050112>] scheduler_tick+0xe0/0xe9
 [<ffffffff81036768>] update_process_times+0x60/0x70
 [<ffffffff81062f2f>] tick_sched_timer+0x68/0x92
 [<ffffffff81046e33>] __run_hrtimer+0xb3/0x13c
 [<ffffffff81062ec7>] ? tick_nohz_handler+0xd0/0xd0
 [<ffffffff810474f2>] hrtimer_interrupt+0xdb/0x198
 [<ffffffff81019a35>] smp_apic_timer_interrupt+0x81/0x94
 [<ffffffff81655187>] apic_timer_interrupt+0x67/0x70
 <EOI>  [<ffffffff8101a3c4>] ? default_send_IPI_mask_allbutself_phys+0xb4/0xc4
 [<ffffffff8101c680>] physflat_send_IPI_allbutself+0x12/0x14
 [<ffffffff81018db4>] native_nmi_stop_other_cpus+0x8a/0xd6
 [<ffffffff810188ba>] native_machine_shutdown+0x50/0x67
 [<ffffffff81018926>] machine_shutdown+0xa/0xc
 [<ffffffff8101897e>] native_machine_restart+0x20/0x32
 [<ffffffff810189b0>] machine_restart+0xa/0xc
 [<ffffffff8103b196>] kernel_restart+0x47/0x4c
 [<ffffffff8103b2e6>] sys_reboot+0x13e/0x17c
 [<ffffffff8164e436>] ? _raw_spin_unlock_bh+0x10/0x12
 [<ffffffff810fcac9>] ? bdi_queue_work+0xcf/0xd8
 [<ffffffff810fe82f>] ? __bdi_start_writeback+0xae/0xb7
 [<ffffffff810e0d64>] ? iterate_supers+0xa3/0xb7
 [<ffffffff816547a2>] system_call_fastpath+0x16/0x1b
---[ end trace 320af5cb1cb60c5b ]---

The root cause seems to be the
default_send_IPI_mask_allbutself_phys() takes quite some time (I
measured it could be several ms) to complete sending NMIs to all
the other 23 CPUs, and for HZ=250/1000 system, the time is long
enough for a timer interrupt to happen, which will in turn
trigger to kick load balance to a stopped CPU and cause this
warning in native_smp_send_reschedule().

So disabling the local irq before stop_other_cpu() can fix this
problem (tested 25 times reboot ok), and it is fine as there
should be nobody caring the timer interrupt in such reboot
stage.

The latest 3.4 kernel slightly changes this behavior by sending
REBOOT_VECTOR first and only send NMI_VECTOR if the REBOOT_VCTOR
fails, and this patch is still needed to prevent the problem.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120530231541.4c13433a@feng-i7
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 12:03:23 +02:00
Sebastian Andrzej Siewior
7071f6b288 x86/intel/moorestown: Change intel_scu_devices_create() to __devinit
The allmodconfig hits:

 WARNING: vmlinux.o(.text+0x6553d): Section mismatch in
          reference from the function intel_scu_devices_create() to the
          function .devinit.text: spi_register_board_info()
	  [...]

This patch marks intel_scu_devices_create() as devinit because
it only calls a devinit function, spi_register_board_info().

Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: Feng Tang <feng.tang@intel.com>
Link: http://lkml.kernel.org/r/20120531212025.GA8519@breakpoint.cc
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 11:58:40 +02:00
Yasuaki Ishimatsu
4af463d28f x86/numa: Set numa_nodes_parsed at acpi_numa_memory_affinity_init()
When hot-adding a CPU, the system outputs following messages
since node_to_cpumask_map[2] was not allocated memory.

Booting Node 2 Processor 32 APIC 0xc0
node_to_cpumask_map[2] NULL
Pid: 0, comm: swapper/32 Tainted: G       A     3.3.5-acd #21
Call Trace:
 [<ffffffff81048845>] debug_cpumask_set_cpu+0x155/0x160
 [<ffffffff8105e28a>] ? add_timer_on+0xaa/0x120
 [<ffffffff8150665f>] numa_add_cpu+0x1e/0x22
 [<ffffffff815020bb>] identify_cpu+0x1df/0x1e4
 [<ffffffff815020d6>] identify_econdary_cpu+0x16/0x1d
 [<ffffffff81504614>] smp_store_cpu_info+0x3c/0x3e
 [<ffffffff81505263>] smp_callin+0x139/0x1be
 [<ffffffff815052fb>] start_secondary+0x13/0xeb

The reason is that the bit of node 2 was not set at
numa_nodes_parsed. numa_nodes_parsed is set by only
acpi_numa_processor_affinity_init /
acpi_numa_x2apic_affinity_init. Thus even if hot-added memory
which is same PXM as hot-added CPU is written in ACPI SRAT
Table, if the hot-added CPU is not written in ACPI SRAT table,
numa_nodes_parsed is not set.

But according to ACPI Spec Rev 5.0, it says about ACPI SRAT
table as follows: This optional table provides information that
allows OSPM to associate processors and memory ranges, including
ranges of memory provided by hot-added memory devices, with
system localities / proximity domains and clock domains.

It means that ACPI SRAT table only provides information for CPUs
present at boot time and for memory including hot-added memory.
So hot-added memory is written in ACPI SRAT table, but hot-added
CPU is not written in it. Thus numa_nodes_parsed should be set
by not only acpi_numa_processor_affinity_init /
acpi_numa_x2apic_affinity_init but also
acpi_numa_memory_affinity_init for the case.

Additionally, if system has cpuless memory node,
acpi_numa_processor_affinity_init /
acpi_numa_x2apic_affinity_init cannot set numa_nodes_parseds
since these functions cannot find cpu description for the node.
In this case, numa_nodes_parsed needs to be set by
acpi_numa_memory_affinity_init.

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: liuj97@gmail.com
Cc: kosaki.motohiro@gmail.com
Link: http://lkml.kernel.org/r/4FCC2098.4030007@jp.fujitsu.com
[ merged it ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 11:58:39 +02:00
Xiaotian Feng
aff5a62d52 x86/gart: Fix kmemleak warning
aperture_64.c now is using memblock, the previous
kmemleak_ignore() for alloc_bootmem() should be removed then.

Otherwise, with kmemleak enabled, kernel will throw warnings
like:

[    0.000000] kmemleak: Trying to color unknown object at 0xffff8800c4000000 as Black
[    0.000000] Pid: 0, comm: swapper/0 Not tainted 3.5.0-rc1-next-20120605+ #130
[    0.000000] Call Trace:
[    0.000000]  [<ffffffff811b27e6>] paint_ptr+0x66/0xc0
[    0.000000]  [<ffffffff816b90fb>] kmemleak_ignore+0x2b/0x60
[    0.000000]  [<ffffffff81ef7bc0>] kmemleak_init+0x217/0x2c1
[    0.000000]  [<ffffffff81ed2b97>] start_kernel+0x32d/0x3eb
[    0.000000]  [<ffffffff81ed25e4>] ? repair_env_string+0x5a/0x5a
[    0.000000]  [<ffffffff81ed2356>] x86_64_start_reservations+0x131/0x135
[    0.000000]  [<ffffffff81ed2120>] ? early_idt_handlers+0x120/0x120
[    0.000000]  [<ffffffff81ed245c>] x86_64_start_kernel+0x102/0x111
[    0.000000] kmemleak: Early log backtrace:
[    0.000000]    [<ffffffff816b911b>] kmemleak_ignore+0x4b/0x60
[    0.000000]    [<ffffffff81ee6a38>] gart_iommu_hole_init+0x3e7/0x547
[    0.000000]    [<ffffffff81edb20b>] pci_iommu_alloc+0x44/0x6f
[    0.000000]    [<ffffffff81ee81ad>] mem_init+0x19/0xec
[    0.000000]    [<ffffffff81ed2a54>] start_kernel+0x1ea/0x3eb
[    0.000000]    [<ffffffff81ed2356>] x86_64_start_reservations+0x131/0x135
[    0.000000]    [<ffffffff81ed245c>] x86_64_start_kernel+0x102/0x111
[    0.000000]    [<ffffffffffffffff>] 0xffffffffffffffff

Signed-off-by: Xiaotian Feng <dannyfeng@tencent.com>
Cc: Xiaotian Feng <xtfeng@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1338922831-2847-1-git-send-email-xtfeng@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 11:58:38 +02:00
Thomas Gleixner
1a87fc1ec7 x86: mce: Add the dropped timer interval init back
commit 82f7af09 ("x86/mce: Cleanup timer mess) dropped the
initialization of the per cpu timer interval. Duh :(

Restore the previous behaviour.

Reported-by: Chen Gong <gong.chen@linux.intel.com>
Cc: bp@amd64.org
Cc: tony.luck@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-06-06 11:33:21 +02:00
Masami Hiramatsu
436d03faf6 x86/decoder: Fix bsr/bsf/jmpe decoding with operand-size prefix
Fix the x86 instruction decoder to decode bsr/bsf/jmpe with
operand-size prefix (66h). This fixes the test case failure
reported by Linus, attached below.

bsf/bsr/jmpe have a special encoding. Opcode map in
Intel Software Developers Manual vol2 says they have
TZCNT/LZCNT variants if it has F3h prefix. However, there
is no information if it has other 66h or F2h prefixes.
Current instruction decoder supposes that those are
bad instructions, but it actually accepts at least
operand-size prefixes.

H. Peter Anvin further explains:

 " TZCNT/LZCNT are F3 + BSF/BSR exactly because the F2 and
   F3 prefixes have historically been no-ops with most instructions.
   This allows software to unconditionally use the prefixed versions
   and get TZCNT/LZCNT on the processors that have them if they don't
   care about the difference. "

This fixes errors reported by test_get_len:

  Warning: arch/x86/tools/test_get_len found difference at <em_bsf>:ffffffff81036d87
  Warning: ffffffff81036de5:	66 0f bc c2          	bsf    %dx,%ax
  Warning: objdump says 4 bytes, but insn_get_length() says 3
  Warning: arch/x86/tools/test_get_len found difference at <em_bsr>:ffffffff81036ea6
  Warning: ffffffff81036f04:	66 0f bd c2          	bsr    %dx,%ax
  Warning: objdump says 4 bytes, but insn_get_length() says 3
  Warning: decoded and checked 13298882 instructions with 2 warnings

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <yrl.pp-manager.tt@hitachi.com>
Link: http://lkml.kernel.org/r/20120604150911.22338.43296.stgit@localhost.localdomain
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 08:54:18 +02:00
Chen Gong
958fb3c512 x86/mce: Fix the MCE poll timer logic
In commit 82f7af09 ("x86/mce: Cleanup timer mess), Thomas just
forgot the "/ 2" there while cleaning up.

Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@amd64.org
Cc: tony.luck@intel.com
Link: http://lkml.kernel.org/r/1338863702-9245-1-git-send-email-gong.chen@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 08:28:21 +02:00
Linus Torvalds
eea5b5510f Typo/thinko in a cleanup caused a semantic change. Fix it.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPzkYHAAoJEKurIx+X31iBeZ8QAK44Watxy0Ib0IUGrx2wcds8
 dEgVNyv9qrT28/PCBQiFtoRB83vsWWOzdNDWH6s4AAB6cFRbS8dOHvjUTsFfwgJ1
 lgD9URJmegBFInmkwSBVhY/MpwNm/pw0CNIs2ymqAvSlAgj8I8zF5xWsxTZfJcYn
 gxDNTxbuaEd3sPQO8wjBrw8NhCNpNwzEzZUXh31tM92bgpscIsXsJl/cRna5B1NU
 Z5LiSZev1W0/lf+Ys94ZsOSRT9zfjTI+mjXwv/lu8DlgeRQYyXixRjROgWvCbywx
 hlepvAQHtss9z5YTiGhRlPnR/CZ0fEUMQRtyRsp4qxG8BrgkWAdB++3ZMVbQYdom
 i98TQh1HZU3zzxueIwKwfjPKhG9q2Ee1XzE0ow7sXinBJQgiGrXiEv1tcX7001P7
 vpkyqVon2KKSYknxdHtbc6XnwjbzGDEoS0fqZf0boKoHee7KWBmFyX9JXLvZjtY1
 ef4FqTZNccYWL/5Hi0slZWAucC5iPleeV6sm9y4xG/gJFbTIw+joq1dc3pBZJ2uR
 rHhxD5tWTwbovsq1igcjAbrh9davwiFWiufW3Y5GdTAZJJ1tF6YjCmg18QbvaHJj
 9uplEUBUA4N6UUqPCjdKRdPaxPwkNOmjYH3YQIajSmLcn0YoI9NYw4Xn0sp2jCBo
 zGnaJvG2IIC9LNgaVohz
 =LQWb
 -----END PGP SIGNATURE-----

Merge tag 'please-pull-mce' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull MCE regression fix from Tony Luck:
 "Typo/thinko in a cleanup caused a semantic change. Fix it."

* tag 'please-pull-mce' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  x86/mce: Fix the MCE poll timer logic
2012-06-05 15:15:04 -07:00
Chen Gong
c2238f10e0 x86/mce: Fix the MCE poll timer logic
In commit 82f7af09 (x86/mce: Cleanup timer mess), Thomas just forgot
the "/ 2" there while cleaning up.

Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2012-06-05 10:15:07 -07:00
Linus Torvalds
0b3e9f3f21 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar.

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Remove NULL assignment of dattr_cur
  sched: Remove the last NULL entry from sched_feat_names
  sched: Make sched_feat_names const
  sched/rt: Fix SCHED_RR across cgroups
  sched: Move nr_cpus_allowed out of 'struct sched_rt_entity'
  sched: Make sure to not re-read variables after validation
  sched: Fix SD_OVERLAP
  sched: Don't try allocating memory from offline nodes
  sched/nohz: Fix rq->cpu_load calculations some more
  sched/x86: Use cpu_llc_shared_mask(cpu) for coregroup_mask
2012-06-05 09:47:15 -07:00
Linus Torvalds
63004afa71 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull straggler x86 fixes from Peter Anvin:
 "Three groups of patches:

  - EFI boot stub documentation and the ability to print error messages;
  - Removal for PTRACE_ARCH_PRCTL for x32 (obsolete interface which
    should never have been ported, and the port is broken and
    potentially dangerous.)
  - ftrace stack corruption fixes.  I'm not super-happy about the
    technical implementation, but it is probably the least invasive in
    the short term.  In the future I would like a single method for
    nesting the debug stack, however."

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, x32, ptrace: Remove PTRACE_ARCH_PRCTL for x32
  x86, efi: Add EFI boot stub documentation
  x86, efi; Add EFI boot stub console support
  x86, efi: Only close open files in error path
  ftrace/x86: Do not change stacks in DEBUG when calling lockdep
  x86: Allow nesting of the debug stack IDT setting
  x86: Reset the debug_stack update counter
  ftrace: Use breakpoint method to update ftrace caller
  ftrace: Synchronize variable setting with breakpoints
2012-06-02 16:17:03 -07:00