Commit Graph

19117 Commits

Author SHA1 Message Date
Andy Lutomirski
add4eed0a2 x86/vdso, build: Fix cross-compilation from big-endian architectures
This adds a macro GET(x) to convert x from big-endian to
little-endian.  Hopefully I put it everywhere it needs to go and got
all the cases needed for everyone's linux/elf.h.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/2cf258df123cb24bad63c274c8563c050547d99d.1401464755.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-30 16:58:43 -07:00
Andy Lutomirski
011561837d x86/vdso, build: When vdso2c fails, unlink the output
This avoids bizarre failures if make is run again.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/1764385fe9931e8940b9d001132515448ea89523.1401464755.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-30 16:58:39 -07:00
H. Peter Anvin
03c1b4e8e5 Merge remote-tracking branch 'origin/x86/espfix' into x86/vdso
Merge x86/espfix into x86/vdso, due to changes in the vdso setup code
that otherwise cause conflicts.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-21 17:36:33 -07:00
H. Peter Anvin
e6ab9a20e7 Merge commit '7ed6fb9b5a5510e4ef78ab27419184741169978a' into x86/espfix
Merge in Linus' tree with:

fa81511bb0 x86-64, modify_ldt: Make support for 16-bit segments a runtime option

... reverted, to avoid a conflict.  This commit is no longer necessary
with the proper fix in place.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-21 15:23:19 -07:00
H. Peter Anvin
7ed6fb9b5a Revert "x86-64, modify_ldt: Make support for 16-bit segments a runtime option"
This reverts commit fa81511bb0 in
preparation of merging in the proper fix (espfix64).

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-05-21 10:22:59 -07:00
Andy Lutomirski
ac49b9a9f2 x86, mm: Replace arch_vma_name with vm_ops->name for vsyscalls
This removes the last vestiges of arch_vma_name from x86, replacing it
with vm_ops->name.  Good riddance.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/e681cb56096eee5b8b8767093a4f6fb82839f0a4.1400538962.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-20 11:39:31 -07:00
Andy Lutomirski
a62c34bd2a x86, mm: Improve _install_special_mapping and fix x86 vdso naming
Using arch_vma_name to give special mappings a name is awkward.  x86
currently implements it by comparing the start address of the vma to
the expected address of the vdso.  This requires tracking the start
address of special mappings and is probably buggy if a special vma
is split or moved.

Improve _install_special_mapping to just name the vma directly.  Use
it to give the x86 vvar area a name, which should make CRIU's life
easier.

As a side effect, the vvar area will show up in core dumps.  This
could be considered weird and is fixable.

[hpa: I say we accept this as-is but be prepared to deal with knocking
 out the vvars from core dumps if this becomes a problem.]

Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/276b39b6b645fb11e345457b503f17b83c2c6fd0.1400538962.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-20 11:38:42 -07:00
Andy Lutomirski
1e844fb43c x86, vdso: Fix an OOPS accessing the HPET mapping w/o an HPET
The oops can be triggered in qemu using -no-hpet (but not nohpet) by
reading a couple of pages past the end of the vdso text.  This
should send SIGBUS instead of OOPSing.

The bug was introduced by:

commit 7a59ed415f
Author: Stefani Seibold <stefani@seibold.net>
Date:   Mon Mar 17 23:22:09 2014 +0100

    x86, vdso: Add 32 bit VDSO time support for 32 bit kernel

which is new in 3.15.

This will be fixed separately in 3.15, but that patch will not apply
to tip/x86/vdso.  This is the equivalent fix for tip/x86/vdso and,
presumably, 3.16.

Cc: Stefani Seibold <stefani@seibold.net>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/c8b0a9a0b8d011a8b273cbb2de88d37190ed2751.1400538962.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-20 11:36:21 -07:00
Linus Torvalds
fa81511bb0 x86-64, modify_ldt: Make support for 16-bit segments a runtime option
Checkin:

b3b42ac2cb x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels

disabled 16-bit segments on 64-bit kernels due to an information
leak.  However, it does seem that people are genuinely using Wine to
run old 16-bit Windows programs on Linux.

A proper fix for this ("espfix64") is coming in the upcoming merge
window, but as a temporary fix, create a sysctl to allow the
administrator to re-enable support for 16-bit segments.

It adds a "/proc/sys/abi/ldt16" sysctl that defaults to zero (off). If
you hit this issue and care about your old Windows program more than
you care about a kernel stack address information leak, you can do

   echo 1 > /proc/sys/abi/ldt16

as root (add it to your startup scripts), and you should be ok.

The sysctl table is only added if you have COMPAT support enabled on
x86-64, but I assume anybody who runs old windows binaries very much
does that ;)

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/CA%2B55aFw9BPoD10U1LfHbOMpHWZkvJTkMcfCs9s3urPr1YyWBxw@mail.gmail.com
Cc: <stable@vger.kernel.org>
2014-05-14 16:33:54 -07:00
Anthony Iliopoulos
9844f54623 x86, mm, hugetlb: Add missing TLB page invalidation for hugetlb_cow()
The invalidation is required in order to maintain proper semantics
under CoW conditions. In scenarios where a process clones several
threads, a thread operating on a core whose DTLB entry for a
particular hugepage has not been invalidated, will be reading from
the hugepage that belongs to the forked child process, even after
hugetlb_cow().

The thread will not see the updated page as long as the stale DTLB
entry remains cached, the thread attempts to write into the page,
the child process exits, or the thread gets migrated to a different
processor.

Signed-off-by: Anthony Iliopoulos <anthony.iliopoulos@huawei.com>
Link: http://lkml.kernel.org/r/20140514092948.GA17391@server-36.huawei.corp
Suggested-by: Shay Goikhman <shay.goikhman@huawei.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> # v2.6.16+ (!)
2014-05-13 16:34:09 -07:00
H. Peter Anvin
7a5091d584 x86, rdrand: When nordrand is specified, disable RDSEED as well
One can logically expect that when the user has specified "nordrand",
the user doesn't want any use of the CPU random number generator,
neither RDRAND nor RDSEED, so disable both.

Reported-by: Stephan Mueller <smueller@chronox.de>
Cc: Theodore Ts'o <tytso@mit.edu>
Link: http://lkml.kernel.org/r/21542339.0lFnPSyGRS@myon.chronox.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-11 20:25:20 -07:00
Boris Ostrovsky
28b92e09e2 x86, vdso, time: Cast tv_nsec to u64 for proper shifting in update_vsyscall()
With tk->wall_to_monotonic.tv_nsec being a 32-bit value on 32-bit
systems, (tk->wall_to_monotonic.tv_nsec << tk->shift) in update_vsyscall()
may lose upper bits or, worse, add them since compiler will do this:
	(u64)(tk->wall_to_monotonic.tv_nsec << tk->shift)
instead of
	((u64)tk->wall_to_monotonic.tv_nsec << tk->shift)

So if, for example, tv_nsec is 0x800000 and shift is 8 we will end up
with 0xffffffff80000000 instead of 0x80000000. And then we are stuck in
the subsequent 'while' loop.

We need an explicit cast.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: http://lkml.kernel.org/r/1399648287-15178-1-git-send-email-boris.ostrovsky@oracle.com
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: <stable@vger.kernel.org> # v3.14
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-05-09 08:45:52 -07:00
Andres Freund
c45f77364b x86: Fix typo in MSR_IA32_MISC_ENABLE_LIMIT_CPUID macro
The spuriously added semicolon didn't have any effect because the
macro isn't currently in use.

c0a639ad0b

Signed-off-by: Andres Freund <andres@anarazel.de>
Link: http://lkml.kernel.org/r/1399598957-7011-3-git-send-email-andres@anarazel.de
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-05-09 08:42:47 -07:00
Andres Freund
722a0d22d0 x86: Fix typo preventing msr_set/clear_bit from having an effect
Due to a typo the msr accessor function introduced in
22085a66c2 didn't have any lasting
effects because they accidentally wrote the old value back.

After c0a639ad0b this at the very least
this causes cpuid limits not to be lifted on some cpus leading to
missing capabilities for those.

Signed-off-by: Andres Freund <andres@anarazel.de>
Link: http://lkml.kernel.org/r/1399598957-7011-2-git-send-email-andres@anarazel.de
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-05-09 08:42:32 -07:00
Feng Tang
62187910b0 x86/intel: Add quirk to disable HPET for the Baytrail platform
HPET on current Baytrail platform has accuracy problem to be
used as reliable clocksource/clockevent, so add a early quirk to
disable it.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1398327498-13163-2-git-send-email-feng.tang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-08 08:15:34 +02:00
Feng Tang
f10f383d84 x86/hpet: Make boot_hpet_disable extern
HPET on some platform has accuracy problem. Making
"boot_hpet_disable" extern so that we can runtime disable
the HPET timer by using quirk to check the platform.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1398327498-13163-1-git-send-email-feng.tang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-08 08:15:34 +02:00
George Spelvin
14262d67fe x86-64, build: Fix stack protector Makefile breakage with 32-bit userland
If you are using a 64-bit kernel with 32-bit userland, then
scripts/gcc-x86_64-has-stack-protector.sh invokes 32-bit gcc
with -mcmodel=kernel, which produces:

<stdin>:1:0: error: code model 'kernel' not supported in the 32 bit mode

and trips the "broken compiler" test at arch/x86/Makefile:120.

There are several places a fix is possible, but the following seems
cleanest.  (But it's minimal; it would also be possible to factor
out a bunch of stuff from the two branches of the if.)

Signed-off-by: George Spelvin <linux@horizon.com>
Link: http://lkml.kernel.org/r/20140507210552.7581.qmail@ns.horizon.com
Cc: <stable@vger.kernel.org> # v3.14
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-07 14:14:44 -07:00
Christian Gmeiner
aadca6fa40 x86/reboot: Add reboot quirk for Certec BPC600
Certec BPC600 needs reboot=pci to actually reboot.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Li Aubrey <aubrey.li@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1399446114-2147-1-git-send-email-christian.gmeiner@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-07 11:22:10 +02:00
Andi Kleen
2605fc216f asmlinkage, x86: Add explicit __visible to arch/x86/*
As requested by Linus add explicit __visible to the asmlinkage users.
This marks all functions visible to assembler.

Tree sweep for arch/x86/*

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1398984278-29319-3-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05 16:07:44 -07:00
H. Peter Anvin
ac008fe0a3 x86, build: Don't get confused by local symbols
arch/x86/crypto/sha1_avx2_x86_64_asm.S introduced _end as a local
symbol, which broke the build under certain circumstances.  Although
the wisdom of _end as a local symbol can definitely be questioned, the
build should not break for that reason.

Thus, filter the output of nm to only get global symbols of
appropriate type.

Reported-by: Andy Lutomirski <luto@amacapital.net>
Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-uxm3j3w3odglcwhafwq5tjqu@git.kernel.org
2014-05-05 15:23:35 -07:00
Andy Lutomirski
2b6f2e649f x86, vdso: Remove vestiges of VDSO_PRELINK and some outdated comments
These definitions had no effect.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/946c104e40c47319f8ab406e54118799cb55bd99.1399317206.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05 13:19:07 -07:00
Andy Lutomirski
f40c330091 x86, vdso: Move the vvar and hpet mappings next to the 64-bit vDSO
This makes the 64-bit and x32 vdsos use the same mechanism as the
32-bit vdso.  Most of the churn is deleting all the old fixmap code.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/8af87023f57f6bb96ec8d17fce3f88018195b49b.1399317206.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05 13:19:01 -07:00
Andy Lutomirski
18d0a6fd22 x86, vdso: Move the 32-bit vdso special pages after the text
This unifies the vdso mapping code and teaches it how to map special
pages at addresses corresponding to symbols in the vdso image.  The
new code is used for all vdso variants, but so far only the 32-bit
variants use the new vvar page position.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/b6d7858ad7b5ac3fd3c29cab6d6d769bc45d195e.1399317206.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05 13:18:56 -07:00
Andy Lutomirski
6f121e548f x86, vdso: Reimplement vdso.so preparation in build-time C
Currently, vdso.so files are prepared and analyzed by a combination
of objcopy, nm, some linker script tricks, and some simple ELF
parsers in the kernel.  Replace all of that with plain C code that
runs at build time.

All five vdso images now generate .c files that are compiled and
linked in to the kernel image.

This should cause only one userspace-visible change: the loaded vDSO
images are stripped more heavily than they used to be.  Everything
outside the loadable segment is dropped.  In particular, this causes
the section table and section name strings to be missing.  This
should be fine: real dynamic loaders don't load or inspect these
tables anyway.  The result is roughly equivalent to eu-strip's
--strip-sections option.

The purpose of this change is to enable the vvar and hpet mappings
to be moved to the page following the vDSO load segment.  Currently,
it is possible for the section table to extend into the page after
the load segment, so, if we map it, it risks overlapping the vvar or
hpet page.  This happens whenever the load segment is just under a
multiple of PAGE_SIZE.

The only real subtlety here is that the old code had a C file with
inline assembler that did 'call VDSO32_vsyscall' and a linker script
that defined 'VDSO32_vsyscall = __kernel_vsyscall'.  This most
likely worked by accident: the linker script entry defines a symbol
associated with an address as opposed to an alias for the real
dynamic symbol __kernel_vsyscall.  That caused ld to relocate the
reference at link time instead of leaving an interposable dynamic
relocation.  Since the VDSO32_vsyscall hack is no longer needed, I
now use 'call __kernel_vsyscall', and I added -Bsymbolic to make it
work.  vdso2c will generate an error and abort the build if the
resulting image contains any dynamic relocations, so we won't
silently generate bad vdso images.

(Dynamic relocations are a problem because nothing will even attempt
to relocate the vdso.)

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/2c4fcf45524162a34d87fdda1eb046b2a5cecee7.1399317206.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05 13:18:51 -07:00
Andy Lutomirski
cfda7bb9ec x86, vdso: Move syscall and sysenter setup into kernel/cpu/common.c
This code is used during CPU setup, and it isn't strictly speaking
related to the 32-bit vdso.  It's easier to understand how this
works when the code is closer to its callers.

This also lets syscall32_cpu_init be static, which might save some
trivial amount of kernel text.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/4e466987204e232d7b55a53ff6b9739f12237461.1399317206.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05 13:18:47 -07:00
Andy Lutomirski
3d7ee969bf x86, vdso: Clean up 32-bit vs 64-bit vdso params
Rather than using 'vdso_enabled' and an awful #define, just call the
parameters vdso32_enabled and vdso64_enabled.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/87913de56bdcbae3d93917938302fc369b05caee.1399317206.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05 13:18:40 -07:00
Andy Lutomirski
73159fdcdb x86, mm: Ensure correct alignment of the fixmap
The early_ioremap code requires that its buffers not span a PMD
boundary.  The logic for ensuring that only works if the fixmap is
aligned, so assert that it's aligned correctly.

To make this work reliably, reserve_top_address needs to be
adjusted.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/e59a5f4362661f75dd4841fa74e1f2448045e245.1399317206.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05 13:18:25 -07:00
H. Peter Anvin
34273f41d5 x86, espfix: Make it possible to disable 16-bit support
Embedded systems, which may be very memory-size-sensitive, are
extremely unlikely to ever encounter any 16-bit software, so make it
a CONFIG_EXPERT option to turn off support for any 16-bit software
whatsoever.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
2014-05-04 12:27:37 -07:00
Ingo Molnar
0214196ce0 * Fix earlyprintk=efi,keep support by switching to an ioremap() mapping
of the framebuffer when early_ioremap() is no longer available and
    dropping __init from functions that may be invoked after
    free_initmem() - Dave Young
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJTZIL0AAoJEC84WcCNIz1Vr9gP/RCHnmo9+w88ujYMjXtoq+/b
 qDX/Fl8/as/gJ8cKhOVlQpC/t4VbC28mRkxV3J8NS/AklY0mU2R8TatprIyUoKAI
 oPZwdSbuEIS8ehCr/D+6aAIGLtFYaLD8VK27niNHEHVytZytPqQGpDKARgphin5l
 AqtEUv9NNfLaN/aHUuMV33xlD4r25BoWlj3RD2h+Rpnu2/vBXs14NTBN1r+SrLFh
 r8htTDsbm3NjDCvboYyPJjnFZvlYqxtLCBC2vVD8fBvaXcBmj/vLP6WmFd3sxbTZ
 4CLmRMShaqh87JH9gdg0m/xJ5sEgRqqvMiqjcaAuJzAew0eE6gUZjE9+fawWYHwT
 XU0kcsM9wn/014f9fUdqaqM38o/XbnVcW+D5iSrwcx6hhNHzf7nFGnSndN2tednQ
 k3z3tpX/GB9u5l0064Clru6GbSnV2cSfayaoIc4sULDrp7KBmyrlwBtsQ67C/JfV
 0gJ4ridzbFllHBiw3Cyw8vzLDPgQ6t2DGw6RkzUpbMwLZG5YMRcyNODWewcTuH7g
 VcMMaDKVw7uCrItFyTscMuUe1nVnbZANdLu9znF8TejgX1MzwwmdetqAE/WPR+3V
 vZoYGNE5zAwGhqF34BLSof9BHoeOjucx1qgaV3QYhrdtgtTXaGf++TvwOhpCVNOC
 vhUguxcrMLOM68He6o5H
 =BzhM
 -----END PGP SIGNATURE-----

Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent

Pull EFI fix from Matt Fleming:

" * Fix earlyprintk=efi,keep support by switching to an ioremap() mapping
    of the framebuffer when early_ioremap() is no longer available and
    dropping __init from functions that may be invoked after
    free_initmem() - Dave Young "

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-04 20:20:42 +02:00
H. Peter Anvin
197725de65 x86, espfix: Make espfix64 a Kconfig option, fix UML
Make espfix64 a hidden Kconfig option.  This fixes the x86-64 UML
build which had broken due to the non-existence of init_espfix_bsp()
in UML: since UML uses its own Kconfig, this option does not appear in
the UML build.

This also makes it possible to make support for 16-bit segments a
configuration option, for the people who want to minimize the size of
the kernel.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Richard Weinberger <richard@nod.at>
Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
2014-05-04 10:00:49 -07:00
Linus Torvalds
0384dcae2b Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
 "This udpate delivers:

   - A fix for dynamic interrupt allocation on x86 which is required to
     exclude the GSI interrupts from the dynamic allocatable range.

     This was detected with the newfangled tablet SoCs which have GPIOs
     and therefor allocate a range of interrupts.  The MSI allocations
     already excluded the GSI range, so we never noticed before.

   - The last missing set_irq_affinity() repair, which was delayed due
     to testing issues

   - A few bug fixes for the armada SoC interrupt controller

   - A memory allocation fix for the TI crossbar interrupt controller

   - A trivial kernel-doc warning fix"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip: irq-crossbar: Not allocating enough memory
  irqchip: armanda: Sanitize set_irq_affinity()
  genirq: x86: Ensure that dynamic irq allocation does not conflict
  linux/interrupt.h: fix new kernel-doc warnings
  irqchip: armada-370-xp: Fix releasing of MSIs
  irqchip: armada-370-xp: implement the ->check_device() msi_chip operation
  irqchip: armada-370-xp: fix invalid cast of signed value into unsigned variable
2014-05-03 08:32:48 -07:00
Dave Young
5f35eb0e29 x86/efi: earlyprintk=efi,keep fix
earlyprintk=efi,keep will cause kernel hangs while freeing initmem like
below:

  VFS: Mounted root (ext4 filesystem) readonly on device 254:2.
  devtmpfs: mounted
  Freeing unused kernel memory: 880K (ffffffff817d4000 - ffffffff818b0000)

It is caused by efi earlyprintk use __init function which will be freed
later.  Such as early_efi_write is marked as __init, also it will use
early_ioremap which is init function as well.

To fix this issue, I added early initcall early_efi_map_fb which maps
the whole efi fb for later use. OTOH, adding a wrapper function
early_efi_map which calls early_ioremap before ioremap is available.

With this patch applied efi boot ok with earlyprintk=efi,keep console=efi

Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-05-03 06:39:06 +01:00
Linus Torvalds
0845e11c2a Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "Two very small changes: one fix for the vSMP Foundation platform, and
  one to help LLVM not choke on options it doesn't understand (although
  it probably should)"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/vsmp: Fix irq routing
  x86: LLVMLinux: Wrap -mno-80387 with cc-option
2014-05-02 14:04:52 -07:00
H. Peter Anvin
20b68535cd x86, espfix: Fix broken header guard
Header guard is #ifndef, not #ifdef...

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-02 11:34:17 -07:00
Linus Torvalds
e7e6d2a4a1 - Fix for a Haswell regression in nested virtualization, introduced during
the merge window.
 
 - A fix from Oleg to async page faults.
 
 - A bunch of small ARM changes.
 
 - A trivial patch to use the new MSI-X API introduced during the merge
 window.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJTY5anAAoJEBvWZb6bTYby6EQQAIWbOJCrLO3NQxwE9M7d8YvN
 oviaLFv7vJh1vaXVo7SNjBRXTq4pzWrhg9rwWlHBg1KnxJ0/sc9Tn07Fe+0bxWDh
 1XIFXNEkO9+Bpl43VnKGC7sbkYE9m3+jpGCWjF01vBCh+BY73wUOsPD0Zw9YQojN
 TKBtiQEjb8avuoTUR0JSOTwLZw4DlDRmRLHkNwlqqvbPdvuIWI/LG2wFUvY7/eq8
 dWxIPBjLKaIv2aUs9wGNNiz4Kb92uyH5L6bI6SK8VxphRA+51BOjMcBbzdY+Q1XL
 c4CTaL9ybAyUi4SRv41qWnM09YbI1FayUW93k9xz/vEplXOHp5R/lyUdZETd/d83
 GxaooTLcy9nOYeZ75buiH/0EG5HxI7On/QfUBEE3qIf8KfGgxb479HbRw6RnX4bf
 EhQzf7eyZvvk43Xk3OYwq8Ux1SOiXQEo+8TpCSaM/KN57cJbjGB4GCUK6JX8qJCx
 7MfXBdrhkAdw5V4lEBQMYKp4pdUdgYKRXavhLevm0qFjX1Swl6LIHxLtjFTKyX9S
 Xfxi09J7EUs7SsI35pdlMtPQkklEUXE96S/W3RCEpR+OfgbVMkYkcQI8TGb7ib3l
 xLNJrSgFDSlP5F3rN5SYIItAqboXb7iLp7SiF2ByXV43yexIrzTH0bwdwPwpZHhk
 2ziVieX5WXEX4tgzZkRj
 =+bLo
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 - Fix for a Haswell regression in nested virtualization, introduced
   during the merge window.
 - A fix from Oleg to async page faults.
 - A bunch of small ARM changes.
 - A trivial patch to use the new MSI-X API introduced during the merge
   window.

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: ARM: vgic: Fix the overlap check action about setting the GICD & GICC base address.
  KVM: arm/arm64: vgic: fix GICD_ICFGR register accesses
  KVM: async_pf: mm->mm_users can not pin apf->mm
  KVM: ARM: vgic: Fix sgi dispatch problem
  MAINTAINERS: co-maintainance of KVM/{arm,arm64}
  arm: KVM: fix possible misalignment of PGDs and bounce page
  KVM: x86: Check for host supported fields in shadow vmcs
  kvm: Use pci_enable_msix_exact() instead of pci_enable_msix()
  ARM: KVM: disable KVM in Kconfig on big-endian systems
2014-05-02 09:26:09 -07:00
H. Peter Anvin
e1fe9ed8d2 x86, espfix: Move espfix definitions into a separate header file
Sparse warns that the percpu variables aren't declared before they are
defined.  Rather than hacking around it, move espfix definitions into
a proper header file.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-01 14:16:15 -07:00
H. Peter Anvin
246f2d2ee1 x86-32, espfix: Remove filter for espfix32 due to race
It is not safe to use LAR to filter when to go down the espfix path,
because the LDT is per-process (rather than per-thread) and another
thread might change the descriptors behind our back.  Fortunately it
is always *safe* (if a bit slow) to go down the espfix path, and a
32-bit LDT stack segment is extremely rare.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
Cc: <stable@vger.kernel.org> # consider after upstream merge
2014-04-30 14:14:49 -07:00
H. Peter Anvin
3891a04aaf x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stack
The IRET instruction, when returning to a 16-bit segment, only
restores the bottom 16 bits of the user space stack pointer.  This
causes some 16-bit software to break, but it also leaks kernel state
to user space.  We have a software workaround for that ("espfix") for
the 32-bit kernel, but it relies on a nonzero stack segment base which
is not available in 64-bit mode.

In checkin:

    b3b42ac2cb x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels

we "solved" this by forbidding 16-bit segments on 64-bit kernels, with
the logic that 16-bit support is crippled on 64-bit kernels anyway (no
V86 support), but it turns out that people are doing stuff like
running old Win16 binaries under Wine and expect it to work.

This works around this by creating percpu "ministacks", each of which
is mapped 2^16 times 64K apart.  When we detect that the return SS is
on the LDT, we copy the IRET frame to the ministack and use the
relevant alias to return to userspace.  The ministacks are mapped
readonly, so if IRET faults we promote #GP to #DF which is an IST
vector and thus has its own stack; we then do the fixup in the #DF
handler.

(Making #GP an IST exception would make the msr_safe functions unsafe
in NMI/MC context, and quite possibly have other effects.)

Special thanks to:

- Andy Lutomirski, for the suggestion of using very small stack slots
  and copy (as opposed to map) the IRET frame there, and for the
  suggestion to mark them readonly and let the fault promote to #DF.
- Konrad Wilk for paravirt fixup and testing.
- Borislav Petkov for testing help and useful comments.

Reported-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andrew Lutomriski <amluto@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dirk Hohndel <dirk@hohndel.org>
Cc: Arjan van de Ven <arjan.van.de.ven@intel.com>
Cc: comex <comexk@gmail.com>
Cc: Alexander van Heukelum <heukelum@fastmail.fm>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: <stable@vger.kernel.org> # consider after upstream merge
2014-04-30 14:14:28 -07:00
Thomas Gleixner
62a08ae2a5 genirq: x86: Ensure that dynamic irq allocation does not conflict
On x86 the allocation of irq descriptors may allocate interrupts which
are in the range of the GSI interrupts. That's wrong as those
interrupts are hardwired and we don't have the irq domain translation
like PPC. So one of these interrupts can be hooked up later to one of
the devices which are hard wired to it and the io_apic init code for
that particular interrupt line happily reuses that descriptor with a
completely different configuration so hell breaks lose.

Inside x86 we allocate dynamic interrupts from above nr_gsi_irqs,
except for a few usage sites which have not yet blown up in our face
for whatever reason. But for drivers which need an irq range, like the
GPIO drivers, we have no limit in place and we don't want to expose
such a detail to a driver.

To cure this introduce a function which an architecture can implement
to impose a lower bound on the dynamic interrupt allocations.

Implement it for x86 and set the lower bound to nr_gsi_irqs, which is
the end of the hardwired interrupt space, so all dynamic allocations
happen above.

That not only allows the GPIO driver to work sanely, it also protects
the bogus callsites of create_irq_nr() in hpet, uv, irq_remapping and
htirq code. They need to be cleaned up as well, but that's a separate
issue.

Reported-by: Jin Yao <yao.jin@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Mathias Nyman <mathias.nyman@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Krogerus Heikki <heikki.krogerus@intel.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1404241617360.28206@ionos.tec.linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-28 12:20:00 +02:00
Bandan Das
fe2b201b3b KVM: x86: Check for host supported fields in shadow vmcs
We track shadow vmcs fields through two static lists,
one for read only and another for r/w fields. However, with
addition of new vmcs fields, not all fields may be supported on
all hosts. If so, copy_vmcs12_to_shadow() trying to vmwrite on
unsupported hosts will result in a vmwrite error. For example, commit
36be0b9deb introduced GUEST_BNDCFGS, which is not supported
by all processors. Filter out host unsupported fields before
letting guests use shadow vmcs

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-04-28 11:14:51 +02:00
Oren Twaig
39025ba382 x86/vsmp: Fix irq routing
Correct IRQ routing in case a vSMP box is detected
but the  Interrupt Routing Comply (IRC) value is set to
"comply", which leads to incorrect IRQ routing.

Before the patch:

When a vSMP box was detected and IRC was set to "comply",
users (and the kernel) couldn't effectively set the
destination of the IRQs. This is because the hook inside
vsmp_64.c always setup all CPUs as the IRQ destination using
cpumask_setall() as the return value for IRQ allocation mask.
Later, this "overrided" mask caused the kernel to set the IRQ
destination to the lowest online CPU in the mask (CPU0 usually).

After the patch:

When the IRC is set to "comply", users (and the kernel) can control
the destination of the IRQs as we will not be changing the
default "apic->vector_allocation_domain".

Signed-off-by: Oren Twaig <oren@scalemp.com>
Acked-by: Shai Fultheim <shai@scalemp.com>
Link: http://lkml.kernel.org/r/1398669697-2123-1-git-send-email-oren@scalemp.com
[ Minor readability edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-28 09:27:34 +02:00
Stephane Eranian
9f7ff8931e perf/x86: Fix RAPL rdmsrl_safe() usage
This patch fixes a bug introduced by:

  2422365780 ("perf/x86/intel: Use rdmsrl_safe() when initializing RAPL PMU")

The rdmsrl_safe() function returns 0 on success.
The current code was failing to detect the RAPL PMU
on real hardware  (missing /sys/devices/power) because
the return value of rdmsrl_safe() was misinterpreted.

Signed-off-by: Stephane Eranian <eranian@google.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Venkatesh Srinivas <venkateshs@google.com>
Cc: peterz@infradead.org
Cc: zheng.z.yan@intel.com
Link: http://lkml.kernel.org/r/20140423170418.GA12767@quad
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-24 08:12:41 +02:00
Linus Torvalds
39bfe90706 Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 vdso fix from Peter Anvin:
 "This is a single build fix for building with gold as opposed to GNU
  ld.  It got queued up separately and was expected to be pushed during
  the merge window, but it got left behind"

* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, vdso: Make the vdso linker script compatible with Gold
2014-04-22 09:09:06 -07:00
Behan Webster
8f2dd677be x86: LLVMLinux: Wrap -mno-80387 with cc-option
Wrap -mno-80387 gcc options with cc-option so they don't break
clang.

Signed-off-by: Behan Webster <behanw@converseincode.com>
Cc: torvalds@linux-foundation.org
Cc: dwmw2@infradead.org
Cc: pageexec@freemail.hu
Link: http://lkml.kernel.org/r/1398145227-25053-1-git-send-email-behanw@converseincode.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-22 11:41:16 +02:00
Linus Torvalds
6d4596905b Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Ingo Molnar:
 "This fixes the preemption-count imbalance crash reported by Owen
  Kibel"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Fix CMCI preemption bugs
2014-04-19 10:41:43 -07:00
Linus Torvalds
8de3f7a705 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Two kernel side fixes:

   - an Intel uncore PMU driver potential crash fix
   - a kprobes/perf-call-graph interaction fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Use rdmsrl_safe() when initializing RAPL PMU
  kprobes/x86: Fix page-fault handling logic
2014-04-19 10:40:11 -07:00
Venkatesh Srinivas
2422365780 perf/x86/intel: Use rdmsrl_safe() when initializing RAPL PMU
CPUs which should support the RAPL counters according to
Family/Model/Stepping may still issue #GP when attempting to access
the RAPL MSRs. This may happen when Linux is running under KVM and
we are passing-through host F/M/S data, for example. Use rdmsrl_safe
to first access the RAPL_POWER_UNIT MSR; if this fails, do not
attempt to use this PMU.

Signed-off-by: Venkatesh Srinivas <venkateshs@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1394739386-22260-1-git-send-email-venkateshs@google.com
Cc: zheng.z.yan@intel.com
Cc: eranian@google.com
Cc: ak@linux.intel.com
Cc: linux-kernel@vger.kernel.org
[ The patch also silently fixes another bug: rapl_pmu_init() didn't handle the memory alloc failure case previously. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-18 12:14:26 +02:00
Linus Torvalds
88764e0a3e Xen regression and bug fixes for 3.15-rc1.
- Fix completely broken 32-bit PV guests caused by x86 refactoring
   32-bit thread_info.
 - Only enable ticketlock slow path on Xen (not bare metal).
 - Fix two bugs with PV guests not shutting down when requested.
 - Fix a minor memory leak in xen-pciback error path.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQEcBAABAgAGBQJTT/hQAAoJEFxbo/MsZsTR6sMIAJs7mJXSqDQn3Z8O+TemRa53
 p92ZomTNYALjUMglXcxJ2Zua6IsZMWdu7jcV1GoXC70V4YLmUs8KaBgZmI5ayUQy
 bBpK+6WIAJyBkJdNH5fK3wggJ2UZjw0/twPNgd9gACwjUiYhx8iHN/hTGvu4qPBJ
 MGAIlg6wdnGwRydi72uk9Am/xpebEdQy4DRD20vjwA/qUkT4uHVv/AA4hc4AK29w
 ToK8qFSisgAlahcmq8/T4+OBFEKz78b9dQcdsGWyAk0ofWILfwD1l53xhzUin25s
 JUVevWhhLCKRZBOq4Ykc5qyqnLff4m56rm/THQ6f0oRdJn/OR+SWOImda2Qqmvs=
 =Gxpq
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.15-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull Xen fixes from David Vrabel:
 "Xen regression and bug fixes for 3.15-rc1:

   - fix completely broken 32-bit PV guests caused by x86 refactoring
     32-bit thread_info.
   - only enable ticketlock slow path on Xen (not bare metal)
   - fix two bugs with PV guests not shutting down when requested
   - fix a minor memory leak in xen-pciback error path"

* tag 'stable/for-linus-3.15-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/manage: Poweroff forcefully if user-space is not yet up.
  xen/xenbus: Avoid synchronous wait on XenBus stalling shutdown/restart.
  xen/spinlock: Don't enable them unconditionally.
  xen-pciback: silence an unwanted debug printk
  xen: fix memory leak in __xen_pcibk_add_pci_dev()
  x86/xen: Fix 32-bit PV guests's usage of kernel_stack
2014-04-17 10:54:07 -07:00
Masami Hiramatsu
6381c24cd6 kprobes/x86: Fix page-fault handling logic
Current kprobes in-kernel page fault handler doesn't
expect that its single-stepping can be interrupted by
an NMI handler which may cause a page fault(e.g. perf
with callback tracing).

In that case, the page-fault handled by kprobes and it
misunderstands the page-fault has been caused by the
single-stepping code and tries to recover IP address
to probed address.

But the truth is the page-fault has been caused by the
NMI handler, and do_page_fault failes to handle real
page fault because the IP address is modified and
causes Kernel BUGs like below.

 ----
 [ 2264.726905] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
 [ 2264.727190] IP: [<ffffffff813c46e0>] copy_user_generic_string+0x0/0x40

To handle this correctly, I fixed the kprobes fault
handler to ensure the faulted ip address is its own
single-step buffer instead of checking current kprobe
state.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Sandeepa Prabhu <sandeepa.prabhu@linaro.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: fche@redhat.com
Cc: systemtap@sourceware.org
Link: http://lkml.kernel.org/r/20140417081644.26341.52351.stgit@ltc230.yrl.intra.hitachi.co.jp
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-17 10:57:02 +02:00
Ingo Molnar
ea431643d6 x86/mce: Fix CMCI preemption bugs
The following commit:

  27f6c573e0 ("x86, CMCI: Add proper detection of end of CMCI storms")

Added two preemption bugs:

 - machine_check_poll() does a get_cpu_var() without a matching
   put_cpu_var(), which causes preemption imbalance and crashes upon
   bootup.

 - it does percpu ops without disabling preemption. Preemption is not
   disabled due to the mistaken use of a raw spinlock.

To fix these bugs fix the imbalance and change
cmci_discover_lock to a regular spinlock.

Reported-by: Owen Kibel <qmewlo@gmail.com>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Chen, Gong <gong.chen@linux.intel.com>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Todorov <atodorov@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/n/tip-jtjptvgigpfkpvtQxpEk1at2@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
--
 arch/x86/kernel/cpu/mcheck/mce.c       |    4 +---
 arch/x86/kernel/cpu/mcheck/mce_intel.c |   18 +++++++++---------
 2 files changed, 10 insertions(+), 12 deletions(-)
2014-04-17 10:28:42 +02:00