LAM is supported only in 64-bit mode and applies only to addresses used for data
accesses. In 64-bit mode, linear addresses have 64 bits. LAM is applied to 64-bit
linear addresses and allows software to use the high bits for metadata.
LAM supports configurations that differ regarding which pointer bits are masked
and can be used for metadata.
LAM includes the following mode:
- LAM_U57: pointer bits in positions 62:57 are masked (LAM width 6),
allowing bits 62:57 of a user pointer to be used as metadata.
The following arch_prctl() operations are provided:
ARCH_ENABLE_TAGGED_ADDR: enable a LAM mode and mask the high bits of user pointers.
ARCH_GET_UNTAG_MASK: get the current untag mask.
ARCH_GET_MAX_TAG_BITS: get the maximum number of tag bits a user can request;
zero if LAM is not supported.
The LAM mode is per-process and a process has only one chance to set it;
there is no API to disable LAM mode. Therefore, all test cases are run in
child processes.
Functions of this test:
MALLOC
- LAM_U57 masks bits 62:57 of a user pointer. A user-space process
can dereference such pointers.
- With LAM disabled, dereferencing a pointer with metadata above bit 48 or
bit 57 triggers SIGSEGV.
TAG_BITS
- The maximum number of tag bits for LAM_U57 is 6 (a usage sketch follows below).
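A minimal usage sketch of the interface described above; this is not the
actual selftest, and the arch_prctl numbers are assumptions mirroring
asm/prctl.h rather than values taken from installed headers:

#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Values assumed from asm/prctl.h. */
#define ARCH_GET_UNTAG_MASK     0x4001
#define ARCH_ENABLE_TAGGED_ADDR 0x4002
#define ARCH_GET_MAX_TAG_BITS   0x4003

int main(void)
{
    unsigned long max_bits = 0, mask = 0;

    if (syscall(SYS_arch_prctl, ARCH_GET_MAX_TAG_BITS, &max_bits) || !max_bits) {
        printf("LAM is not supported\n");
        return 0;
    }

    /* Request LAM_U57: 6 tag bits. One shot per process, no way back. */
    if (syscall(SYS_arch_prctl, ARCH_ENABLE_TAGGED_ADDR, 6))
        err(1, "ARCH_ENABLE_TAGGED_ADDR");

    syscall(SYS_arch_prctl, ARCH_GET_UNTAG_MASK, &mask);
    printf("untag mask: %#lx\n", mask);

    /* Dereference a pointer carrying metadata in bits 62:57. */
    unsigned long *p = malloc(sizeof(*p));
    unsigned long *tagged = (unsigned long *)((unsigned long)p | (0x3fUL << 57));
    *tagged = 42;
    printf("*p = %lu\n", *p);
    free(p);
    return 0;
}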
Signed-off-by: Weihong Zhang <weihong.zhang@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20230312112612.31869-13-kirill.shutemov%40linux.intel.com
Use $(KHDR_INCLUDES) as lookup path for kernel headers. This prevents
building against kernel headers from the build environment in scenarios
where kernel headers are installed into a specific output directory
(O=...).
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: linux-kselftest@vger.kernel.org
Cc: Ingo Molnar <mingo@redhat.com>
Cc: <stable@vger.kernel.org> # 5.18+
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
In order to successfully build all these 32-bit tests, the 32-bit gcc
and glibc packages, named gcc-32bit and glibc-devel-static-32bit on SUSE,
need to be installed.
This patch adds this information to warn_32bit_failure.
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The $(CC) variable used in Makefiles could contain several arguments
such as "ccache gcc". These need to be passed as a single string to
check_cc.sh, otherwise only the first argument will be used as the
compiler command. Without quotes, the $(CC) variable is passed as
distinct arguments which causes the script to fail to build trivial
programs.
Fix this by adding quotes around $(CC) when calling check_cc.sh to pass
the whole string as a single argument to the script even if it has
several words such as "ccache gcc".
Link: https://lkml.kernel.org/r/d0d460d7be0107a69e3c52477761a6fe694c1840.1646991629.git.guillaume.tucker@collabora.com
Fixes: e9886ace22 ("selftests, x86: Rework x86 target architecture detection")
Signed-off-by: Guillaume Tucker <guillaume.tucker@collabora.com>
Tested-by: "kernelci.org bot" <bot@kernelci.org>
Reviewed-by: Guenter Roeck <groeck@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
AMX TILEDATA is a very large XSAVE feature. It could have caused
nasty XSAVE buffer space waste in two places:
* Signal stacks
* Kernel task_struct->fpu buffers
To avoid this waste, neither of these buffers has AMX state by
default. The non-default features are called "dynamic" features.
There is an arch_prctl(ARCH_REQ_XCOMP_PERM) which allows a task
to declare that it wants to use AMX or other "dynamic" XSAVE
features. This arch_prctl() ensures that sufficient sigaltstack
space is available before it will succeed. It also expands the
task_struct buffer.
Functions of this test:
* Test arch_prctl(ARCH_REQ_XCOMP_PERM). Ensure that it checks for
proper sigaltstack sizing and that the sizing is enforced for
future sigaltstack calls.
* Ensure that ARCH_REQ_XCOMP_PERM is inherited across fork()
* Ensure that TILEDATA use before the prctl() is fatal
* Ensure that TILEDATA is cleared across fork()
Note: Generally, compiler support is needed to do something with
AMX. Instead, directly load AMX state from userspace with a
plain XSAVE. Do not depend on the compiler.
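For illustration, a rough sketch of the permission-request step; the
arch_prctl numbers and the TILEDATA feature number are assumptions mirroring
asm/prctl.h and the XSAVE feature numbering, not the actual selftest:

#include <err.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Values assumed from asm/prctl.h and the XSAVE feature layout. */
#define ARCH_GET_XCOMP_PERM 0x1022
#define ARCH_REQ_XCOMP_PERM 0x1023
#define XFEATURE_XTILEDATA  18

int main(void)
{
    unsigned long bitmap = 0;

    /* Ask the kernel for permission to use the dynamic TILEDATA state. */
    if (syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_PERM, XFEATURE_XTILEDATA))
        err(1, "ARCH_REQ_XCOMP_PERM");

    syscall(SYS_arch_prctl, ARCH_GET_XCOMP_PERM, &bitmap);
    printf("permitted xfeatures: %#lx\n", bitmap);

    /* From here on, loading TILEDATA with a raw XSAVE/XRSTOR (no compiler
     * intrinsics needed) is no longer fatal to the task. */
    return 0;
}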
[ dhansen: bunches of cleanups ]
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211026122524.7BEDAA95@davehans-spike.ostc.intel.com
This is very heavily based on some code from Thomas Gleixner. On a system
without XSAVES, it triggers the WARN_ON():
Bad FPU state detected at copy_kernel_to_fpregs+0x2f/0x40, reinitializing FPU registers.
[ bp: Massage in nitpicks. ]
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Rik van Riel <riel@surriel.com>
Link: https://lkml.kernel.org/r/20210608144346.234764986@linutronix.de
The test measures the kernel's signal delivery with different (enough vs.
insufficient) stack sizes.
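A rough sketch of the mechanism being measured, assuming a kernel and libc
that expose AT_MINSIGSTKSZ (this is not the actual test):

#include <err.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/auxv.h>

#ifndef AT_MINSIGSTKSZ
#define AT_MINSIGSTKSZ 51       /* assumed auxv tag; newer kernels provide it */
#endif

static void handler(int sig)
{
    printf("signal %d delivered on the alternate stack\n", sig);
}

int main(void)
{
    unsigned long size = getauxval(AT_MINSIGSTKSZ);

    if (!size)
        size = SIGSTKSZ;        /* older kernel: fall back to the legacy constant */

    stack_t ss = { .ss_sp = malloc(size), .ss_size = size };
    struct sigaction sa = { .sa_handler = handler, .sa_flags = SA_ONSTACK };

    /* A size below the kernel's minimum is rejected with ENOMEM; an
     * accepted-but-too-small stack would make delivery itself fatal. */
    if (sigaltstack(&ss, NULL))
        err(1, "sigaltstack");
    sigaction(SIGUSR1, &sa, NULL);
    raise(SIGUSR1);
    return 0;
}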
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Len Brown <len.brown@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20210518200320.17239-7-chang.seok.bae@intel.com
Move test_vdso from x86 to the vDSO test suite.
Suggested-by: Andy Lutomirski <luto@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
this has been brought into a shape which is maintainable and actually
works.
This final version was done by Sasha Levin who took it up after Intel
dropped the ball. Sasha discovered that the SGX (sic!) offerings out there
ship rogue kernel modules enabling FSGSBASE behind the kernel's back, which
opens an instantaneous unprivileged root hole.
The FSGSBASE instructions provide a considerable speedup of the context
switch path and enable user space to write GSBASE without kernel
interaction. This enablement requires careful handling of the exception
entries which go through the paranoid entry path as they can no longer rely
on the assumption that user GSBASE is positive (as enforced via prctl() on
non-FSGSBASE-enabled systems). All other entries (syscalls, interrupts and
exceptions) can still just utilize SWAPGS unconditionally when the entry
comes from user space. Converting these entries to use FSGSBASE has no
benefit as SWAPGS is only marginally slower than WRGSBASE and locating and
retrieving the kernel GSBASE value is not a free operation either. The real
benefit of RD/WRGSBASE is the avoidance of the MSR reads and writes.
The changes come with appropriate selftests and have held up in field
testing against the (sanitized) Graphene-SGX driver.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl8pGnoTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoTYJD/9873GkwvGcc/Vq/dJH1szGTgFftPyZ
c/Y9gzx7EGBPLo25BS820L+ZlynzXHDxExKfCEaD10TZfe5XIc1vYNR0J74M2NmK
IBgEDstJeW93ai+rHCFRXIevhpzU4GgGYJ1MeeOgbVMN3aGU1g6HfzMvtF0fPn8Y
n6fsLZa43wgnoTdjwjjikpDTrzoZbaL1mbODBzBVPAaTbim7IKKTge6r/iCKrOjz
Uixvm3g9lVzx52zidJ9kWa8esmbOM1j0EPe7/hy3qH9DFo87KxEzjHNH3T6gY5t6
NJhRAIfY+YyTHpPCUCshj6IkRudE6w/qjEAmKP9kWZxoJrvPCTWOhCzelwsFS9b9
gxEYfsnaKhsfNhB6fi0PtWlMzPINmEA7SuPza33u5WtQUK7s1iNlgHfvMbjstbwg
MSETn4SG2/ZyzUrSC06lVwV8kh0RgM3cENc/jpFfIHD0vKGI3qfka/1RY94kcOCG
AeJd0YRSU2RqL7lmxhHyG8tdb8eexns41IzbPCLXX2sF00eKNkVvMRYT2mKfKLFF
q8v1x7yuwmODdXfFR6NdCkGm9IU7wtL6wuQ8Nhu9UraFmcXo6X6FLJC18FqcvSb9
jvcRP4XY/8pNjjf44JB8yWfah0xGQsaMIKQGP4yLv4j6Xk1xAQKH1MqcC7l1D2HN
5Z24GibFqSK/vA==
=QaAN
-----END PGP SIGNATURE-----
Merge tag 'x86-fsgsbase-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fsgsbase from Thomas Gleixner:
"Support for FSGSBASE. Almost 5 years after the first RFC to support
it, this has been brought into a shape which is maintainable and
actually works.
This final version was done by Sasha Levin who took it up after Intel
dropped the ball. Sasha discovered that the SGX (sic!) offerings out
there ship rogue kernel modules enabling FSGSBASE behind the kernel's
back, which opens an instantaneous unprivileged root hole.
The FSGSBASE instructions provide a considerable speedup of the
context switch path and enable user space to write GSBASE without
kernel interaction. This enablement requires careful handling of the
exception entries which go through the paranoid entry path as they
can no longer rely on the assumption that user GSBASE is positive (as
enforced via prctl() on non-FSGSBASE-enabled systems).
All other entries (syscalls, interrupts and exceptions) can still just
utilize SWAPGS unconditionally when the entry comes from user space.
Converting these entries to use FSGSBASE has no benefit as SWAPGS is
only marginally slower than WRGSBASE and locating and retrieving the
kernel GSBASE value is not a free operation either. The real benefit
of RD/WRGSBASE is the avoidance of the MSR reads and writes.
The changes come with appropriate selftests and have held up in field
testing against the (sanitized) Graphene-SGX driver"
* tag 'x86-fsgsbase-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
x86/fsgsbase: Fix Xen PV support
x86/ptrace: Fix 32-bit PTRACE_SETREGS vs fsbase and gsbase
selftests/x86/fsgsbase: Add a missing memory constraint
selftests/x86/fsgsbase: Fix a comment in the ptrace_write_gsbase test
selftests/x86: Add a syscall_arg_fault_64 test for negative GSBASE
selftests/x86/fsgsbase: Test ptracer-induced GS base write with FSGSBASE
selftests/x86/fsgsbase: Test GS selector on ptracer-induced GS base write
Documentation/x86/64: Add documentation for GS/FS addressing mode
x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2
x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit
x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit
x86/entry/64: Introduce the FIND_PERCPU_BASE macro
x86/entry/64: Switch CR3 before SWAPGS in paranoid entry
x86/speculation/swapgs: Check FSGSBASE in enabling SWAPGS mitigation
x86/process/64: Use FSGSBASE instructions on thread copy and ptrace
x86/process/64: Use FSBSBASE in switch_to() if available
x86/process/64: Make save_fsgs_for_kvm() ready for FSGSBASE
x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions
x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions
x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE
...
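For reference, once the kernel advertises FSGSBASE in AT_HWCAP2, user space
can exercise the new instructions directly. A minimal sketch, assuming a
compiler that provides the FSGSBASE intrinsics and with the HWCAP2 bit value
taken here as an assumption from asm/hwcap2.h:

#include <immintrin.h>  /* _readgsbase_u64()/_writegsbase_u64(); build with -mfsgsbase */
#include <stdio.h>
#include <sys/auxv.h>

#ifndef HWCAP2_FSGSBASE
#define HWCAP2_FSGSBASE (1 << 1)        /* assumed value from asm/hwcap2.h */
#endif

int main(void)
{
    if (!(getauxval(AT_HWCAP2) & HWCAP2_FSGSBASE)) {
        puts("kernel does not expose FSGSBASE to user space");
        return 0;
    }

    /* Read, modify and restore GSBASE purely in user space:
     * no MSR access, no kernel involvement. */
    unsigned long long old = _readgsbase_u64();

    _writegsbase_u64(0x12345000);
    printf("GSBASE is now %#llx (was %#llx)\n", _readgsbase_u64(), old);
    _writegsbase_u64(old);
    return 0;
}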
Debuggers expect that doing PTRACE_GETREGS, then poking at a tracee
and maybe letting it run for a while, then doing PTRACE_SETREGS will
put the tracee back where it was. In the specific case of a 32-bit
tracer and tracee, the PTRACE_GETREGS/SETREGS data structure doesn't
have fs_base or gs_base fields, so FSBASE and GSBASE fields are
never stored anywhere. Everything used to still work because
nonzero FS or GS would result in full reloads of the segment registers
when the tracee resumes, and the bases associated with FS==0 or
GS==0 are irrelevant to 32-bit code.
Adding FSGSBASE support broke this: when FSGSBASE is enabled, FSBASE
and GSBASE are now restored independently of FS and GS for all tasks
when context-switched in. This means that, if a 32-bit tracer
restores a previous state using PTRACE_SETREGS but the tracee's
pre-restore and post-restore bases don't match, then the tracee is
resumed with the wrong base.
Fix it by explicitly loading the base when a 32-bit tracer pokes FS
or GS on a 64-bit kernel.
Also add a test case.
Fixes: 673903495c ("x86/process/64: Use FSBSBASE in switch_to() if available")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/229cc6a50ecbb701abd50fe4ddaf0eda888898cd.1593192140.git.luto@kernel.org
There are several copies of get_eflags() and set_eflags() and they all are
buggy. Consolidate them and fix them. The fixes are:
Add memory clobbers. These are probably unnecessary but they make sure
that the compiler doesn't move something past one of these calls when it
shouldn't.
Respect the redzone on x86_64. No failure related to this has been observed,
but it's definitely a bug.
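A sketch of what the consolidated helpers could look like, with the memory
clobber and red-zone handling described above (64-bit variant only; not
necessarily the exact code that was merged):

static inline unsigned long get_eflags(void)
{
    unsigned long eflags;

    /*
     * Step over the 128-byte red zone before pushing anything onto the
     * stack, and clobber "memory" so the compiler cannot reorder memory
     * accesses around the read.
     */
    asm volatile ("lea -128(%%rsp), %%rsp\n\t"
                  "pushf\n\t"
                  "pop %0\n\t"
                  "lea 128(%%rsp), %%rsp"
                  : "=r" (eflags) :: "memory");
    return eflags;
}

static inline void set_eflags(unsigned long eflags)
{
    /* lea restores RSP without disturbing the flags that were just loaded. */
    asm volatile ("lea -128(%%rsp), %%rsp\n\t"
                  "push %0\n\t"
                  "popf\n\t"
                  "lea 128(%%rsp), %%rsp"
                  : : "r" (eflags) : "cc", "memory");
}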
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/982ce58ae8dea2f1e57093ee894760e35267e751.1593191971.git.luto@kernel.org
Patch series "selftests, powerpc, x86: Memory Protection Keys", v19.
Memory protection keys enable an application to protect its address space
from inadvertent access by its own code.
This feature is now enabled on powerpc and has been available since
4.16-rc1. The patches move the selftests to arch neutral directory and
enhance their test coverage.
Tested on powerpc64 and x86_64 (Skylake-SP).
This patch (of 24):
Move selftest files from tools/testing/selftests/x86/ to
tools/testing/selftests/vm/.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Michal Suchanek <msuchanek@suse.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Shuah Khan <shuah@kernel.org>
Link: http://lkml.kernel.org/r/14d25194c3e2e652e0047feec4487e269e76e8c9.1585646528.git.sandipan@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull x86 entry updates from Ingo Molnar:
"This contains x32 and compat syscall improvements, the biggest one of
which splits x32 syscalls into their own table, which allows new
syscalls to share the x32 and x86-64 number - which turns the
512-547 special syscall numbers range into a legacy wart that won't be
extended going forward"
* 'x86-entry-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/syscalls: Split the x32 syscalls into their own table
x86/syscalls: Disallow compat entries for all types of 64-bit syscalls
x86/syscalls: Use the compat versions of rt_sigsuspend() and rt_sigprocmask()
x86/syscalls: Make __X32_SYSCALL_BIT be unsigned long
MPX is being removed from the kernel due to a lack of support in the
toolchain going forward (gcc).
This is the smallest possible patch to fix some issues that have been
reported around running the MPX selftests. As it would also have been part
of any removal series, it is offered first.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190705175318.784C233E@viggo.jf.intel.com
For unfortunate historical reasons, the x32 syscalls and the x86_64
syscalls are not all numbered the same. As an example, ioctl() is nr 16 on
x86_64 but 514 on x32.
This has potentially nasty consequences, since it means that there are two
valid RAX values to do ioctl(2) and two invalid RAX values. The valid
values are 16 (i.e. ioctl(2) using the x86_64 ABI) and (514 | 0x40000000)
(i.e. ioctl(2) using the x32 ABI).
The invalid values are 514 and (16 | 0x40000000). 514 will enter the
"COMPAT_SYSCALL_DEFINE3(ioctl, ...)" entry point with in_compat_syscall()
and in_x32_syscall() returning false, whereas (16 | 0x40000000) will enter
the native entry point with in_compat_syscall() and in_x32_syscall()
returning true. Both are bogus, and both will exercise code paths in the
kernel and in any running seccomp filters that really ought to be
unreachable.
Splitting out the x32 syscalls into their own table allows both bogus
invocations to return -ENOSYS. I've checked glibc, musl, and Bionic, and
all of them appear to call syscalls with their correct numbers, so this
change should have no effect on them.
There is an added benefit going forward: new syscalls that need special
handling on x32 can share the same number on x32 and x86_64. This means
that the special syscall range 512-547 can be treated as a legacy wart
instead of something that may need to be extended in the future.
Also add a selftest to verify the new behavior.
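Not the actual selftest, but a rough sketch of the behavior being verified,
using the syscall numbers and the 0x40000000 bit from the description above:

#include <errno.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

#define X32_BIT 0x40000000L

int main(void)
{
    /* ioctl(2): nr 16 in the x86_64 table, nr 514 in the x32 table. */
    errno = 0;
    syscall(514, 0, 0, 0);          /* x32 number without the x32 bit */
    printf("raw 514         -> errno %d\n", errno);

    errno = 0;
    syscall(16 | X32_BIT, 0, 0, 0); /* x86_64 number with the x32 bit */
    printf("16 | 0x40000000 -> errno %d\n", errno);

    /* With the split tables, both of the above should report ENOSYS. */
    return 0;
}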
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/208024256b764312598f014ebfb0a42472c19354.1562185330.git.luto@kernel.org
Make sure that both variants of the nasty TF-in-compat-syscall are
exercised regardless of what vendor's CPU is running the tests.
Also change the intentional signal after SYSCALL to use ud2, which
is a lot more comprehensible.
This crashes the kernel due to an FSGSBASE bug right now.
This test *also* detects a bug in KVM when run on an Intel host. KVM
people, feel free to use it to help debug. There's a bunch of code in this
test to warn instead of going into an infinite loop when the bug gets
triggered.
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "BaeChang Seok" <chang.seok.bae@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Cc: "Bae, Chang Seok" <chang.seok.bae@intel.com>
Link: https://lkml.kernel.org/r/5f5de10441ab2e3005538b4c33be9b1965d1bb63.1562035429.git.luto@kernel.org
Some toolchains need -no-pie to build all tests, others do not support
the -no-pie flag at all. Therefore, add another test for the
availability of the flag.
This amends commit 3346a6a4e5
("selftests: x86: sysret_ss_attrs doesn't build on a PIE build").
Signed-off-by: Florian Weimer <fweimer@redhat.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
This exercises a nasty corner case of the x86 ISA.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/67e08b69817171da8026e0eb3af0214b06b4d74f.1525800455.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 PTI and Spectre related fixes and updates from Ingo Molnar:
"Here's the latest set of Spectre and PTI related fixes and updates:
Spectre:
- Add entry code register clearing to reduce the Spectre attack
surface
- Update the Spectre microcode blacklist
- Inline the KVM Spectre helpers to get close to v4.14 performance
again.
- Fix indirect_branch_prediction_barrier()
- Fix/improve Spectre related kernel messages
- Fix array_index_nospec_mask() asm constraint
- KVM: fix two MSR handling bugs
PTI:
- Fix a paranoid entry PTI CR3 handling bug
- Fix comments
objtool:
- Fix paranoid_entry() frame pointer warning
- Annotate WARN()-related UD2 as reachable
- Various fixes
- Add Peter Zijlstra as objtool co-maintainer
Misc:
- Various x86 entry code self-test fixes
- Improve/simplify entry code stack frame generation and handling
after recent heavy-handed PTI and Spectre changes. (There's two
more WIP improvements expected here.)
- Type fix for cache entries
There's also some low risk non-fix changes I've included in this
branch to reduce backporting conflicts:
- rename a confusing x86_cpu field name
- de-obfuscate the naming of single-TLB flushing primitives"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
x86/entry/64: Fix CR3 restore in paranoid_exit()
x86/cpu: Change type of x86_cache_size variable to unsigned int
x86/spectre: Fix an error message
x86/cpu: Rename cpu_data.x86_mask to cpu_data.x86_stepping
selftests/x86/mpx: Fix incorrect bounds with old _sigfault
x86/mm: Rename flush_tlb_single() and flush_tlb_one() to __flush_tlb_one_[user|kernel]()
x86/speculation: Add <asm/msr-index.h> dependency
nospec: Move array_index_nospec() parameter checking into separate macro
x86/speculation: Fix up array_index_nospec_mask() asm constraint
x86/debug: Use UD2 for WARN()
x86/debug, objtool: Annotate WARN()-related UD2 as reachable
objtool: Fix segfault in ignore_unreachable_insn()
selftests/x86: Disable tests requiring 32-bit support on pure 64-bit systems
selftests/x86: Do not rely on "int $0x80" in single_step_syscall.c
selftests/x86: Do not rely on "int $0x80" in test_mremap_vdso.c
selftests/x86: Fix build bug caused by the 5lvl test which has been moved to the VM directory
selftests/x86/pkeys: Remove unused functions
selftests/x86: Clean up and document sscanf() usage
selftests/x86: Fix vDSO selftest segfault for vsyscall=none
x86/entry/64: Remove the unused 'icebp' macro
...
The ldt_gdt and ptrace_syscall selftests, even in their 64-bit variant, use
hard-coded 32-bit syscall numbers and call "int $0x80".
This will fail on 64-bit systems with CONFIG_IA32_EMULATION=y disabled.
Therefore, do not build these tests if we cannot build 32-bit binaries
(which should be a good approximation for CONFIG_IA32_EMULATION=y being enabled).
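For context, the hard-coded pattern these tests depend on looks roughly like
the sketch below; getpid is nr 20 in the i386 syscall table, and on 64-bit
kernels without CONFIG_IA32_EMULATION=y the int $0x80 typically raises
SIGSEGV instead of performing a syscall:

#include <stdio.h>

int main(void)
{
    long ret;

    /* Invoke getpid through the legacy 32-bit gate: nr 20 in the i386
     * table, regardless of the bitness of this binary. */
    asm volatile ("int $0x80" : "=a" (ret) : "a" (20L) : "memory");
    printf("int $0x80 getpid returned %ld\n", ret);
    return 0;
}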
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kselftest@vger.kernel.org
Cc: shuah@kernel.org
Link: http://lkml.kernel.org/r/20180211111013.16888-6-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On 64-bit builds, we should not rely on "int $0x80" working (it only does if
CONFIG_IA32_EMULATION=y is enabled). To keep the "Set TF and check int80"
test running on 64-bit installs with CONFIG_IA32_EMULATION=y enabled, build
this test only if we can also build 32-bit binaries (which should be a
good approximation for that).
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kselftest@vger.kernel.org
Cc: shuah@kernel.org
Link: http://lkml.kernel.org/r/20180211111013.16888-5-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 pti updates from Thomas Gleixner:
"This contains:
- a PTI bugfix to avoid setting reserved CR3 bits when PCID is
disabled. This seems to cause issues on a virtual machine at least
and is incorrect according to the AMD manual.
- a PTI bugfix which disables the perf BTS facility if PTI is
enabled. The BTS AUX buffer is not globally visible and causes the
CPU to fault when the mapping disappears on switching CR3 to user
space. A full fix which restores BTS on PTI is non trivial and will
be worked on.
- PTI bugfixes for EFI and trusted boot which make sure that the user
space visible page table entries have the NX bit cleared
- removal of dead code in the PTI pagetable setup functions
- add PTI documentation
- add a selftest for vsyscall to verify that the kernel actually
implements what it advertises.
- a sysfs interface to expose vulnerability and mitigation
information so there is a coherent way for users to retrieve the
status.
- the initial spectre_v2 mitigations, aka retpoline:
+ The necessary ASM thunk and compiler support
+ The ASM variants of retpoline and the conversion of affected ASM
code
+ Make LFENCE serializing on AMD so it can be used as speculation
trap
+ The RSB fill after vmexit
- initial objtool support for retpoline
As I said in the status mail, this is most of the set of patches
which should go into 4.15, except for two straightforward patches still on
hold:
- the retpoline add on of LFENCE which waits for ACKs
- the RSB fill after context switch
Both should be ready to go early next week and with that we'll have
covered the major holes of spectre_v2 and go back to normality"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits)
x86,perf: Disable intel_bts when PTI
security/Kconfig: Correct the Documentation reference for PTI
x86/pti: Fix !PCID and sanitize defines
selftests/x86: Add test_vsyscall
x86/retpoline: Fill return stack buffer on vmexit
x86/retpoline/irq32: Convert assembler indirect jumps
x86/retpoline/checksum32: Convert assembler indirect jumps
x86/retpoline/xen: Convert Xen hypercall indirect jumps
x86/retpoline/hyperv: Convert assembler indirect jumps
x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
x86/retpoline/entry: Convert entry assembler indirect jumps
x86/retpoline/crypto: Convert crypto assembler indirect jumps
x86/spectre: Add boot time option to select Spectre v2 mitigation
x86/retpoline: Add initial retpoline support
objtool: Allow alternatives to be ignored
objtool: Detect jumps to retpoline thunks
x86/pti: Make unpoison of pgd for trusted boot work for real
x86/alternatives: Fix optimize_nops() checking
sysfs/cpu: Fix typos in vulnerability documentation
x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC
...
This tests that the vsyscall entries do what they're expected to do.
It also confirms that attempts to read the vsyscall page behave as
expected.
If changes are made to the vsyscall code or its memory map handling,
running this test in all three of vsyscall=none, vsyscall=emulate,
and vsyscall=native is helpful.
(Because it's easy, this also compares the vsyscall results to their
vDSO equivalents.)
Note to KAISER backporters: please test this under all three
vsyscall modes. Also, in the emulate and native modes, make sure
that test_vsyscall_64 agrees with the command line or config
option as to which mode you're in. It's quite easy to mess up
the kernel such that native mode accidentally emulates
or vice versa.
Greg, etc: please backport this to all your Meltdown-patched
kernels. It'll help make sure the patches didn't regress
vsyscalls.
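For reference, a user program reaches the legacy vsyscall page at its fixed
address roughly as sketched below; under vsyscall=none the call faults:

#include <stdio.h>
#include <sys/time.h>

/* The legacy vsyscall page sits at a fixed address; gettimeofday is its
 * first entry.  Under vsyscall=none this call raises SIGSEGV. */
typedef long (*vgtod_t)(struct timeval *tv, struct timezone *tz);

int main(void)
{
    vgtod_t vsys_gtod = (vgtod_t)0xffffffffff600000UL;
    struct timeval tv;
    long ret = vsys_gtod(&tv, NULL);

    printf("vsyscall gettimeofday() = %ld, tv_sec = %lld\n",
           ret, (long long)tv.tv_sec);
    return 0;
}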
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/2b9c5a174c1d60fd7774461d518aa75598b1d8fd.1515719552.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
5-level paging provides a 56-bit virtual address space for user space
applications. But the kernel defaults to mappings below the 47-bit address
space boundary, which is the upper bound for 4-level paging, unless an
application explicitly requests it by using a mmap(2) address hint above
the 47-bit boundary. The kernel prevents mappings which span across the
47-bit boundary unless mmap(2) was invoked with MAP_FIXED.
Add a self-test that covers the corner cases of the interface and validates
the correctness of the implementation.
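A rough sketch of the opt-in behavior described above; where the returned
mappings actually land depends on the kernel and CPU, so treat this as an
illustration rather than the selftest itself:

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    /* Without a hint, mappings stay below the 47-bit boundary... */
    void *low = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    /* ...but an explicit hint above it opts in to the larger address
     * space (only effective with a 5-level-paging kernel and CPU). */
    void *hint = (void *)(1UL << 47);
    void *high = mmap(hint, 4096, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    printf("default: %p, hinted: %p\n", low, high);
    return 0;
}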
[ tglx: Massaged changelog once more ]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20171115143607.81541-2-kirill.shutemov@linux.intel.com
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side-by-side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sysret_ss_attrs fails to compile, leading the x86 test run to fail on systems
configured to build using PIE by default. Add -no-pie to fix it.
Relocation might still fail if relocated above 4G. For now this change
fixes the build and runs x86 tests.
tools/testing/selftests/x86$ make
gcc -m64 -o .../tools/testing/selftests/x86/single_step_syscall_64 -O2
-g -std=gnu99 -pthread -Wall single_step_syscall.c -lrt -ldl
gcc -m64 -o .../tools/testing/selftests/x86/sysret_ss_attrs_64 -O2 -g
-std=gnu99 -pthread -Wall sysret_ss_attrs.c thunks.S -lrt -ldl
/usr/bin/ld: /tmp/ccS6pvIh.o: relocation R_X86_64_32S against `.text'
can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: error: ld returned 1 exit status
Makefile:49: recipe for target
'.../tools/testing/selftests/x86/sysret_ss_attrs_64' failed
make: *** [.../tools/testing/selftests/x86/sysret_ss_attrs_64] Error 1
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Add an override with EXTRA_CLEAN for the lib.mk clean target to fix the
following warnings from the clean target run.
Makefile:44: warning: overriding recipe for target 'clean'
../lib.mk:55: warning: ignoring old recipe for target 'clean'
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
PPC:
* correct assumption about ASDR on POWER9
* fix MMIO emulation on POWER9
x86:
* add a simple test for ioperm
* cleanup TSS
(going through KVM tree as the whole undertaking was caused by VMX's
use of TSS)
* fix nVMX interrupt delivery
* fix some performance counters in the guest
And two cleanup patches.
-----BEGIN PGP SIGNATURE-----
iQEcBAABCAAGBQJYuu5qAAoJEED/6hsPKofoRAUH/jkx/KFDcw3FggixysWVgRai
iLSbbAZemnSLFSOkOU/t7Bz0fXCUgB0tAcMJd9ow01Dg1zObiTpuUIo6qEPaYHdX
gqtUzlHuyECZEcgK0RXS9kDYLrvw7EFocxnDWQfV91qCZSS6nBSSLF3ST1rNV69W
mUvcZG+MciDcZUe1lTexoswVTh1m7avvozEnQ5OHnZR9yicoXiadBQjzL6yqWoqf
Ml/29zRk5+MvloTudxjkAKm3mh7psW88jNMh37TXbAA7i+Xwl9cU6GLR9mFWstoP
7Ot7ecq9mNAUO3lTIQh7lqvB60LMFznS4IlYK7MbplC3kvJLkfzhTWaN1aGvh90=
=cqHo
-----END PGP SIGNATURE-----
Merge tag 'kvm-4.11-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more KVM updates from Radim Krčmář:
"Second batch of KVM changes for the 4.11 merge window:
PPC:
- correct assumption about ASDR on POWER9
- fix MMIO emulation on POWER9
x86:
- add a simple test for ioperm
- cleanup TSS (going through KVM tree as the whole undertaking was
caused by VMX's use of TSS)
- fix nVMX interrupt delivery
- fix some performance counters in the guest
... and two cleanup patches"
* tag 'kvm-4.11-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: nVMX: Fix pending events injection
x86/kvm/vmx: remove unused variable in segment_base()
selftests/x86: Add a basic selftest for ioperm
x86/asm: Tidy up TSS limit code
kvm: convert kvm.users_count from atomic_t to refcount_t
KVM: x86: never specify a sample period for virtualized in_tx_cp counters
KVM: PPC: Book3S HV: Don't use ASDR for real-mode HPT faults on POWER9
KVM: PPC: Book3S HV: Fix software walk of guest process page tables
This doesn't fully exercise the interaction between KVM and ioperm(),
but it does test basic functionality.
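A minimal sketch of the basic functionality being tested; it needs
CAP_SYS_RAWIO, and port 0x80 is used here only because it is conventionally
harmless to write on PCs:

#include <err.h>
#include <stdio.h>
#include <sys/io.h>

int main(void)
{
    /* Grant this task access to I/O port 0x80. */
    if (ioperm(0x80, 1, 1))
        err(1, "ioperm");

    asm volatile ("outb %%al, $0x80" : : "a" ((unsigned char)0));
    printf("outb to port 0x80 worked with ioperm granted\n");

    /* Dropping the permission again makes the same outb fault. */
    if (ioperm(0x80, 1, 0))
        err(1, "ioperm");
    return 0;
}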
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
This update consists of:
-- fixes to several existing tests from Stafford Horne
-- cpufreq tests from Viresh Kumar
-- Selftest build and install fixes from Bamvor Jian Zhang
and Michael Ellerman
-- Fixes to protection-keys tests from Dave Hansen
-- Warning fixes from Shuah Khan
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJYsJgxAAoJEAsCRMQNDUMcNCUQALy+jVZV3U1yypLCQinlgbdH
rlh7oKIpGfWGXNe1BQVLS5S+bjil9XDdty+4VOB7x9gfQ6fvea3w0IQhI5CyONmm
hZg/miheZzN5ujqKjfuUQrHzEbEAs+CH0A0sVH+ueptw37roTWhf1ZCSpQBpas5p
XMZrfBI0mQLd9Z3D0G5TSsVjSPcMhKeoYDMGPMCulZuamVMY40XkPcvaYe1Zg1Mj
7nD7Aw6JxxV0tlZwo0n540w8tdx/yQ+49jqhulozCQNL+KmXO8FlM/Jnu1b24/YW
hlu5dvLmi9rAHYEHwqFf5yqZci/50Q+LHuxcxEp3RLxRW+KXJP7c53Kn8eutIwqH
HR03TSA1TRv9b4MvWJs/ULF/EYYtTPUDSinAtNMf4iegXp0BbT7P0eOibF1vj3tz
bcfPB5vi1SxQqLQwCPomUzhlPB4muBu9lHjZ2tI5EKynXXZxN33zugHYqBY0zNPm
7dS+4iXs/phEDlW0j+3BhHQz2of+Q6fSOC/jvgAYGdmqh1aNHl9WpIWfFubuBQhd
fkKJmgpJ1Mk5mBG/dGdCGTryv38tzFLr+n4MJWthfya84cbvk1W0HQQjwmROrIiP
qxC4F1Da6F88mfrpFDKW9LxAwfJFCgSxnYFygRsyzZK/VKdm2CI8yeoY2rt2lyRF
jUdxx7SJ7+71sO1xWcAE
=F3yO
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest update from Shuah Khan:
"This update consists of:
- fixes to several existing tests from Stafford Horne
- cpufreq tests from Viresh Kumar
- Selftest build and install fixes from Bamvor Jian Zhang and Michael
Ellerman
- Fixes to protection-keys tests from Dave Hansen
- Warning fixes from Shuah Khan"
* tag 'linux-kselftest-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (28 commits)
selftests/powerpc: Fix remaining fallout from recent changes
selftests/powerpc: Fix the clean rule since recent changes
selftests: Fix the .S and .S -> .o rules
selftests: Fix the .c linking rule
selftests: Fix selftests build to just build, not run tests
selftests, x86, protection_keys: fix wrong offset in siginfo
selftests, x86, protection_keys: fix uninitialized variable warning
selftest: cpufreq: Update MAINTAINERS file
selftest: cpufreq: Add special tests
selftest: cpufreq: Add support to test cpufreq modules
selftest: cpufreq: Add suspend/resume/hibernate support
selftest: cpufreq: Add support for cpufreq tests
selftests: Add intel_pstate to TARGETS
selftests/intel_pstate: Update makefile to match new style
selftests/intel_pstate: Fix warning on loop index overflow
cpupower: Restore format of frequency-info limit
selftests/futex: Add headers to makefile dependencies
selftests/futex: Add stdio used for logging
selftests: x86 protection_keys remove dead code
selftests: x86 protection_keys fix unused variable compile warnings
...
Pull x86 mm updates from Ingo Molnar:
"A laundry list of changes: KASAN improvements/fixes for ptdump, a
self-test fix, PAT cleanup and wbinvd() avoidance, removal of stale
code and documentation updates"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm/ptdump: Add address marker for KASAN shadow region
x86/mm/ptdump: Optimize check for W+X mappings for CONFIG_KASAN=y
x86/mm/pat: Use rb_entry()
x86/mpx: Re-add MPX to selftests Makefile
x86/mm: Remove CONFIG_DEBUG_NX_TEST
x86/mm/cpa: Avoid wbinvd() for PREEMPT
x86/mm: Improve documentation for low-level device I/O functions
Ingo pointed out that the MPX tests were no longer in the selftests
Makefile. It appears that I shot myself in the foot on this one
and accidentally removed them when I added the pkeys tests, probably
from bungling a merge conflict.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 5f23f6d082 ("x86/pkeys: Add self-tests")
Link: http://lkml.kernel.org/r/20170201225629.C3070852@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Enable O and KBUILD_OUTPUT for kselftest. Users can compile kselftest
into another directory by passing O or KBUILD_OUTPUT, with O taking
higher priority than KBUILD_OUTPUT.
Signed-off-by: Bamvor Jian Zhang <bamvor.zhangjian@linaro.org>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
SYSRET to a noncanonical address will blow up on Intel CPUs. Linux
needs to prevent this from happening in two major cases, and the
criteria will become more complicated when support for larger virtual
address spaces is added.
A fast-path SYSCALL will fall through to the following instruction
using SYSRET without any particular checking. To prevent fall-through
to a noncanonical address, Linux prevents the highest canonical page
from being mapped. This test case checks a variety of possible maximum
addresses to make sure that either we can't map code there or that
SYSCALL fall-through works.
A slow-path system call can return anywhere. Linux needs to make sure
that, if the return address is non-canonical, it won't use SYSRET.
This test case causes sigreturn() to return to a variety of addresses
(with RCX == RIP) and makes sure that nothing explodes.
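A rough sketch of that slow-path case, loosely modeled on the described
behavior (this is not the actual test):

#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <ucontext.h>

static unsigned long target = 0x800000000000UL; /* first noncanonical address with 4-level paging */
static sigjmp_buf jmpbuf;

static void sigusr1(int sig, siginfo_t *si, void *ctx_void)
{
    ucontext_t *ctx = ctx_void;

    /* sigreturn restores these, "returning" to the chosen address. */
    ctx->uc_mcontext.gregs[REG_RIP] = target;
    ctx->uc_mcontext.gregs[REG_RCX] = target;
}

static void sigsegv(int sig, siginfo_t *si, void *ctx_void)
{
    ucontext_t *ctx = ctx_void;

    printf("[OK]\tGot SIGSEGV at RIP=%#llx\n",
           (unsigned long long)ctx->uc_mcontext.gregs[REG_RIP]);
    siglongjmp(jmpbuf, 1);
}

int main(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = sigusr1;
    sigaction(SIGUSR1, &sa, NULL);
    sa.sa_sigaction = sigsegv;
    sigaction(SIGSEGV, &sa, NULL);

    printf("[RUN]\tsigreturn to %#lx\n", target);
    if (sigsetjmp(jmpbuf, 1) == 0)
        raise(SIGUSR1);
    return 0;
}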
Some of this code comes from Kirill Shutemov.
Kirill reported the following output with 5-level paging enabled:
[RUN] sigreturn to 0x800000000000
[OK] Got SIGSEGV at RIP=0x800000000000
[RUN] sigreturn to 0x1000000000000
[OK] Got SIGSEGV at RIP=0x1000000000000
[RUN] sigreturn to 0x2000000000000
[OK] Got SIGSEGV at RIP=0x2000000000000
[RUN] sigreturn to 0x4000000000000
[OK] Got SIGSEGV at RIP=0x4000000000000
[RUN] sigreturn to 0x8000000000000
[OK] Got SIGSEGV at RIP=0x8000000000000
[RUN] sigreturn to 0x10000000000000
[OK] Got SIGSEGV at RIP=0x10000000000000
[RUN] sigreturn to 0x20000000000000
[OK] Got SIGSEGV at RIP=0x20000000000000
[RUN] sigreturn to 0x40000000000000
[OK] Got SIGSEGV at RIP=0x40000000000000
[RUN] sigreturn to 0x80000000000000
[OK] Got SIGSEGV at RIP=0x80000000000000
[RUN] sigreturn to 0x100000000000000
[OK] Got SIGSEGV at RIP=0x100000000000000
[RUN] sigreturn to 0x200000000000000
[OK] Got SIGSEGV at RIP=0x200000000000000
[RUN] sigreturn to 0x400000000000000
[OK] Got SIGSEGV at RIP=0x400000000000000
[RUN] sigreturn to 0x800000000000000
[OK] Got SIGSEGV at RIP=0x800000000000000
[RUN] sigreturn to 0x1000000000000000
[OK] Got SIGSEGV at RIP=0x1000000000000000
[RUN] sigreturn to 0x2000000000000000
[OK] Got SIGSEGV at RIP=0x2000000000000000
[RUN] sigreturn to 0x4000000000000000
[OK] Got SIGSEGV at RIP=0x4000000000000000
[RUN] sigreturn to 0x8000000000000000
[OK] Got SIGSEGV at RIP=0x8000000000000000
[RUN] Trying a SYSCALL that falls through to 0x7fffffffe000
[OK] We survived
[RUN] Trying a SYSCALL that falls through to 0x7ffffffff000
[OK] We survived
[RUN] Trying a SYSCALL that falls through to 0x800000000000
[OK] We survived
[RUN] Trying a SYSCALL that falls through to 0xfffffffff000
[OK] We survived
[RUN] Trying a SYSCALL that falls through to 0x1000000000000
[OK] We survived
[RUN] Trying a SYSCALL that falls through to 0x1fffffffff000
[OK] We survived
[RUN] Trying a SYSCALL that falls through to 0x2000000000000
[OK] We survived
[RUN] Trying a SYSCALL that falls through to 0x3fffffffff000
[OK] We survived
[RUN] Trying a SYSCALL that falls through to 0x4000000000000
[OK] We survived
[RUN] Trying a SYSCALL that falls through to 0x7fffffffff000
[OK] We survived
[RUN] Trying a SYSCALL that falls through to 0x8000000000000
[OK] We survived
[RUN] Trying a SYSCALL that falls through to 0xffffffffff000
[OK] We survived
[RUN] Trying a SYSCALL that falls through to 0x10000000000000
[OK] We survived
[RUN] Trying a SYSCALL that falls through to 0x1ffffffffff000
[OK] We survived
[RUN] Trying a SYSCALL that falls through to 0x20000000000000
[OK] We survived
[RUN] Trying a SYSCALL that falls through to 0x3ffffffffff000
[OK] We survived
[RUN] Trying a SYSCALL that falls through to 0x40000000000000
[OK] We survived
[RUN] Trying a SYSCALL that falls through to 0x7ffffffffff000
[OK] We survived
[RUN] Trying a SYSCALL that falls through to 0x80000000000000
[OK] We survived
[RUN] Trying a SYSCALL that falls through to 0xfffffffffff000
[OK] We survived
[RUN] Trying a SYSCALL that falls through to 0x100000000000000
[OK] mremap to 0xfffffffffff000 failed
[RUN] Trying a SYSCALL that falls through to 0x1fffffffffff000
[OK] mremap to 0x1ffffffffffe000 failed
[RUN] Trying a SYSCALL that falls through to 0x200000000000000
[OK] mremap to 0x1fffffffffff000 failed
[RUN] Trying a SYSCALL that falls through to 0x3fffffffffff000
[OK] mremap to 0x3ffffffffffe000 failed
[RUN] Trying a SYSCALL that falls through to 0x400000000000000
[OK] mremap to 0x3fffffffffff000 failed
[RUN] Trying a SYSCALL that falls through to 0x7fffffffffff000
[OK] mremap to 0x7ffffffffffe000 failed
[RUN] Trying a SYSCALL that falls through to 0x800000000000000
[OK] mremap to 0x7fffffffffff000 failed
[RUN] Trying a SYSCALL that falls through to 0xffffffffffff000
[OK] mremap to 0xfffffffffffe000 failed
[RUN] Trying a SYSCALL that falls through to 0x1000000000000000
[OK] mremap to 0xffffffffffff000 failed
[RUN] Trying a SYSCALL that falls through to 0x1ffffffffffff000
[OK] mremap to 0x1fffffffffffe000 failed
[RUN] Trying a SYSCALL that falls through to 0x2000000000000000
[OK] mremap to 0x1ffffffffffff000 failed
[RUN] Trying a SYSCALL that falls through to 0x3ffffffffffff000
[OK] mremap to 0x3fffffffffffe000 failed
[RUN] Trying a SYSCALL that falls through to 0x4000000000000000
[OK] mremap to 0x3ffffffffffff000 failed
[RUN] Trying a SYSCALL that falls through to 0x7ffffffffffff000
[OK] mremap to 0x7fffffffffffe000 failed
[RUN] Trying a SYSCALL that falls through to 0x8000000000000000
[OK] mremap to 0x7ffffffffffff000 failed
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e70bd9a3f90657ba47b755100a20475d038fa26b.1482808435.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This code should be a good demonstration of how to use the new
system calls as well as how to use protection keys in general.
This code shows how to:
1. Manipulate the Protection Keys Rights User (PKRU) register
2. Set a protection key on memory
3. Fetch and/or modify PKRU from the signal XSAVE state
4. Read the kernel-provided protection key in the siginfo
5. Set up an execute-only mapping
There are currently 13 tests:
test_read_of_write_disabled_region
test_read_of_access_disabled_region
test_write_of_write_disabled_region
test_write_of_access_disabled_region
test_kernel_write_of_access_disabled_region
test_kernel_write_of_write_disabled_region
test_kernel_gup_of_access_disabled_region
test_kernel_gup_write_to_write_disabled_region
test_executing_on_unreadable_memory
test_ptrace_of_child
test_pkey_syscalls_on_non_allocated_pkey
test_pkey_syscalls_bad_args
test_pkey_alloc_exhaust
Each of the tests is run with plain memory (via mmap(MAP_ANON)),
transparent huge pages, and hugetlb.
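A condensed sketch of setting a protection key on memory (item 2 above),
assuming a libc that provides the pkey_alloc()/pkey_mprotect() wrappers; the
actual test invokes the raw syscalls and covers far more cases:

#define _GNU_SOURCE
#include <err.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    char *buf = mmap(NULL, pagesize, PROT_READ | PROT_WRITE,
                     MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (buf == MAP_FAILED)
        err(1, "mmap");

    /* Allocate a key whose initial PKRU rights deny writes... */
    int pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
    if (pkey < 0)
        err(1, "pkey_alloc");

    /* ...and attach it to the mapping. */
    if (pkey_mprotect(buf, pagesize, PROT_READ | PROT_WRITE, pkey))
        err(1, "pkey_mprotect");

    memset(buf, 0, pagesize);   /* faults with SEGV_PKUERR until rights are relaxed */
    return 0;
}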
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-arch@vger.kernel.org
Cc: Dave Hansen <dave@sr71.net>
Cc: mgorman@techsingularity.net
Cc: arnd@arndb.de
Cc: linux-api@vger.kernel.org
Cc: shuahkh@osg.samsung.com
Cc: linux-mm@kvack.org
Cc: luto@kernel.org
Cc: akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/20160729163024.FC5A0C2D@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Should print this on vDSO remapping success (on new kernels):
[root@localhost ~]# ./test_mremap_vdso_32
AT_SYSINFO_EHDR is 0xf773f000
[NOTE] Moving vDSO: [f773f000, f7740000] -> [a000000, a001000]
[OK]
Or print that mremap() for vDSOs is unsupported:
[root@localhost ~]# ./test_mremap_vdso_32
AT_SYSINFO_EHDR is 0xf773c000
[NOTE] Moving vDSO: [0xf773c000, 0xf773d000] -> [0xf7737000, 0xf7738000]
[FAIL] mremap() of the vDSO does not work on this kernel!
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: 0x7f454c46@gmail.com
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kselftest@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160628113539.13606-3-dsafonov@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
I've had this code for a while, but never submitted it upstream. Now
that Skylake hardware is out in the wild, folks can actually run this
for real. It tests the following:
1. The MPX hardware is enabled by the kernel and doing what it
is supposed to
2. The MPX management code is present and enabled in the kernel
3. MPX Signal handling
4. The MPX bounds table population code (on-demand population)
5. The MPX bounds table unmapping code (kernel-initiated freeing
when unused)
This has also caught bugs in the XSAVE code because MPX state is
saved/restored with XSAVE.
I'm submitting it now because it would have caught the recent issues
with the compat_siginfo code not being properly augmented when new
siginfo state is added.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160608172535.5B40B0EE@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This catches two distinct bugs in the current code. I'll fix them.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rudolf Marek <r.marek@assembler.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/7e5941148d1e2199f070dadcdf7355959f5f8e85.1460075211.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This exercises two cases that are known to be buggy on Xen PV right
now.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/61afe904c95c92abb29cd075b51e10e7feb0f774.1458162709.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This is a second attempt to make the improvements from c6f2062935
("x86/signal/64: Fix SS handling for signals delivered to 64-bit
programs"), which was reverted by 51adbfbba5c6 ("x86/signal/64: Add
support for SS in the 64-bit signal context").
This adds two new uc_flags flags. UC_SIGCONTEXT_SS will be set for
all 64-bit signals (including x32). It indicates that the saved SS
field is valid and that the kernel supports the new behavior.
The goal is to fix problems with signal handling in 64-bit tasks:
SS wasn't saved in the 64-bit signal context, making it awkward to
determine what SS was at the time of signal delivery and making it
impossible to return to a non-flat SS (as calling sigreturn clobbers
SS).
This also made it extremely difficult for 64-bit tasks to return to
fully-defined 16-bit contexts, because only the kernel can easily do
espfix64, but sigreturn was unable to set a non-flat SS:ESP.
(DOSEMU has a monstrous hack to partially work around this
limitation.)
If we could go back in time, the correct fix would be to make 64-bit
signals work just like 32-bit signals with respect to SS: save it
in signal context, reset it when delivering a signal, and restore
it in sigreturn.
Unfortunately, doing that (as I tried originally) breaks DOSEMU:
DOSEMU wouldn't reset the signal context's SS when clearing the LDT
and changing the saved CS to 64-bit mode, since it predates the SS
context field existing in the first place.
This patch is a bit more complicated, and it tries to balance a
bunch of goals. It makes most cases of changing ucontext->ss during
signal handling work as expected.
I do this by special-casing the interesting case. On sigreturn,
ucontext->ss will be honored by default, unless the ucontext was
created from scratch by an old program and had a 64-bit CS
(unfortunately, CRIU can do this) or was the result of changing a
32-bit signal context to 64-bit without resetting SS (as DOSEMU
does).
For the benefit of new 64-bit software that uses segmentation (new
versions of DOSEMU might), the new behavior can be detected with a
new ucontext flag UC_SIGCONTEXT_SS.
To avoid compilation issues, __pad0 is left as an alias for ss in
ucontext.
The nitty-gritty details are documented in the header file.
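For reference, a signal handler can detect the new flag roughly as sketched
below; the UC_SIGCONTEXT_SS value is an assumption mirroring asm/ucontext.h:

#include <signal.h>
#include <stdio.h>
#include <ucontext.h>

#ifndef UC_SIGCONTEXT_SS
#define UC_SIGCONTEXT_SS 0x2    /* assumed value; defined in asm/ucontext.h */
#endif

static void handler(int sig, siginfo_t *si, void *ctx_void)
{
    ucontext_t *ctx = ctx_void;

    if (ctx->uc_flags & UC_SIGCONTEXT_SS)
        printf("saved SS is valid; sigreturn will honor changes to it\n");
    else
        printf("old kernel: SS is not saved in the 64-bit signal context\n");
}

int main(void)
{
    struct sigaction sa = { .sa_sigaction = handler, .sa_flags = SA_SIGINFO };

    sigaction(SIGUSR1, &sa, NULL);
    raise(SIGUSR1);
    return 0;
}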
This patch also re-enables the sigreturn_64 and ldt_gdt_64 selftests,
as the kernel change allows both of them to pass.
Tested-by: Stas Sergeev <stsp@list.ru>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/749149cbfc3e75cd7fcdad69a854b399d792cc6f.1455664054.git.luto@kernel.org
[ Small readability edit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This checks that ELF binaries are started with an appropriately
blank register state.
( There's currently a nasty special case in the entry asm to
arrange for this. I'm planning on removing the special case,
and this will help make sure I don't break it. )
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/ef54f8d066b30a3eb36bbf26300eebb242185700.1454022279.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Previously the Makefile supported 32-bit-only tests and tests
that were 32-bit and 64-bit. This adds the support for tests
that are only built as 64-bit binaries.
There aren't any yet, but there might be a few some day.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shuah Khan <shuahkhan@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-api@vger.kernel.org
Link: http://lkml.kernel.org/r/99789bfe65706e6df32cc7e13f656e8c9fa92031.1454022279.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The vdso-based sigreturn mechanism is fragile and isn't used by
modern glibc so, if we break it, we'll only notice when someone
tests an unusual libc.
Add an explicit selftest.
[ I wrote this while debugging a Bionic breakage -- my first guess
was that I had somehow messed up sigreturn. I've caused problems in
that code before, and it's really easy to fail to notice it because
there's nothing on a modern distro that needs vdso-based sigreturn. ]
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/32946d714156879cd8e5d8eab044cd07557ed558.1452628504.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
ldt_gdt.c relies on cross-cpu invalidation of SS to do one of
its tests. On 32-bit builds, this works fine, but on 64-bit
builds, it only works if the kernel has proper SS sigcontext
handling for 64-bit user programs.
Since the SS fixes are currently reverted, restrict the test
case to 32 bits for now.
In principle, I could change the test to use a different segment
register, but it would be messy: CS can't point to the LDT for
64-bit code, and the other registers don't result in immediate
faults because they aren't reloaded on kernel -> user
transitions.
When we fix sigcontext (in 4.6?), we can revert this.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/231591d9122d282402d8f53175134f8db5b3bc73.1452561752.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>