Pull arch/sh updates from Rich Felker:
"Cleanup, SECCOMP_FILTER support, message printing fixes, and other
changes to arch/sh"
* tag 'sh-for-5.9' of git://git.libc.org/linux-sh: (34 commits)
sh: landisk: Add missing initialization of sh_io_port_base
sh: bring syscall_set_return_value in line with other architectures
sh: Add SECCOMP_FILTER
sh: Rearrange blocks in entry-common.S
sh: switch to copy_thread_tls()
sh: use the generic dma coherent remap allocator
sh: don't allow non-coherent DMA for NOMMU
dma-mapping: consolidate the NO_DMA definition in kernel/dma/Kconfig
sh: unexport register_trapped_io and match_trapped_io_handler
sh: don't include <asm/io_trapped.h> in <asm/io.h>
sh: move the ioremap implementation out of line
sh: move ioremap_fixed details out of <asm/io.h>
sh: remove __KERNEL__ ifdefs from non-UAPI headers
sh: sort the selects for SUPERH alphabetically
sh: remove -Werror from Makefiles
sh: Replace HTTP links with HTTPS ones
arch/sh/configs: remove obsolete CONFIG_SOC_CAMERA*
sh: stacktrace: Remove stacktrace_ops.stack()
sh: machvec: Modernize printing of kernel messages
sh: pci: Modernize printing of kernel messages
...
Pull Xtensa updates from Max Filippov:
- add syscall audit support
- add seccomp filter support
- clean up make rules under arch/xtensa/boot
- fix state management for exclusive access opcodes
- fix build with PMU enabled
* tag 'xtensa-20200805' of git://github.com/jcmvbkbc/linux-xtensa:
xtensa: add missing exclusive access state management
xtensa: fix xtensa_pmu_setup prototype
xtensa: add boot subdirectories build artifacts to 'targets'
xtensa: add uImage and xipImage to targets
xtensa: move vmlinux.bin[.gz] to boot subdirectory
xtensa: initialize_mmu.h: fix a duplicated word
selftests/seccomp: add xtensa support
xtensa: add seccomp support
xtensa: expose syscall through user_pt_regs
xtensa: add audit support
Pull seccomp updates from Kees Cook:
"There are a bunch of clean ups and selftest improvements along with
two major updates to the SECCOMP_RET_USER_NOTIF filter return:
EPOLLHUP support to more easily detect the death of a monitored
process, and being able to inject fds when intercepting syscalls that
expect an fd-opening side-effect (needed by both container folks and
Chrome). The latter continued the refactoring of __scm_install_fd()
started by Christoph, and in the process found and fixed a handful of
bugs in various callers.
- Improved selftest coverage, timeouts, and reporting
- Add EPOLLHUP support for SECCOMP_RET_USER_NOTIF (Christian Brauner)
- Refactor __scm_install_fd() into __receive_fd() and fix buggy
callers
- Introduce 'addfd' command for SECCOMP_RET_USER_NOTIF (Sargun
Dhillon)"
* tag 'seccomp-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (30 commits)
selftests/seccomp: Test SECCOMP_IOCTL_NOTIF_ADDFD
seccomp: Introduce addfd ioctl to seccomp user notifier
fs: Expand __receive_fd() to accept existing fd
pidfd: Replace open-coded receive_fd()
fs: Add receive_fd() wrapper for __receive_fd()
fs: Move __scm_install_fd() to __receive_fd()
net/scm: Regularize compat handling of scm_detach_fds()
pidfd: Add missing sock updates for pidfd_getfd()
net/compat: Add missing sock updates for SCM_RIGHTS
selftests/seccomp: Check ENOSYS under tracing
selftests/seccomp: Refactor to use fixture variants
selftests/harness: Clean up kern-doc for fixtures
seccomp: Use -1 marker for end of mode 1 syscall list
seccomp: Fix ioctl number for SECCOMP_IOCTL_NOTIF_ID_VALID
selftests/seccomp: Rename user_trap_syscall() to user_notif_syscall()
selftests/seccomp: Make kcmp() less required
seccomp: Use pr_fmt
selftests/seccomp: Improve calibration loop
selftests/seccomp: use 90s as timeout
selftests/seccomp: Expand benchmark to per-filter measurements
...
secure_computing() is called first in syscall_trace_enter() so that
a system call will be aborted quickly without doing succeeding syscall
tracing if seccomp rules want to deny that system call.
TODO:
- Update https://github.com/seccomp/libseccomp csky support
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Xtensa syscall number can be obtained and changed through the
struct user_pt_regs. Syscall return value register is fixed relatively
to the current register window in the user_pt_regs, so it needs a bit of
special treatment.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Now that the selftest harness has variants, use them to eliminate a
bunch of copy/paste duplication.
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Will Deacon <will@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
When SECCOMP_IOCTL_NOTIF_ID_VALID was first introduced it had the wrong
direction flag set. While this isn't a big deal as nothing currently
enforces these bits in the kernel, it should be defined correctly. Fix
the define and provide support for the old command until it is no longer
needed for backward compatibility.
Fixes: 6a21cc50f0 ("seccomp: add a return code to trap to userspace")
Signed-off-by: Kees Cook <keescook@chromium.org>
It's useful to see how much (at a minimum) each filter adds to the
syscall overhead. Add additional calculations.
Signed-off-by: Kees Cook <keescook@chromium.org>
The TSYNC ESRCH flag test will fail for regular users because NNP was
not set yet. Add NNP setting.
Fixes: 51891498f2 ("seccomp: allow TSYNC and USER_NOTIF together")
Cc: stable@vger.kernel.org
Reviewed-by: Tycho Andersen <tycho@tycho.ws>
Signed-off-by: Kees Cook <keescook@chromium.org>
Running the seccomp tests as a regular user shouldn't just fail tests
that require CAP_SYS_ADMIN (for getting a PID namespace). Instead,
detect those cases and SKIP them. Additionally, gracefully SKIP missing
CONFIG_USER_NS (and add to "config" since we'd prefer to actually test
this case).
Signed-off-by: Kees Cook <keescook@chromium.org>
The kselftests will be renaming XFAIL to SKIP in the test harness, and
to avoid painful conflicts, rename XFAIL to SKIP now in a future-proofed
way.
Signed-off-by: Kees Cook <keescook@chromium.org>
s390 cannot set syscall number and reture code at the same time,
so set the appropriate flag to indicate it.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
glibc 2.31 calls clock_nanosleep when its nanosleep function is used. So
the restart_syscall fails after that. In order to deal with it, we trace
clock_nanosleep and nanosleep. Then we check for either.
This works just fine on systems with both glibc 2.30 and glibc 2.31,
whereas it failed before on a system with glibc 2.31.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Pull kselftest update from Shuah Khan:
"This kselftest update consists of:
- resctrl_tests for resctrl file system. resctrl isn't included in
the default TARGETS list in kselftest Makefile. It can be run
manually.
- Kselftest harness improvements.
- Kselftest framework and individual test fixes to support runs on
Kernel CI rings and other environments that use relocatable build
and install features.
- Minor cleanups and typo fixes"
* tag 'linux-kselftest-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (25 commits)
selftests: enforce local header dependency in lib.mk
selftests: Fix memfd to support relocatable build (O=objdir)
selftests: Fix seccomp to support relocatable build (O=objdir)
selftests/harness: Handle timeouts cleanly
selftests/harness: Move test child waiting logic
selftests: android: Fix custom install from skipping test progs
selftests: android: ion: Fix ionmap_test compile error
selftests: Fix kselftest O=objdir build from cluttering top level objdir
selftests/seccomp: Adjust test fixture counts
selftests/ftrace: Fix typo in trigger-multihist.tc
selftests/timens: Remove duplicated include <time.h>
selftests/resctrl: fix spelling mistake "Errror" -> "Error"
selftests/resctrl: Add the test in MAINTAINERS
selftests/resctrl: Disable MBA and MBM tests for AMD
selftests/resctrl: Use cache index3 id for AMD schemata masks
selftests/resctrl: Add vendor detection mechanism
selftests/resctrl: Add Cache Allocation Technology (CAT) selftest
selftests/resctrl: Add Cache QoS Monitoring (CQM) selftest
selftests/resctrl: Add MBA test
selftests/resctrl: Add MBM test
...
The seccomp selftest reported the wrong test counts since it was using
slightly the wrong API for defining text fixtures. Adjust the API usage.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The restriction introduced in 7a0df7fbc1 ("seccomp: Make NEW_LISTENER and
TSYNC flags exclusive") is mostly artificial: there is enough information
in a seccomp user notification to tell which thread triggered a
notification. The reason it was introduced is because TSYNC makes the
syscall return a thread-id on failure, and NEW_LISTENER returns an fd, and
there's no way to distinguish between these two cases (well, I suppose the
caller could check all fds it has, then do the syscall, and if the return
value was an fd that already existed, then it must be a thread id, but
bleh).
Matthew would like to use these two flags together in the Chrome sandbox
which wants to use TSYNC for video drivers and NEW_LISTENER to proxy
syscalls.
So, let's fix this ugliness by adding another flag, TSYNC_ESRCH, which
tells the kernel to just return -ESRCH on a TSYNC error. This way,
NEW_LISTENER (and any subsequent seccomp() commands that want to return
positive values) don't conflict with each other.
Suggested-by: Matthew Denton <mpdenton@google.com>
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Link: https://lore.kernel.org/r/20200304180517.23867-1-tycho@tycho.ws
Signed-off-by: Kees Cook <keescook@chromium.org>
Pull seccomp updates from Kees Cook:
"Mostly this is implementing the new flag SECCOMP_USER_NOTIF_FLAG_CONTINUE,
but there are cleanups as well.
- implement SECCOMP_USER_NOTIF_FLAG_CONTINUE (Christian Brauner)
- fixes to selftests (Christian Brauner)
- remove secure_computing() argument (Christian Brauner)"
* tag 'seccomp-v5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
seccomp: rework define for SECCOMP_USER_NOTIF_FLAG_CONTINUE
seccomp: fix SECCOMP_USER_NOTIF_FLAG_CONTINUE test
seccomp: simplify secure_computing()
seccomp: test SECCOMP_USER_NOTIF_FLAG_CONTINUE
seccomp: add SECCOMP_USER_NOTIF_FLAG_CONTINUE
seccomp: avoid overflow in implicit constant conversion
The ifndef for SECCOMP_USER_NOTIF_FLAG_CONTINUE was placed under the
ifndef for the SECCOMP_FILTER_FLAG_NEW_LISTENER feature. This will not
work on systems that do support SECCOMP_FILTER_FLAG_NEW_LISTENER but do not
support SECCOMP_USER_NOTIF_FLAG_CONTINUE. So move the latter ifndef out of
the former ifndef's scope.
2019-10-20 11:14:01 make run_tests -C seccomp
make: Entering directory '/usr/src/perf_selftests-x86_64-rhel-7.6-0eebfed2954f152259cae0ad57b91d3ea92968e8/tools/testing/selftests/seccomp'
gcc -Wl,-no-as-needed -Wall seccomp_bpf.c -lpthread -o seccomp_bpf
seccomp_bpf.c: In function ‘user_notification_continue’:
seccomp_bpf.c:3562:15: error: ‘SECCOMP_USER_NOTIF_FLAG_CONTINUE’ undeclared (first use in this function)
resp.flags = SECCOMP_USER_NOTIF_FLAG_CONTINUE;
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
seccomp_bpf.c:3562:15: note: each undeclared identifier is reported only once for each function it appears in
Makefile:12: recipe for target 'seccomp_bpf' failed
make: *** [seccomp_bpf] Error 1
make: Leaving directory '/usr/src/perf_selftests-x86_64-rhel-7.6-0eebfed2954f152259cae0ad57b91d3ea92968e8/tools/testing/selftests/seccomp'
Reported-by: kernel test robot <rong.a.chen@intel.com>
Fixes: 0eebfed295 ("seccomp: test SECCOMP_USER_NOTIF_FLAG_CONTINUE")
Cc: linux-kselftest@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: Tycho Andersen <tycho@tycho.ws>
Link: https://lore.kernel.org/r/20191021091055.4644-1-christian.brauner@ubuntu.com
Signed-off-by: Kees Cook <keescook@chromium.org>
The seccomp selftest goes to some length to build against older kernel
headers, viz. all the #ifdefs at the beginning of the file.
Commit 201766a20e ("ptrace: add PTRACE_GET_SYSCALL_INFO request")
introduces some additional macros, but doesn't do the #ifdef dance.
Let's add that dance here to avoid:
gcc -Wl,-no-as-needed -Wall seccomp_bpf.c -lpthread -o seccomp_bpf
In file included from seccomp_bpf.c:51:
seccomp_bpf.c: In function ‘tracer_ptrace’:
seccomp_bpf.c:1787:20: error: ‘PTRACE_EVENTMSG_SYSCALL_ENTRY’ undeclared (first use in this function); did you mean ‘PTRACE_EVENT_CLONE’?
EXPECT_EQ(entry ? PTRACE_EVENTMSG_SYSCALL_ENTRY
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../kselftest_harness.h:608:13: note: in definition of macro ‘__EXPECT’
__typeof__(_expected) __exp = (_expected); \
^~~~~~~~~
seccomp_bpf.c:1787:2: note: in expansion of macro ‘EXPECT_EQ’
EXPECT_EQ(entry ? PTRACE_EVENTMSG_SYSCALL_ENTRY
^~~~~~~~~
seccomp_bpf.c:1787:20: note: each undeclared identifier is reported only once for each function it appears in
EXPECT_EQ(entry ? PTRACE_EVENTMSG_SYSCALL_ENTRY
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../kselftest_harness.h:608:13: note: in definition of macro ‘__EXPECT’
__typeof__(_expected) __exp = (_expected); \
^~~~~~~~~
seccomp_bpf.c:1787:2: note: in expansion of macro ‘EXPECT_EQ’
EXPECT_EQ(entry ? PTRACE_EVENTMSG_SYSCALL_ENTRY
^~~~~~~~~
seccomp_bpf.c:1788:6: error: ‘PTRACE_EVENTMSG_SYSCALL_EXIT’ undeclared (first use in this function); did you mean ‘PTRACE_EVENT_EXIT’?
: PTRACE_EVENTMSG_SYSCALL_EXIT, msg);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~
../kselftest_harness.h:608:13: note: in definition of macro ‘__EXPECT’
__typeof__(_expected) __exp = (_expected); \
^~~~~~~~~
seccomp_bpf.c:1787:2: note: in expansion of macro ‘EXPECT_EQ’
EXPECT_EQ(entry ? PTRACE_EVENTMSG_SYSCALL_ENTRY
^~~~~~~~~
make: *** [Makefile:12: seccomp_bpf] Error 1
[skhan@linuxfoundation.org: Fix checkpatch error in commit log]
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Fixes: 201766a20e ("ptrace: add PTRACE_GET_SYSCALL_INFO request")
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
PTRACE_GET_SYSCALL_INFO is a generic ptrace API that lets ptracer obtain
details of the syscall the tracee is blocked in.
There are two reasons for a special syscall-related ptrace request.
Firstly, with the current ptrace API there are cases when ptracer cannot
retrieve necessary information about syscalls. Some examples include:
* The notorious int-0x80-from-64-bit-task issue. See [1] for details.
In short, if a 64-bit task performs a syscall through int 0x80, its
tracer has no reliable means to find out that the syscall was, in
fact, a compat syscall, and misidentifies it.
* Syscall-enter-stop and syscall-exit-stop look the same for the
tracer. Common practice is to keep track of the sequence of
ptrace-stops in order not to mix the two syscall-stops up. But it is
not as simple as it looks; for example, strace had a (just recently
fixed) long-standing bug where attaching strace to a tracee that is
performing the execve system call led to the tracer identifying the
following syscall-exit-stop as syscall-enter-stop, which messed up
all the state tracking.
* Since the introduction of commit 84d77d3f06 ("ptrace: Don't allow
accessing an undumpable mm"), both PTRACE_PEEKDATA and
process_vm_readv become unavailable when the process dumpable flag is
cleared. On such architectures as ia64 this results in all syscall
arguments being unavailable for the tracer.
Secondly, ptracers also have to support a lot of arch-specific code for
obtaining information about the tracee. For some architectures, this
requires a ptrace(PTRACE_PEEKUSER, ...) invocation for every syscall
argument and return value.
ptrace(2) man page:
long ptrace(enum __ptrace_request request, pid_t pid,
void *addr, void *data);
...
PTRACE_GET_SYSCALL_INFO
Retrieve information about the syscall that caused the stop.
The information is placed into the buffer pointed by "data"
argument, which should be a pointer to a buffer of type
"struct ptrace_syscall_info".
The "addr" argument contains the size of the buffer pointed to
by "data" argument (i.e., sizeof(struct ptrace_syscall_info)).
The return value contains the number of bytes available
to be written by the kernel.
If the size of data to be written by the kernel exceeds the size
specified by "addr" argument, the output is truncated.
[ldv@altlinux.org: selftests/seccomp/seccomp_bpf: update for PTRACE_GET_SYSCALL_INFO]
Link: http://lkml.kernel.org/r/20190708182904.GA12332@altlinux.org
Link: http://lkml.kernel.org/r/20190510152842.GF28558@altlinux.org
Signed-off-by: Elvira Khabirova <lineprinter@altlinux.org>
Co-developed-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: Eugene Syromyatnikov <esyr@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Greentime Hu <greentime@andestech.com>
Cc: Helge Deller <deller@gmx.de> [parisc]
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: kbuild test robot <lkp@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vincent Chen <deanbo422@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull Kselftest updates from Shuah Khan:
- fixes to seccomp test, and kselftest framework
- cleanups to remove duplicate header defines
- fixes to efivarfs "make clean" target
- cgroup cleanup path
- Moving the IMA kexec_load selftest to selftests/kexec work from Mimi
Johar and Petr Vorel
- A framework to kselftest for writing kernel test modules addition
from Tobin C. Harding
* tag 'linux-kselftest-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (29 commits)
selftests: build and run gpio when output directory is the src dir
selftests/ipc: Fix msgque compiler warnings
selftests/efivarfs: clean up test files from test_create*()
selftests: fix headers_install circular dependency
selftests/kexec: update get_secureboot_mode
selftests/kexec: make kexec_load test independent of IMA being enabled
selftests/kexec: check kexec_load and kexec_file_load are enabled
selftests/kexec: Add missing '=y' to config options
selftests/kexec: kexec_file_load syscall test
selftests/kexec: define "require_root_privileges"
selftests/kexec: define common logging functions
selftests/kexec: define a set of common functions
selftests/kexec: cleanup the kexec selftest
selftests/kexec: move the IMA kexec_load selftest to selftests/kexec
selftests/harness: Add 30 second timeout per test
selftests/seccomp: Handle namespace failures gracefully
selftests: cgroup: fix cleanup path in test_memcg_subtree_control()
selftests: efivarfs: remove the test_create_read file if it was exist
rseq/selftests: Adapt number of threads to the number of detected cpus
lib: Add test module for strscpy_pad
...
When running without USERNS or PIDNS the seccomp test would hang since
it was waiting forever for the child to trigger the user notification
since it seems the glibc() abort handler makes a call to getpid(),
which would trap again. This changes the getpid filter to getppid, and
makes sure ASSERTs execute to stop from spawning the listener.
Reported-by: Shuah Khan <shuah@kernel.org>
Fixes: 6a21cc50f0 ("seccomp: add a return code to trap to userspace")
Cc: stable@vger.kernel.org # > 5.0
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Tycho Andersen <tycho@tycho.ws>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Clang noticed that some none-zero sleep()s were actually using zero
anyway. This switches to nanosleep() to gain sub-second granularity.
seccomp_bpf.c:2625:9: warning: implicit conversion from 'double' to
'unsigned int' changes value from 0.1 to 0 [-Wliteral-conversion]
sleep(0.1);
~~~~~ ^~~
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Shuah Khan <shuah@kernel.org>
The pid ns cannot be unshare()d as an unprivileged user without owning the
userns as well. Let's unshare the userns so that we can subsequently
unshare the pidns.
This also means that we don't need to set the no new privs bit as in the
other test cases, since we're unsharing the userns.
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <shuah@kernel.org>
seccomp() doesn't allow users who aren't root in their userns to attach
filters unless they have the nnp bit set, so let's set it so that these
tests can pass when run as an unprivileged user.
This idea stolen from the other seccomp tests, which use this trick :)
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <shuah@kernel.org>
The get_metadata() test requires real root, so let's skip it if we're not
real root.
Note that I used XFAIL here because that's what the test does later if
CONFIG_CHEKCKPOINT_RESTORE happens to not be enabled. After looking at the
code, there doesn't seem to be a nice way to skip tests defined as TEST(),
since there's no return code (I tried exit(KSFT_SKIP), but that didn't work
either...). So let's do it this way to be consistent, and easier to fix
when someone comes along and fixes it.
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <shuah@kernel.org>
There used to be an explanation here because it could trigger lockdep
previously, but now we're not doing recursive locking, so it really is just
for grins.
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <shuah@kernel.org>
This this test forks a child, and then the parent waits for a write() to a
pipe signalling the child is ready to be attached to. If something in the
child ASSERTs before it does this write, the test will hang waiting for it.
Instead, let's EXPECT, so that execution continues until we do the write.
Any failure after that is fine and can ASSERT.
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <shuah@kernel.org>
Passing EPERM during syscall skipping was confusing since the test wasn't
actually exercising the errno evaluation -- it was just passing a literal
"1" (EPERM). Instead, expand the tests to check both direct value returns
(positive, 45000 in this case), and errno values (negative, -ESRCH in this
case) to check both fake success and fake failure during syscall skipping.
Reported-by: Colin Ian King <colin.king@canonical.com>
Fixes: a33b2d0359 ("selftests/seccomp: Add tests for basic ptrace actions")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <shuah@kernel.org>
In the face of missing user notification support, the self test needs
to stop executing a test (ASSERT_*) instead of just reporting and
continuing (EXPECT_*). This adjusts the user notification tests to do
that where needed.
Reported-by: Shuah Khan <shuah@kernel.org>
Fixes: 6a21cc50f0 ("seccomp: add a return code to trap to userspace")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Tycho Andersen <tycho@tycho.ws>
Tested-by: Shuah Khan <shuah@kernel.org>
Signed-off-by: Shuah Khan <shuah@kernel.org>
Pull seccomp updates from James Morris:
- Add SECCOMP_RET_USER_NOTIF
- seccomp fixes for sparse warnings and s390 build (Tycho)
* 'next-seccomp' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
seccomp, s390: fix build for syscall type change
seccomp: fix poor type promotion
samples: add an example of seccomp user trap
seccomp: add a return code to trap to userspace
seccomp: switch system call argument type to void *
seccomp: hoist struct seccomp_data recalculation higher
This patch introduces a means for syscalls matched in seccomp to notify
some other task that a particular filter has been triggered.
The motivation for this is primarily for use with containers. For example,
if a container does an init_module(), we obviously don't want to load this
untrusted code, which may be compiled for the wrong version of the kernel
anyway. Instead, we could parse the module image, figure out which module
the container is trying to load and load it on the host.
As another example, containers cannot mount() in general since various
filesystems assume a trusted image. However, if an orchestrator knows that
e.g. a particular block device has not been exposed to a container for
writing, it want to allow the container to mount that block device (that
is, handle the mount for it).
This patch adds functionality that is already possible via at least two
other means that I know about, both of which involve ptrace(): first, one
could ptrace attach, and then iterate through syscalls via PTRACE_SYSCALL.
Unfortunately this is slow, so a faster version would be to install a
filter that does SECCOMP_RET_TRACE, which triggers a PTRACE_EVENT_SECCOMP.
Since ptrace allows only one tracer, if the container runtime is that
tracer, users inside the container (or outside) trying to debug it will not
be able to use ptrace, which is annoying. It also means that older
distributions based on Upstart cannot boot inside containers using ptrace,
since upstart itself uses ptrace to monitor services while starting.
The actual implementation of this is fairly small, although getting the
synchronization right was/is slightly complex.
Finally, it's worth noting that the classic seccomp TOCTOU of reading
memory data from the task still applies here, but can be avoided with
careful design of the userspace handler: if the userspace handler reads all
of the task memory that is necessary before applying its security policy,
the tracee's subsequent memory edits will not be read by the tracer.
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
CC: Kees Cook <keescook@chromium.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Eric W. Biederman <ebiederm@xmission.com>
CC: "Serge E. Hallyn" <serge@hallyn.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
CC: Christian Brauner <christian@brauner.io>
CC: Tyler Hicks <tyhicks@canonical.com>
CC: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
Signed-off-by: Kees Cook <keescook@chromium.org>
If a seccomp user is not interested in Speculative Store Bypass mitigation
by default, it can set the new SECCOMP_FILTER_FLAG_SPEC_ALLOW flag when
adding filters.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Since seccomp_get_metadata() depends on CHECKPOINT_RESTORE, XFAIL the
test if the ptrace reports it as missing.
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Tycho Andersen <tycho@tycho.ws>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
- Fix seccomp GET_METADATA to deal with field sizes correctly (Tycho Andersen)
- Add selftest to make sure GET_METADATA doesn't regress (Tycho Andersen)