Prepare fpu__drop() to use fpu->fpregs_active.
There are two distinct usecases for fpu__drop() in this context:
exit_thread() when called for 'current' in exit(), and when called
for another task in fork().
This patch does not change behavior, it only adds a couple of
debug checks and structures the code to make the ->fpregs_active
change more obviously correct.
All the complications will be removed later on.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-18-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Do this temporarily only, to make it easier to change the FPU state machine,
in particular this change couples the fpu->fpregs_active and fpu->fpstate_active
states: they are only set/cleared together (as far as the scheduler sees them).
This will be removed by later patches.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-17-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The fpregs_active() inline function is pretty pointless - in almost
all the callsites it can be replaced with a direct fpu->fpregs_active
access.
Do so and eliminate the extra layer of obfuscation.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-16-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Make it more consistent with regular memcpy() semantics, where the destination
argument comes first.
No change in functionality.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-15-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
'size_total' is derived from an unsigned input parameter - and then converted
to 'int' and checked for negative ranges:
if (size_total < 0 || offset < size_total) {
This conversion and the checks are unnecessary obfuscation, reject overly
large requested copy sizes outright and simplify the underlying code.
Reported-by: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-10-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Right now there's a confusing mixture of 'offset' and 'size' parameters:
- __copy_xstate_to_*() input parameter 'end_pos' not not really an offset,
but the full size of the copy to be performed.
- input parameter 'count' to copy_xstate_to_*() shadows that of
__copy_xstate_to_*()'s 'count' parameter name - but the roles
are different: the first one is the total number of bytes to
be copied, while the second one is a partial copy size.
To unconfuse all this, use a consistent set of parameter names:
- 'size' is the partial copy size within a single xstate component
- 'size_total' is the total copy requested
- 'offset_start' is the requested starting offset.
- 'offset' is the offset within an xstate component.
No change in functionality.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-9-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
'start_pos' is always 0, so remove it and remove the pointless check of 'pos < 0'
which can not ever be true as 'pos' is unsigned ...
No change in functionality.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-8-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Parameter ordering is weird:
int copy_xstate_to_kernel(unsigned int pos, unsigned int count, void *kbuf, struct xregs_state *xsave);
int copy_xstate_to_user(unsigned int pos, unsigned int count, void __user *ubuf, struct xregs_state *xsave);
'pos' and 'count', which are attributes of the destination buffer, are listed before the destination
buffer itself ...
List them after the primary arguments instead.
This makes the code more similar to regular memcpy() variant APIs.
No change in functionality.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-6-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The 'kbuf' parameter is unused in the _user() side of the API, remove it.
This simplifies the code and makes it easier to think about.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-5-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The 'ubuf' parameter is unused in the _kernel() side of the API, remove it.
This simplifies the code and makes it easier to think about.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-4-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
copy_xstate_to_user() is a weird API - in part due to a bad API inherited
from the regset APIs.
But don't propagate that bad API choice into the FPU code - so as a first
step split the API into kernel and user buffer handling routines.
(Also split the xstate_copyout() internal helper.)
The split API is a dumb duplication that should be obviously correct, the
real splitting will be done in the next patch.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-3-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The 'copyin/copyout' nomenclature needlessly departs from what the modern FPU code
uses, which is:
copy_fpregs_to_fpstate()
copy_fpstate_to_sigframe()
copy_fregs_to_user()
copy_fxregs_to_kernel()
copy_fxregs_to_user()
copy_kernel_to_fpregs()
copy_kernel_to_fregs()
copy_kernel_to_fxregs()
copy_kernel_to_xregs()
copy_user_to_fregs()
copy_user_to_fxregs()
copy_user_to_xregs()
copy_xregs_to_kernel()
copy_xregs_to_user()
I.e. according to this pattern, the following rename should be done:
copyin_to_xsaves() -> copy_user_to_xstate()
copyout_from_xsaves() -> copy_xstate_to_user()
or, if we want to be pedantic, denote that that the user-space format is ptrace:
copyin_to_xsaves() -> copy_user_ptrace_to_xstate()
copyout_from_xsaves() -> copy_xstate_to_user_ptrace()
But I'd suggest the shorter, non-pedantic name.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-2-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABAgAGBQJZxSV8AAoJELDendYovxMvf5YH/jUEgeFVgP0KRjNsvgp4gu88
4BW7nWYtFt4gGE1KnrKEPbg5Je0OwkpW7vUXxvLwDGWymtHMzVuuB2xxwzkePyzS
17Kzmb/JiuaNpVF4+5v3JvAw3b9iHrZ7T6cXOQgm28agd3m/y+9FSyzoMoNNRdGG
xURwUyK1idRqtkQV5VsQAK0Z1lVF7YhhaxWXBtClsqnKWoeLBLc8fpRJmUNruA33
E2Sdi06mNNN3xudu1s2edC5hAO4EgVKmonnmyHRYonIYwuqSND8fhEXj+PRdHj7s
lLVRQixd3raBiSscLASaQ7I/66frBm+TXzmoHAVtYkdlXBJlisTIvQlMPtAwu60=
=c3HX
-----END PGP SIGNATURE-----
Merge tag 'for-linus-4.14b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"A fix for a missing __init annotation and two cleanup patches"
* tag 'for-linus-4.14b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen, arm64: drop dummy lookup_address()
xen: don't compile pv-specific parts if XEN_PV isn't configured
xen: x86: mark xen_find_pt_base as __init
Using RBP as a temporary register breaks frame pointer convention and
breaks stack traces when unwinding from an interrupt in the crypto code.
Use R13 instead of RBP. Both are callee-saved registers, so the
substitution is straightforward.
Reported-by: Eric Biggers <ebiggers@google.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Eric Biggers <ebiggers@google.com>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Using RBP as a temporary register breaks frame pointer convention and
breaks stack traces when unwinding from an interrupt in the crypto code.
Mix things up a little bit to get rid of the RBP usage, without hurting
performance too much. Use RDI instead of RBP for the TBL pointer. That
will clobber CTX, so spill CTX onto the stack and use R12 to read it in
the outer loop. R12 is used as a non-persistent temporary variable
elsewhere, so it's safe to use.
Also remove the unused y4 variable.
Reported-by: Eric Biggers <ebiggers3@gmail.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Eric Biggers <ebiggers@google.com>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Using RBP as a temporary register breaks frame pointer convention and
breaks stack traces when unwinding from an interrupt in the crypto code.
Swap the usages of R12 and RBP. Use R12 for the TBL register, and use
RBP to store the pre-aligned stack pointer.
Reported-by: Eric Biggers <ebiggers@google.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Eric Biggers <ebiggers@google.com>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Using RBP as a temporary register breaks frame pointer convention and
breaks stack traces when unwinding from an interrupt in the crypto code.
There's no need to use RBP as a temporary register for the TBL value,
because it always stores the same value: the address of the K256 table.
Instead just reference the address of K256 directly.
Reported-by: Eric Biggers <ebiggers@google.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Eric Biggers <ebiggers@google.com>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Using RBP as a temporary register breaks frame pointer convention and
breaks stack traces when unwinding from an interrupt in the crypto code.
Swap the usages of R12 and RBP. Use R12 for the TBL register, and use
RBP to store the pre-aligned stack pointer.
Reported-by: Eric Biggers <ebiggers@google.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Eric Biggers <ebiggers@google.com>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Using RBP as a temporary register breaks frame pointer convention and
breaks stack traces when unwinding from an interrupt in the crypto code.
Swap the usages of R12 and RBP. Use R12 for the REG_D register, and use
RBP to store the pre-aligned stack pointer.
Reported-by: Eric Biggers <ebiggers@google.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Eric Biggers <ebiggers@google.com>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Using RBP as a temporary register breaks frame pointer convention and
breaks stack traces when unwinding from an interrupt in the crypto code.
Use R11 instead of RBP. Since R11 isn't a callee-saved register, it
doesn't need to be saved and restored on the stack.
Reported-by: Eric Biggers <ebiggers@google.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Eric Biggers <ebiggers@google.com>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Using RBP as a temporary register breaks frame pointer convention and
breaks stack traces when unwinding from an interrupt in the crypto code.
Use RSI instead of RBP for RT1. Since RSI is also used as a the 'dst'
function argument, it needs to be saved on the stack until the argument
is needed.
Reported-by: Eric Biggers <ebiggers@google.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Eric Biggers <ebiggers@google.com>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Using RBP as a temporary register breaks frame pointer convention and
breaks stack traces when unwinding from an interrupt in the crypto code.
Use R15 instead of RBP. R15 can't be used as the RID1 register because
of x86 instruction encoding limitations. So use R15 for CTX and RDI for
CTX. This means that CTX is no longer an implicit function argument.
Instead it needs to be explicitly copied from RDI.
Reported-by: Eric Biggers <ebiggers@google.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Eric Biggers <ebiggers@google.com>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Using RBP as a temporary register breaks frame pointer convention and
breaks stack traces when unwinding from an interrupt in the crypto code.
Use R15 instead of RBP. R15 can't be used as the RID1 register because
of x86 instruction encoding limitations. So use R15 for CTX and RDI for
CTX. This means that CTX is no longer an implicit function argument.
Instead it needs to be explicitly copied from RDI.
Reported-by: Eric Biggers <ebiggers@google.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Eric Biggers <ebiggers@google.com>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Using RBP as a temporary register breaks frame pointer convention and
breaks stack traces when unwinding from an interrupt in the crypto code.
Use R12 instead of RBP. Both are callee-saved registers, so the
substitution is straightforward.
Reported-by: Eric Biggers <ebiggers@google.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Eric Biggers <ebiggers@google.com>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Using RBP as a temporary register breaks frame pointer convention and
breaks stack traces when unwinding from an interrupt in the crypto code.
Use R12 instead of RBP. R12 can't be used as the RT0 register because
of x86 instruction encoding limitations. So use R12 for CTX and RDI for
CTX. This means that CTX is no longer an implicit function argument.
Instead it needs to be explicitly copied from RDI.
Reported-by: Eric Biggers <ebiggers@google.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Eric Biggers <ebiggers@google.com>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
- fix build without CONFIG_HAVE_KVM_IRQ_ROUTING
- fix NULL access in x86 CR access
- fix race with VMX posted interrups
-----BEGIN PGP SIGNATURE-----
iQEcBAABCAAGBQJZwS3PAAoJEED/6hsPKofoT+EH/0EGL2BdSAMmtLm5HUrGJHpO
412Q0bxV2KREcic1xJ+eJiuUcM2UihvflOyJQVBFEkToClw9jbB8Ms0kQUufYkLa
R1y7HmrDVVSbuEtd68fqbApuUaOKbjQEjmjKL5j3A2vxs9dgID5qMffRj5yGBC+a
V0ZpVsdLwQvqix77ibPXpoZnerbvOqkFadskGjYBpoiXEhNPbsEdc4Ca6sHAiqSs
hfUGTAnMSLBl34GfMBwvh++b8H/YlAoWM2vDnV4LnQb48hbGwqSwcVQ3CFEQbFgN
MrZoRFYpdx4FzXYYsh7dTSvPO4JyZXex7QKZSrZpg59Azfcx8pKv3am7H9W811g=
=ksrm
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
- fix build without CONFIG_HAVE_KVM_IRQ_ROUTING
- fix NULL access in x86 CR access
- fix race with VMX posted interrups
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: VMX: remove WARN_ON_ONCE in kvm_vcpu_trigger_posted_interrupt
KVM: VMX: do not change SN bit in vmx_update_pi_irte()
KVM: x86: Fix the NULL pointer parameter in check_cr_write()
Revert "KVM: Don't accept obviously wrong gsi values via KVM_IRQFD"
WARN_ON_ONCE(pi_test_sn(&vmx->pi_desc)) in kvm_vcpu_trigger_posted_interrupt()
intends to detect the violation of invariant that VT-d PI notification
event is not suppressed when vcpu is in the guest mode. Because the
two checks for the target vcpu mode and the target suppress field
cannot be performed atomically, the target vcpu mode may change in
between. If that does happen, WARN_ON_ONCE() here may raise false
alarms.
As the previous patch fixed the real invariant breaker, remove this
WARN_ON_ONCE() to avoid false alarms, and document the allowed cases
instead.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reported-by: "Ramamurthy, Venkatesh" <venkatesh.ramamurthy@intel.com>
Reported-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Fixes: 28b835d60f ("KVM: Update Posted-Interrupts Descriptor when vCPU is preempted")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
In kvm_vcpu_trigger_posted_interrupt() and pi_pre_block(), KVM
assumes that PI notification events should not be suppressed when the
target vCPU is not blocked.
vmx_update_pi_irte() sets the SN field before changing an interrupt
from posting to remapping, but it does not check the vCPU mode.
Therefore, the change of SN field may break above the assumption.
Besides, I don't see reasons to suppress notification events here, so
remove the changes of SN field to avoid race condition.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reported-by: "Ramamurthy, Venkatesh" <venkatesh.ramamurthy@intel.com>
Reported-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Fixes: 28b835d60f ("KVM: Update Posted-Interrupts Descriptor when vCPU is preempted")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Routine check_cr_write() will trigger emulator_get_cpuid()->
kvm_cpuid() to get maxphyaddr, and NULL is passed as values
for ebx/ecx/edx. This is problematic because kvm_cpuid() will
dereference these pointers.
Fixes: d1cd3ce900 ("KVM: MMU: check guest CR3 reserved bits based on its physical address width.")
Reported-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Pull x86 fix from Thomas Gleixner:
"A single fix addressing the missing CP8 feature bit in CPUID for a
range of AMD ZEN models/mask revisions"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu/AMD: Fix erratum 1076 (CPB bit)
Pull UML updates from Richard Weinberger:
- minor improvements
- fixes for Debian's new gcc defaults (pie enabled by default)
- fixes for XSTATE/XSAVE to make UML work again on modern systems
* 'for-linus-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: return negative in tuntap_open_tramp()
um: remove a stray tab
um: Use relative modversions with LD_SCRIPT_DYN
um: link vmlinux with -no-pie
um: Fix CONFIG_GCOV for modules.
Fix minor typos and grammar in UML start_up help
um: defconfig: Cleanup from old Kconfig options
um: Fix FP register size for XSTATE/XSAVE
gcc-4.6 causes a harmless link-time warning:
WARNING: vmlinux.o(.text.unlikely+0x48e): Section mismatch in reference from the function xen_find_pt_base() to the function .init.text:m2p()
The function xen_find_pt_base() references
the function __init m2p().
This is often because xen_find_pt_base lacks a __init
annotation or the annotation of m2p is wrong.
Newer compilers inline this function, so it never shows up, but marking
it __init is the right way to avoid the warning.
Fixes: 70e6119955 ("xen: move p2m list if conflicting with e820 map")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
When emulating a nested VM-entry from L1 to L2, several control field
validation checks are deferred to the hardware. Should one of these
validation checks fail, vcpu_vmx_run will set the vmx->fail flag. When
this happens, the L2 guest state is not loaded (even in part), and
execution should continue in L1 with the next instruction after the
VMLAUNCH/VMRESUME.
The VMCS12 is not modified (except for the VM-instruction error
field), the VMCS12 MSR save/load lists are not processed, and the CPU
state is not loaded from the VMCS12 host area. Moreover, the vmcs02
exit reason is stale, so it should not be consulted for any reason.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
On an early VMLAUNCH/VMRESUME failure (i.e. one which sets the
VM-instruction error field of the current VMCS), the launch state of
the current VMCS is not set to "launched," and the VM-exit information
fields of the current VMCS (including IDT-vectoring information and
exit reason) are stale.
On a late VMLAUNCH/VMRESUME failure (i.e. one which sets the high bit
of the exit reason field), the launch state of the current VMCS is not
set to "launched," and only two of the VM-exit information fields of
the current VMCS are modified (exit reason and exit
qualification). The remaining VM-exit information fields of the
current VMCS (including IDT-vectoring information, in particular) are
stale.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
After a successful VM-entry, RFLAGS is cleared, with the exception of
bit 1, which is always set. This is handled by load_vmcs12_host_state.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
During code inspection, the following potential race was seen:
CPU0 CPU1
kvm_async_pf_task_wait apf_task_wake_one
[L] swait_active(&n->wq)
[S] prepare_to_swait(&n.wq)
[L] if (!hlist_unhahed(&n.link))
schedule() [S] hlist_del_init(&n->link);
Properly serialize swait_active() checks such that a wakeup is
not missed.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The value of the guest_irq argument to vmx_update_pi_irte() is
ultimately coming from a KVM_IRQFD API call. Do not BUG() in
vmx_update_pi_irte() if the value is out-of bounds. (Especially,
since KVM as a whole seems to hang after that.)
Instead, print a message only once if we find that we don't have a
route for a certain IRQ (which can be out-of-bounds or within the
array).
This fixes CVE-2017-1000252.
Fixes: efc644048e ("KVM: x86: Update IRTE for posted-interrupts")
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If L1 does not specify the "use TPR shadow" VM-execution control in
vmcs12, then L0 must specify the "CR8-load exiting" and "CR8-store
exiting" VM-execution controls in vmcs02. Failure to do so will give
the L2 VM unrestricted read/write access to the hardware CR8.
This fixes CVE-2017-12154.
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
CPUID Fn8000_0007_EDX[CPB] is wrongly 0 on models up to B1. But they do
support CPB (AMD's Core Performance Boosting cpufreq CPU feature), so fix that.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sherry Hurwitz <sherry.hurwitz@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170907170821.16021-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull more set_fs removal from Al Viro:
"Christoph's 'use kernel_read and friends rather than open-coding
set_fs()' series"
* 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: unexport vfs_readv and vfs_writev
fs: unexport vfs_read and vfs_write
fs: unexport __vfs_read/__vfs_write
lustre: switch to kernel_write
gadget/f_mass_storage: stop messing with the address limit
mconsole: switch to kernel_read
btrfs: switch write_buf to kernel_write
net/9p: switch p9_fd_read to kernel_write
mm/nommu: switch do_mmap_private to kernel_read
serial2002: switch serial2002_tty_write to kernel_{read/write}
fs: make the buf argument to __kernel_write a void pointer
fs: fix kernel_write prototype
fs: fix kernel_read prototype
fs: move kernel_read to fs/read_write.c
fs: move kernel_write to fs/read_write.c
autofs4: switch autofs4_write to __kernel_write
ashmem: switch to ->read_iter
qemu-system-x86-8600 [004] d..1 7205.687530: kvm_entry: vcpu 2
qemu-system-x86-8600 [004] .... 7205.687532: kvm_exit: reason EXCEPTION_NMI rip 0xffffffffa921297d info ffffeb2c0e44e018 80000b0e
qemu-system-x86-8600 [004] .... 7205.687532: kvm_page_fault: address ffffeb2c0e44e018 error_code 0
qemu-system-x86-8600 [004] .... 7205.687620: kvm_try_async_get_page: gva = 0xffffeb2c0e44e018, gfn = 0x427e4e
qemu-system-x86-8600 [004] .N.. 7205.687628: kvm_async_pf_not_present: token 0x8b002 gva 0xffffeb2c0e44e018
kworker/4:2-7814 [004] .... 7205.687655: kvm_async_pf_completed: gva 0xffffeb2c0e44e018 address 0x7fcc30c4e000
qemu-system-x86-8600 [004] .... 7205.687703: kvm_async_pf_ready: token 0x8b002 gva 0xffffeb2c0e44e018
qemu-system-x86-8600 [004] d..1 7205.687711: kvm_entry: vcpu 2
After running some memory intensive workload in guest, I catch the kworker
which completes the GUP too quickly, and queues an "Page Ready" #PF exception
after the "Page not Present" exception before the next vmentry as the above
trace which will result in #DF injected to guest.
This patch fixes it by clearing the queue for "Page not Present" if "Page Ready"
occurs before the next vmentry since the GUP has already got the required page
and shadow page table has already been fixed by "Page Ready" handler.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Fixes: 7c90705bf2 ("KVM: Inject asynchronous page fault into a PV guest if page is swapped out.")
[Changed indentation and added clearing of injected. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Don't block vCPU if there is pending exception.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>