Commit Graph

19105 Commits

Author SHA1 Message Date
Oleg Nesterov
2b82cadffc uprobes/x86: Introduce push_ret_address()
Extract the "push return address" code from branch_emulate_op() into
the new simple helper, push_ret_address(). It will have more users.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2014-04-30 19:10:38 +02:00
Oleg Nesterov
78d9af4cd3 uprobes/x86: Cleanup the usage of arch_uprobe->def.fixups, make it u8
handle_riprel_insn() assumes that nobody else could modify ->fixups
before. This is correct but fragile, change it to use "|=".

Also make ->fixups u8, we are going to add the new members into the
union. It is not clear why UPROBE_FIX_RIP_.X lived in the upper byte,
redefine them so that they can fit into u8.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2014-04-30 19:10:38 +02:00
Oleg Nesterov
97aa5cddbe uprobes/x86: Move default_xol_ops's data into arch_uprobe->def
Finally we can move arch_uprobe->fixups/rip_rela_target_address
into the new "def" struct and place this struct in the union, they
are only used by default_xol_ops paths.

The patch also renames rip_rela_target_address to riprel_target just
to make this name shorter.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-30 19:10:37 +02:00
Oleg Nesterov
220ef8dc9a uprobes/x86: Move UPROBE_FIX_SETF logic from arch_uprobe_post_xol() to default_post_xol_op()
UPROBE_FIX_SETF is only needed to handle "popf" correctly but it is
processed by the generic arch_uprobe_post_xol() code. This doesn't
allows us to make ->fixups private for default_xol_ops.

1 Change default_post_xol_op(UPROBE_FIX_SETF) to set ->saved_tf = T.

   "popf" always reads the flags from stack, it doesn't matter if TF
   was set or not before single-step. Ignoring the naming, this is
   even more logical, "saved_tf" means "owned by application" and we
   do not own this flag after "popf".

2. Change arch_uprobe_post_xol() to save ->saved_tf into the local
   "bool send_sigtrap" before ->post_xol().

3. Change arch_uprobe_post_xol() to ignore UPROBE_FIX_SETF and just
   check ->saved_tf after ->post_xol().

With this patch ->fixups and ->rip_rela_target_address are only used
by default_xol_ops hooks, we are ready to remove them from the common
part of arch_uprobe.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-30 19:10:37 +02:00
Oleg Nesterov
6ded5f3848 uprobes/x86: Don't use arch_uprobe_abort_xol() in arch_uprobe_post_xol()
014940bad8 "uprobes/x86: Send SIGILL if arch_uprobe_post_xol() fails"
changed arch_uprobe_post_xol() to use arch_uprobe_abort_xol() if ->post_xol
fails. This was correct and helped to avoid the additional complications,
we need to clear X86_EFLAGS_TF in this case.

However, now that we have uprobe_xol_ops->abort() hook it would be better
to avoid arch_uprobe_abort_xol() here. ->post_xol() should likely do what
->abort() does anyway, we should not do the same work twice. Currently only
handle_riprel_post_xol() can be called twice, this is unnecessary but safe.
Still this is not clean and can lead to the problems in future.

Change arch_uprobe_post_xol() to clear X86_EFLAGS_TF and restore ->ip by
hand and avoid arch_uprobe_abort_xol(). This temporary uglifies the usage
of autask.saved_tf, we will cleanup this later.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-30 19:10:37 +02:00
Oleg Nesterov
588fbd613c uprobes/x86: Introduce uprobe_xol_ops->abort() and default_abort_op()
arch_uprobe_abort_xol() calls handle_riprel_post_xol() even if
auprobe->ops != default_xol_ops. This is fine correctness wise, only
default_pre_xol_op() can set UPROBE_FIX_RIP_AX|UPROBE_FIX_RIP_CX and
otherwise handle_riprel_post_xol() is nop.

But this doesn't look clean and this doesn't allow us to move ->fixups
into the union in arch_uprobe. Move this handle_riprel_post_xol() call
into the new default_abort_op() hook and change arch_uprobe_abort_xol()
accordingly.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-30 19:10:36 +02:00
Oleg Nesterov
dd91016dfc uprobes/x86: Don't change the task's state if ->pre_xol() fails
Currently this doesn't matter, the only ->pre_xol() hook can't fail,
but we need to fix arch_uprobe_pre_xol() anyway. If ->pre_xol() fails
we should not change regs->ip/flags, we should just return the error
to make restart actually possible.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-30 19:10:36 +02:00
Oleg Nesterov
b24dc8dace uprobes/x86: Fix is_64bit_mm() with CONFIG_X86_X32
is_64bit_mm() assumes that mm->context.ia32_compat means the 32-bit
instruction set, this is not true if the task is TIF_X32.

Change set_personality_ia32() to initialize mm->context.ia32_compat
by TIF_X32 or TIF_IA32 instead of 1. This allows to fix is_64bit_mm()
without affecting other users, they all treat ia32_compat as "bool".

TIF_ in ->ia32_compat looks a bit strange, but this is grep-friendly
and avoids the new define's.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-30 19:10:35 +02:00
Oleg Nesterov
8dbacad93a uprobes/x86: Make good_insns_* depend on CONFIG_X86_*
Add the suitable ifdef's around good_insns_* arrays. We do not want
to add the ugly ifdef's into their only user, uprobe_init_insn(), so
the "#else" branch simply defines them as NULL. This doesn't generate
the extra code, gcc is smart enough, although the code is fine even if
it could not detect that (without CONFIG_IA32_EMULATION) is_64bit_mm()
is __builtin_constant_p().

The patch looks more complicated because it also moves good_insns_64
up close to good_insns_32.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-30 19:10:35 +02:00
Oleg Nesterov
ff261964cf uprobes/x86: Shift "insn_complete" from branch_setup_xol_ops() to uprobe_init_insn()
Change uprobe_init_insn() to make insn_complete() == T, this makes
other insn_get_*() calls unnecessary.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-30 19:10:34 +02:00
Oleg Nesterov
2ae1f49ae1 uprobes/x86: Add is_64bit_mm(), kill validate_insn_bits()
1. Extract the ->ia32_compat check from 64bit validate_insn_bits()
   into the new helper, is_64bit_mm(), it will have more users.

   TODO: this checks is actually wrong if mm owner is X32 task,
   we need another fix which changes set_personality_ia32().

   TODO: even worse, the whole 64-or-32-bit logic is very broken
   and the fix is not simple, we need the nontrivial changes in
   the core uprobes code.

2. Kill validate_insn_bits() and change its single caller to use
   uprobe_init_insn(is_64bit_mm(mm).

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-30 19:10:33 +02:00
Oleg Nesterov
73175d0d19 uprobes/x86: Add uprobe_init_insn(), kill validate_insn_{32,64}bits()
validate_insn_32bits() and validate_insn_64bits() are very similar,
turn them into the single uprobe_init_insn() which has the additional
"bool x86_64" argument which can be passed to insn_init() and used to
choose between good_insns_64/good_insns_32.

Also kill UPROBE_FIX_NONE, it has no users.

Note: the current code doesn't use ifdef's consistently, good_insns_64
depends on CONFIG_X86_64 but good_insns_32 is unconditional. This patch
removes ifdef around good_insns_64, we will add it back later along with
the similar one for good_insns_32.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-30 19:10:33 +02:00
Denys Vlasenko
250bbd12c2 uprobes/x86: Refuse to attach uprobe to "word-sized" branch insns
All branch insns on x86 can be prefixed with the operand-size
override prefix, 0x66. It was only ever useful for performing
jumps to 32-bit offsets in 16-bit code segments.

In 32-bit code, such instructions are useless since
they cause IP truncation to 16 bits, and in case of call insns,
they save only 16 bits of return address and misalign
the stack pointer as a "bonus".

In 64-bit code, such instructions are treated differently by Intel
and AMD CPUs: Intel ignores the prefix altogether,
AMD treats them the same as in 32-bit mode.

Before this patch, the emulation code would execute
the instructions as if they have no 0x66 prefix.

With this patch, we refuse to attach uprobes to such insns.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2014-04-30 19:10:33 +02:00
Ingo Molnar
42ebd27bcb Merge branch 'perf/urgent' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-25 10:04:22 +02:00
Stephane Eranian
9f7ff8931e perf/x86: Fix RAPL rdmsrl_safe() usage
This patch fixes a bug introduced by:

  2422365780 ("perf/x86/intel: Use rdmsrl_safe() when initializing RAPL PMU")

The rdmsrl_safe() function returns 0 on success.
The current code was failing to detect the RAPL PMU
on real hardware  (missing /sys/devices/power) because
the return value of rdmsrl_safe() was misinterpreted.

Signed-off-by: Stephane Eranian <eranian@google.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Venkatesh Srinivas <venkateshs@google.com>
Cc: peterz@infradead.org
Cc: zheng.z.yan@intel.com
Link: http://lkml.kernel.org/r/20140423170418.GA12767@quad
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-24 08:12:41 +02:00
Linus Torvalds
39bfe90706 Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 vdso fix from Peter Anvin:
 "This is a single build fix for building with gold as opposed to GNU
  ld.  It got queued up separately and was expected to be pushed during
  the merge window, but it got left behind"

* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, vdso: Make the vdso linker script compatible with Gold
2014-04-22 09:09:06 -07:00
Linus Torvalds
6d4596905b Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Ingo Molnar:
 "This fixes the preemption-count imbalance crash reported by Owen
  Kibel"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Fix CMCI preemption bugs
2014-04-19 10:41:43 -07:00
Linus Torvalds
8de3f7a705 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Two kernel side fixes:

   - an Intel uncore PMU driver potential crash fix
   - a kprobes/perf-call-graph interaction fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Use rdmsrl_safe() when initializing RAPL PMU
  kprobes/x86: Fix page-fault handling logic
2014-04-19 10:40:11 -07:00
Yan, Zheng
4a3dc121d3 perf/x86: Export perf_assign_events()
export perf_assign_events to allow building perf Intel uncore driver
as module

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1395133004-23205-3-git-send-email-zheng.z.yan@intel.com
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: eranian@google.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-18 12:54:46 +02:00
Ingo Molnar
1111b680d3 Merge branch 'perf/urgent' into perf/core, to pick up PMU driver fixes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-18 12:14:55 +02:00
Venkatesh Srinivas
2422365780 perf/x86/intel: Use rdmsrl_safe() when initializing RAPL PMU
CPUs which should support the RAPL counters according to
Family/Model/Stepping may still issue #GP when attempting to access
the RAPL MSRs. This may happen when Linux is running under KVM and
we are passing-through host F/M/S data, for example. Use rdmsrl_safe
to first access the RAPL_POWER_UNIT MSR; if this fails, do not
attempt to use this PMU.

Signed-off-by: Venkatesh Srinivas <venkateshs@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1394739386-22260-1-git-send-email-venkateshs@google.com
Cc: zheng.z.yan@intel.com
Cc: eranian@google.com
Cc: ak@linux.intel.com
Cc: linux-kernel@vger.kernel.org
[ The patch also silently fixes another bug: rapl_pmu_init() didn't handle the memory alloc failure case previously. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-18 12:14:26 +02:00
Oleg Nesterov
6cc5e7ff2c uprobes/x86: Emulate relative conditional "near" jmp's
Change branch_setup_xol_ops() to simply use opc1 = OPCODE2(insn) - 0x10
if OPCODE1() == 0x0f; this matches the "short" jmp which checks the same
condition.

Thanks to lib/insn.c, it does the rest correctly. branch->ilen/offs are
correct no matter if this jmp is "near" or "short".

Reported-by: Jonathan Lebon <jlebon@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17 21:58:25 +02:00
Oleg Nesterov
8f95505bc1 uprobes/x86: Emulate relative conditional "short" jmp's
Teach branch_emulate_op() to emulate the conditional "short" jmp's which
check regs->flags.

Note: this doesn't support jcxz/jcexz, loope/loopz, and loopne/loopnz.
They all are rel8 and thus they can't trigger the problem, but perhaps
we will add the support in future just for completeness.

Reported-by: Jonathan Lebon <jlebon@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17 21:58:23 +02:00
Oleg Nesterov
8e89c0be17 uprobes/x86: Emulate relative call's
See the previous "Emulate unconditional relative jmp's" which explains
why we can not execute "jmp" out-of-line, the same applies to "call".

Emulating of rip-relative call is trivial, we only need to additionally
push the ret-address. If this fails, we execute this instruction out of
line and this should trigger the trap, the probed application should die
or the same insn will be restarted if a signal handler expands the stack.
We do not even need ->post_xol() for this case.

But there is a corner (and almost theoretical) case: another thread can
expand the stack right before we execute this insn out of line. In this
case it hit the same problem we are trying to solve. So we simply turn
the probed insn into "call 1f; 1:" and add ->post_xol() which restores
->sp and restarts.

Many thanks to Jonathan who finally found the standalone reproducer,
otherwise I would never resolve the "random SIGSEGV's under systemtap"
bug-report. Now that the problem is clear we can write the simplified
test-case:

	void probe_func(void), callee(void);

	int failed = 1;

	asm (
		".text\n"
		".align 4096\n"
		".globl probe_func\n"
		"probe_func:\n"
		"call callee\n"
		"ret"
	);

	/*
	 * This assumes that:
	 *
	 *	- &probe_func = 0x401000 + a_bit, aligned = 0x402000
	 *
	 *	- xol_vma->vm_start = TASK_SIZE_MAX - PAGE_SIZE = 0x7fffffffe000
	 *	  as xol_add_vma() asks; the 1st slot = 0x7fffffffe080
	 *
	 * so we can target the non-canonical address from xol_vma using
	 * the simple math below, 100 * 4096 is just the random offset
	 */
	asm (".org . + 0x800000000000 - 0x7fffffffe080 - 5 - 1  + 100 * 4096\n");

	void callee(void)
	{
		failed = 0;
	}

	int main(void)
	{
		probe_func();
		return failed;
	}

It SIGSEGV's if you probe "probe_func" (although this is not very reliable,
randomize_va_space/etc can change the placement of xol area).

Note: as Denys Vlasenko pointed out, amd and intel treat "callw" (0x66 0xe8)
differently. This patch relies on lib/insn.c and thus implements the intel's
behaviour: 0x66 is simply ignored. Fortunately nothing sane should ever use
this insn, so we postpone the fix until we decide what should we do; emulate
or not, support or not, etc.

Reported-by: Jonathan Lebon <jlebon@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17 21:58:23 +02:00
Oleg Nesterov
d241006354 uprobes/x86: Emulate nop's using ops->emulate()
Finally we can kill the ugly (and very limited) code in __skip_sstep().
Just change branch_setup_xol_ops() to treat "nop" as jmp to the next insn.

Thanks to lib/insn.c, it is clever enough. OPCODE1() == 0x90 includes
"(rep;)+ nop;" at least, and (afaics) much more.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17 21:58:22 +02:00
Oleg Nesterov
7ba6db2d68 uprobes/x86: Emulate unconditional relative jmp's
Currently we always execute all insns out-of-line, including relative
jmp's and call's. This assumes that even if regs->ip points to nowhere
after the single-step, default_post_xol_op(UPROBE_FIX_IP) logic will
update it correctly.

However, this doesn't work if this regs->ip == xol_vaddr + insn_offset
is not canonical. In this case CPU generates #GP and general_protection()
kills the task which tries to execute this insn out-of-line.

Now that we have uprobe_xol_ops we can teach uprobes to emulate these
insns and solve the problem. This patch adds branch_xol_ops which has
a single branch_emulate_op() hook, so far it can only handle rel8/32
relative jmp's.

TODO: move ->fixup into the union along with rip_rela_target_address.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Jonathan Lebon <jlebon@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17 21:58:22 +02:00
Oleg Nesterov
8faaed1b9f uprobes/x86: Introduce sizeof_long(), cleanup adjust_ret_addr() and arch_uretprobe_hijack_return_addr()
1. Add the trivial sizeof_long() helper and change other callers of
   is_ia32_task() to use it.

   TODO: is_ia32_task() is not what we actually want, TS_COMPAT does
   not necessarily mean 32bit. Fortunately syscall-like insns can't be
   probed so it actually works, but it would be better to rename and
   use is_ia32_frame().

2. As Jim pointed out "ncopied" in arch_uretprobe_hijack_return_addr()
   and adjust_ret_addr() should be named "nleft". And in fact only the
   last copy_to_user() in arch_uretprobe_hijack_return_addr() actually
   needs to inspect the non-zero error code.

TODO: adjust_ret_addr() should die. We can always calculate the value
we need to write into *regs->sp, just UPROBE_FIX_CALL should record
insn->length.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17 21:58:21 +02:00
Oleg Nesterov
75f9ef0b7f uprobes/x86: Teach arch_uprobe_post_xol() to restart if possible
SIGILL after the failed arch_uprobe_post_xol() should only be used as
a last resort, we should try to restart the probed insn if possible.

Currently only adjust_ret_addr() can fail, and this can only happen if
another thread unmapped our stack after we executed "call" out-of-line.
Most probably the application if buggy, but even in this case it can
have a handler for SIGSEGV/etc. And in theory it can be even correct
and do something non-trivial with its memory.

Of course we can't restart unconditionally, so arch_uprobe_post_xol()
does this only if ->post_xol() returns -ERESTART even if currently this
is the only possible error.

default_post_xol_op(UPROBE_FIX_CALL) can always restart, but as Jim
pointed out it should not forget to pop off the return address pushed
by this insn executed out-of-line.

Note: this is not "perfect", we do not want the extra handler_chain()
after restart, but I think this is the best solution we can realistically
do without too much uglifications.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17 21:58:21 +02:00
Oleg Nesterov
014940bad8 uprobes/x86: Send SIGILL if arch_uprobe_post_xol() fails
Currently the error from arch_uprobe_post_xol() is silently ignored.
This doesn't look good and this can lead to the hard-to-debug problems.

1. Change handle_singlestep() to loudly complain and send SIGILL.

   Note: this only affects x86, ppc/arm can't fail.

2. Change arch_uprobe_post_xol() to call arch_uprobe_abort_xol() and
   avoid TF games if it is going to return an error.

   This can help to to analyze the problem, if nothing else we should
   not report ->ip = xol_slot in the core-file.

   Note: this means that handle_riprel_post_xol() can be called twice,
   but this is fine because it is idempotent.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17 21:58:20 +02:00
Oleg Nesterov
e55848a4f8 uprobes/x86: Conditionalize the usage of handle_riprel_insn()
arch_uprobe_analyze_insn() calls handle_riprel_insn() at the start,
but only "0xff" and "default" cases need the UPROBE_FIX_RIP_ logic.
Move the callsite into "default" case and change the "0xff" case to
fall-through.

We are going to add the various hooks to handle the rip-relative
jmp/call instructions (and more), we need this change to enforce the
fact that the new code can not conflict with is_riprel_insn() logic
which, after this change, can only be used by default_xol_ops.

Note: arch_uprobe_abort_xol() still calls handle_riprel_post_xol()
directly. This is fine unless another _xol_ops we may add later will
need to reuse "UPROBE_FIX_RIP_AX|UPROBE_FIX_RIP_CX" bits in ->fixup.
In this case we can add uprobe_xol_ops->abort() hook, which (perhaps)
we will need anyway in the long term.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
2014-04-17 21:58:20 +02:00
Oleg Nesterov
8ad8e9d3fd uprobes/x86: Introduce uprobe_xol_ops and arch_uprobe->ops
Introduce arch_uprobe->ops pointing to the "struct uprobe_xol_ops",
move the current UPROBE_FIX_{RIP*,IP,CALL} code into the default
set of methods and change arch_uprobe_pre/post_xol() accordingly.

This way we can add the new uprobe_xol_ops's to handle the insns
which need the special processing (rip-relative jmp/call at least).

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
2014-04-17 21:58:19 +02:00
Oleg Nesterov
34e7317d6a uprobes/x86: move the UPROBE_FIX_{RIP,IP,CALL} code at the end of pre/post hooks
No functional changes. Preparation to simplify the review of the next
change. Just reorder the code in arch_uprobe_pre/post_xol() functions
so that UPROBE_FIX_{RIP_*,IP,CALL} logic goes to the end.

Also change arch_uprobe_pre_xol() to use utask instead of autask, to
make the code more symmetrical with arch_uprobe_post_xol().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-17 21:58:18 +02:00
Oleg Nesterov
d20737c07a uprobes/x86: Gather "riprel" functions together
Cosmetic. Move pre_xol_rip_insn() and handle_riprel_post_xol() up to
the closely related handle_riprel_insn(). This way it is simpler to
read and understand this code, and this lessens the number of ifdef's.

While at it, update the comment in handle_riprel_post_xol() as Jim
suggested.

TODO: rename them somehow to make the naming consistent.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17 21:58:17 +02:00
Oleg Nesterov
59078d4b96 uprobes/x86: Kill the "ia32_compat" check in handle_riprel_insn(), remove "mm" arg
Kill the "mm->context.ia32_compat" check in handle_riprel_insn(), if
it is true insn_rip_relative() must return false. validate_insn_bits()
passed "ia32_compat" as !x86_64 to insn_init(), and insn_rip_relative()
checks insn->x86_64.

Also, remove the no longer needed "struct mm_struct *mm" argument and
the unnecessary "return" at the end.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-17 21:58:17 +02:00
Oleg Nesterov
ddb69f276c uprobes/x86: Fold prepare_fixups() into arch_uprobe_analyze_insn()
No functional changes, preparation.

Shift the code from prepare_fixups() to arch_uprobe_analyze_insn()
with the following modifications:

	- Do not call insn_get_opcode() again, it was already called
	  by validate_insn_bits().

	- Move "case 0xea" up. This way "case 0xff" can fall through
	  to default case.

	- change "case 0xff" to use the nested "switch (MODRM_REG)",
	  this way the code looks a bit simpler.

	- Make the comments look consistent.

While at it, kill the initialization of rip_rela_target_address and
->fixups, we can rely on kzalloc(). We will add the new members into
arch_uprobe, it would be better to assume that everything is zero by
default.

TODO: cleanup/fix the mess in validate_insn_bits() paths:

	- validate_insn_64bits() and validate_insn_32bits() should be
	  unified.

	- "ifdef" is not used consistently; if good_insns_64 depends
	  on CONFIG_X86_64, then probably good_insns_32 should depend
	  on CONFIG_X86_32/EMULATION

	- the usage of mm->context.ia32_compat looks wrong if the task
	  is TIF_X32.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-17 21:58:16 +02:00
Linus Torvalds
88764e0a3e Xen regression and bug fixes for 3.15-rc1.
- Fix completely broken 32-bit PV guests caused by x86 refactoring
   32-bit thread_info.
 - Only enable ticketlock slow path on Xen (not bare metal).
 - Fix two bugs with PV guests not shutting down when requested.
 - Fix a minor memory leak in xen-pciback error path.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQEcBAABAgAGBQJTT/hQAAoJEFxbo/MsZsTR6sMIAJs7mJXSqDQn3Z8O+TemRa53
 p92ZomTNYALjUMglXcxJ2Zua6IsZMWdu7jcV1GoXC70V4YLmUs8KaBgZmI5ayUQy
 bBpK+6WIAJyBkJdNH5fK3wggJ2UZjw0/twPNgd9gACwjUiYhx8iHN/hTGvu4qPBJ
 MGAIlg6wdnGwRydi72uk9Am/xpebEdQy4DRD20vjwA/qUkT4uHVv/AA4hc4AK29w
 ToK8qFSisgAlahcmq8/T4+OBFEKz78b9dQcdsGWyAk0ofWILfwD1l53xhzUin25s
 JUVevWhhLCKRZBOq4Ykc5qyqnLff4m56rm/THQ6f0oRdJn/OR+SWOImda2Qqmvs=
 =Gxpq
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.15-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull Xen fixes from David Vrabel:
 "Xen regression and bug fixes for 3.15-rc1:

   - fix completely broken 32-bit PV guests caused by x86 refactoring
     32-bit thread_info.
   - only enable ticketlock slow path on Xen (not bare metal)
   - fix two bugs with PV guests not shutting down when requested
   - fix a minor memory leak in xen-pciback error path"

* tag 'stable/for-linus-3.15-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/manage: Poweroff forcefully if user-space is not yet up.
  xen/xenbus: Avoid synchronous wait on XenBus stalling shutdown/restart.
  xen/spinlock: Don't enable them unconditionally.
  xen-pciback: silence an unwanted debug printk
  xen: fix memory leak in __xen_pcibk_add_pci_dev()
  x86/xen: Fix 32-bit PV guests's usage of kernel_stack
2014-04-17 10:54:07 -07:00
Masami Hiramatsu
6381c24cd6 kprobes/x86: Fix page-fault handling logic
Current kprobes in-kernel page fault handler doesn't
expect that its single-stepping can be interrupted by
an NMI handler which may cause a page fault(e.g. perf
with callback tracing).

In that case, the page-fault handled by kprobes and it
misunderstands the page-fault has been caused by the
single-stepping code and tries to recover IP address
to probed address.

But the truth is the page-fault has been caused by the
NMI handler, and do_page_fault failes to handle real
page fault because the IP address is modified and
causes Kernel BUGs like below.

 ----
 [ 2264.726905] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
 [ 2264.727190] IP: [<ffffffff813c46e0>] copy_user_generic_string+0x0/0x40

To handle this correctly, I fixed the kprobes fault
handler to ensure the faulted ip address is its own
single-step buffer instead of checking current kprobe
state.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Sandeepa Prabhu <sandeepa.prabhu@linaro.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: fche@redhat.com
Cc: systemtap@sourceware.org
Link: http://lkml.kernel.org/r/20140417081644.26341.52351.stgit@ltc230.yrl.intra.hitachi.co.jp
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-17 10:57:02 +02:00
Ingo Molnar
ea431643d6 x86/mce: Fix CMCI preemption bugs
The following commit:

  27f6c573e0 ("x86, CMCI: Add proper detection of end of CMCI storms")

Added two preemption bugs:

 - machine_check_poll() does a get_cpu_var() without a matching
   put_cpu_var(), which causes preemption imbalance and crashes upon
   bootup.

 - it does percpu ops without disabling preemption. Preemption is not
   disabled due to the mistaken use of a raw spinlock.

To fix these bugs fix the imbalance and change
cmci_discover_lock to a regular spinlock.

Reported-by: Owen Kibel <qmewlo@gmail.com>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Chen, Gong <gong.chen@linux.intel.com>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Todorov <atodorov@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/n/tip-jtjptvgigpfkpvtQxpEk1at2@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
--
 arch/x86/kernel/cpu/mcheck/mce.c       |    4 +---
 arch/x86/kernel/cpu/mcheck/mce_intel.c |   18 +++++++++---------
 2 files changed, 10 insertions(+), 12 deletions(-)
2014-04-17 10:28:42 +02:00
Linus Torvalds
6ca2a88ad8 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Various fixes:

   - reboot regression fix
   - build message spam fix
   - GPU quirk fix
   - 'make kvmconfig' fix

  plus the wire-up of the renameat2() system call on i386"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Remove the PCI reboot method from the default chain
  x86/build: Supress "Nothing to be done for ..." messages
  x86/gpu: Fix sign extension issue in Intel graphics stolen memory quirks
  x86/platform: Fix "make O=dir kvmconfig"
  i386: Wire up the renameat2() syscall
2014-04-16 16:40:18 -07:00
Linus Torvalds
2a83dc7e37 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Tooling fixes, plus a simple hardware-enablement patch for the Intel
  RAPL PMU (energy use measurement) on Haswell CPUs, which I hope is
  still fine at this stage"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf tools: Instead of redirecting flex output, use -o
  perf tools: Fix double free in perf test 21 (code-reading.c)
  perf stat: Initialize statistics correctly
  perf bench: Set more defaults in the 'numa' suite
  perf bench: Fix segfault at the end of an 'all' execution
  perf bench: Update manpage to mention numa and futex
  perf probe: Use dwarf_getcfi_elf() instead of dwarf_getcfi()
  perf probe: Fix to handle errors in line_range searching
  perf probe: Fix --line option behavior
  perf tools: Pick up libdw without explicit LIBDW_DIR
  MAINTAINERS: Change e-mail to kernel.org one
  perf callchains: Disable unwind libraries when libelf isn't found
  tools lib traceevent: Do not call warning() directly
  tools lib traceevent: Print event name when show warning if possible
  perf top: Fix documentation of invalid -s option
  perf/x86: Enable DRAM RAPL support on Intel Haswell
2014-04-16 16:38:57 -07:00
Ingo Molnar
5be44a6fb1 x86: Remove the PCI reboot method from the default chain
Steve reported a reboot hang and bisected it back to this commit:

  a4f1987e4c x86, reboot: Add EFI and CF9 reboot methods into the default list

He heroically tested all reboot methods and found the following:

  reboot=t       # triple fault                  ok
  reboot=k       # keyboard ctrl                 FAIL
  reboot=b       # BIOS                          ok
  reboot=a       # ACPI                          FAIL
  reboot=e       # EFI                           FAIL   [system has no EFI]
  reboot=p       # PCI 0xcf9                     FAIL

And I think it's pretty obvious that we should only try PCI 0xcf9 as a
last resort - if at all.

The other observation is that (on this box) we should never try
the PCI reboot method, but close with either the 'triple fault'
or the 'BIOS' (terminal!) reboot methods.

Thirdly, CF9_COND is a total misnomer - it should be something like
CF9_SAFE or CF9_CAREFUL, and 'CF9' should be 'CF9_FORCE' ...

So this patch fixes the worst problems:

 - it orders the actual reboot logic to follow the reboot ordering
   pattern - it was in a pretty random order before for no good
   reason.

 - it fixes the CF9 misnomers and uses BOOT_CF9_FORCE and
   BOOT_CF9_SAFE flags to make the code more obvious.

 - it tries the BIOS reboot method before the PCI reboot method.
   (Since 'BIOS' is a terminal reboot method resulting in a hang
    if it does not work, this is essentially equivalent to removing
    the PCI reboot method from the default reboot chain.)

 - just for the miraculous possibility of terminal (resulting
   in hang) reboot methods of triple fault or BIOS returning
   without having done their job, there's an ordering between
   them as well.

Reported-and-bisected-and-tested-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Li Aubrey <aubrey.li@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Link: http://lkml.kernel.org/r/20140404064120.GB11877@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-16 08:56:09 +02:00
Konrad Rzeszutek Wilk
e0fc17a936 xen/spinlock: Don't enable them unconditionally.
The git commit a945928ea2
('xen: Do not enable spinlocks before jump_label_init() has executed')
was added to deal with the jump machinery. Earlier the code
that turned on the jump label was only called by Xen specific
functions. But now that it had been moved to the initcall machinery
it gets called on Xen, KVM, and baremetal - ouch!. And the detection
machinery to only call it on Xen wasn't remembered in the heat
of merge window excitement.

This means that the slowpath is enabled on baremetal while it should
not be.

Reported-by: Waiman Long <waiman.long@hp.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
CC: stable@vger.kernel.org
CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-04-15 17:41:28 +01:00
Boris Ostrovsky
4461bbc05b x86/xen: Fix 32-bit PV guests's usage of kernel_stack
Commit 198d208df4 ("x86: Keep
thread_info on thread stack in x86_32") made 32-bit kernels use
kernel_stack to point to thread_info. That change missed a couple of
updates needed by Xen's 32-bit PV guests:

1. kernel_stack needs to be initialized for secondary CPUs

2. GET_THREAD_INFO() now uses %fs register which may not be the
   kernel's version when executing xen_iret().

With respect to the second issue, we don't need GET_THREAD_INFO()
anymore: we used it as an intermediate step to get to per_cpu xen_vcpu
and avoid referencing %fs. Now that we are going to use %fs anyway we
may as well go directly to xen_vcpu.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-04-15 15:00:14 +01:00
Linus Torvalds
55101e2d6c Merge git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Marcelo Tosatti:
 - Fix for guest triggerable BUG_ON (CVE-2014-0155)
 - CR4.SMAP support
 - Spurious WARN_ON() fix

* git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: remove WARN_ON from get_kernel_ns()
  KVM: Rename variable smep to cr4_smep
  KVM: expose SMAP feature to guest
  KVM: Disable SMAP for guests in EPT realmode and EPT unpaging mode
  KVM: Add SMAP support when setting CR4
  KVM: Remove SMAP bit from CR4_RESERVED_BITS
  KVM: ioapic: try to recover if pending_eoi goes out of range
  KVM: ioapic: fix assignment of ioapic->rtc_status.pending_eoi (CVE-2014-0155)
2014-04-14 16:21:28 -07:00
Marcelo Tosatti
b351c39cc9 KVM: x86: remove WARN_ON from get_kernel_ns()
Function and callers can be preempted.

https://bugzilla.kernel.org/show_bug.cgi?id=73721

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2014-04-14 17:50:43 -03:00
Feng Wu
66386ade2a KVM: Rename variable smep to cr4_smep
Rename variable smep to cr4_smep, which can better reflect the
meaning of the variable.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-14 17:50:40 -03:00
Feng Wu
de935ae15b KVM: expose SMAP feature to guest
This patch exposes SMAP feature to guest

Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-14 17:50:37 -03:00
Feng Wu
e1e746b3c5 KVM: Disable SMAP for guests in EPT realmode and EPT unpaging mode
SMAP is disabled if CPU is in non-paging mode in hardware.
However KVM always uses paging mode to emulate guest non-paging
mode with TDP. To emulate this behavior, SMAP needs to be
manually disabled when guest switches to non-paging mode.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-14 17:50:35 -03:00
Feng Wu
97ec8c067d KVM: Add SMAP support when setting CR4
This patch adds SMAP handling logic when setting CR4 for guests

Thanks a lot to Paolo Bonzini for his suggestion to use the branchless
way to detect SMAP violation.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-14 17:50:34 -03:00
Feng Wu
56d6efc2de KVM: Remove SMAP bit from CR4_RESERVED_BITS
This patch removes SMAP bit from CR4_RESERVED_BITS.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-14 17:50:33 -03:00