Commit Graph

19102 Commits

Author SHA1 Message Date
Stephane Eranian
722e76e60f fix Haswell precise store data source encoding
This patch fixes a bug in  precise_store_data_hsw() whereby
it would set the data source memory level to the wrong value.

As per the the SDM Vol 3b Table 18-41 (Layout of Data Linear
Address Information in PEBS Record), when status bit 0 is set
this is a L1 hit, otherwise this is a L1 miss.

This patch encodes the memory level according to the specification.

In V2, we added the filtering on the store events.
Only the following events produce L1 information:
 * MEM_UOPS_RETIRED.STLB_MISS_STORES
 * MEM_UOPS_RETIRED.LOCK_STORES
 * MEM_UOPS_RETIRED.SPLIT_STORES
 * MEM_UOPS_RETIRED.ALL_STORES

Cc: mingo@elte.hu
Cc: acme@ghostprotocols.net
Cc: jolsa@redhat.com
Cc: jmario@redhat.com
Cc: ak@linux.intel.com
Tested-and-Reviewed-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140515155644.GA3884@quad
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-19 21:52:59 +09:00
Ingo Molnar
37b16beaa9 Merge branch 'perf/urgent' into perf/core, to avoid conflicts
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-07 13:39:22 +02:00
Yan, Zheng
a4b4f11b27 perf/x86/intel: Fix Silvermont's event constraints
Event 0x013c is not the same as fixed counter2, remove it from
Silvermont's event constraints.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1398755081-12471-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-07 11:33:16 +02:00
Linus Torvalds
0384dcae2b Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
 "This udpate delivers:

   - A fix for dynamic interrupt allocation on x86 which is required to
     exclude the GSI interrupts from the dynamic allocatable range.

     This was detected with the newfangled tablet SoCs which have GPIOs
     and therefor allocate a range of interrupts.  The MSI allocations
     already excluded the GSI range, so we never noticed before.

   - The last missing set_irq_affinity() repair, which was delayed due
     to testing issues

   - A few bug fixes for the armada SoC interrupt controller

   - A memory allocation fix for the TI crossbar interrupt controller

   - A trivial kernel-doc warning fix"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip: irq-crossbar: Not allocating enough memory
  irqchip: armanda: Sanitize set_irq_affinity()
  genirq: x86: Ensure that dynamic irq allocation does not conflict
  linux/interrupt.h: fix new kernel-doc warnings
  irqchip: armada-370-xp: Fix releasing of MSIs
  irqchip: armada-370-xp: implement the ->check_device() msi_chip operation
  irqchip: armada-370-xp: fix invalid cast of signed value into unsigned variable
2014-05-03 08:32:48 -07:00
Linus Torvalds
0845e11c2a Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "Two very small changes: one fix for the vSMP Foundation platform, and
  one to help LLVM not choke on options it doesn't understand (although
  it probably should)"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/vsmp: Fix irq routing
  x86: LLVMLinux: Wrap -mno-80387 with cc-option
2014-05-02 14:04:52 -07:00
Linus Torvalds
e7e6d2a4a1 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
 - Fix for a Haswell regression in nested virtualization, introduced
   during the merge window.
 - A fix from Oleg to async page faults.
 - A bunch of small ARM changes.
 - A trivial patch to use the new MSI-X API introduced during the merge
   window.

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: ARM: vgic: Fix the overlap check action about setting the GICD & GICC base address.
  KVM: arm/arm64: vgic: fix GICD_ICFGR register accesses
  KVM: async_pf: mm->mm_users can not pin apf->mm
  KVM: ARM: vgic: Fix sgi dispatch problem
  MAINTAINERS: co-maintainance of KVM/{arm,arm64}
  arm: KVM: fix possible misalignment of PGDs and bounce page
  KVM: x86: Check for host supported fields in shadow vmcs
  kvm: Use pci_enable_msix_exact() instead of pci_enable_msix()
  ARM: KVM: disable KVM in Kconfig on big-endian systems
2014-05-02 09:26:09 -07:00
Thomas Gleixner
62a08ae2a5 genirq: x86: Ensure that dynamic irq allocation does not conflict
On x86 the allocation of irq descriptors may allocate interrupts which
are in the range of the GSI interrupts. That's wrong as those
interrupts are hardwired and we don't have the irq domain translation
like PPC. So one of these interrupts can be hooked up later to one of
the devices which are hard wired to it and the io_apic init code for
that particular interrupt line happily reuses that descriptor with a
completely different configuration so hell breaks lose.

Inside x86 we allocate dynamic interrupts from above nr_gsi_irqs,
except for a few usage sites which have not yet blown up in our face
for whatever reason. But for drivers which need an irq range, like the
GPIO drivers, we have no limit in place and we don't want to expose
such a detail to a driver.

To cure this introduce a function which an architecture can implement
to impose a lower bound on the dynamic interrupt allocations.

Implement it for x86 and set the lower bound to nr_gsi_irqs, which is
the end of the hardwired interrupt space, so all dynamic allocations
happen above.

That not only allows the GPIO driver to work sanely, it also protects
the bogus callsites of create_irq_nr() in hpet, uv, irq_remapping and
htirq code. They need to be cleaned up as well, but that's a separate
issue.

Reported-by: Jin Yao <yao.jin@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Mathias Nyman <mathias.nyman@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Krogerus Heikki <heikki.krogerus@intel.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1404241617360.28206@ionos.tec.linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-28 12:20:00 +02:00
Bandan Das
fe2b201b3b KVM: x86: Check for host supported fields in shadow vmcs
We track shadow vmcs fields through two static lists,
one for read only and another for r/w fields. However, with
addition of new vmcs fields, not all fields may be supported on
all hosts. If so, copy_vmcs12_to_shadow() trying to vmwrite on
unsupported hosts will result in a vmwrite error. For example, commit
36be0b9deb introduced GUEST_BNDCFGS, which is not supported
by all processors. Filter out host unsupported fields before
letting guests use shadow vmcs

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-04-28 11:14:51 +02:00
Oren Twaig
39025ba382 x86/vsmp: Fix irq routing
Correct IRQ routing in case a vSMP box is detected
but the  Interrupt Routing Comply (IRC) value is set to
"comply", which leads to incorrect IRQ routing.

Before the patch:

When a vSMP box was detected and IRC was set to "comply",
users (and the kernel) couldn't effectively set the
destination of the IRQs. This is because the hook inside
vsmp_64.c always setup all CPUs as the IRQ destination using
cpumask_setall() as the return value for IRQ allocation mask.
Later, this "overrided" mask caused the kernel to set the IRQ
destination to the lowest online CPU in the mask (CPU0 usually).

After the patch:

When the IRC is set to "comply", users (and the kernel) can control
the destination of the IRQs as we will not be changing the
default "apic->vector_allocation_domain".

Signed-off-by: Oren Twaig <oren@scalemp.com>
Acked-by: Shai Fultheim <shai@scalemp.com>
Link: http://lkml.kernel.org/r/1398669697-2123-1-git-send-email-oren@scalemp.com
[ Minor readability edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-28 09:27:34 +02:00
Ingo Molnar
42ebd27bcb Merge branch 'perf/urgent' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-25 10:04:22 +02:00
Stephane Eranian
9f7ff8931e perf/x86: Fix RAPL rdmsrl_safe() usage
This patch fixes a bug introduced by:

  2422365780 ("perf/x86/intel: Use rdmsrl_safe() when initializing RAPL PMU")

The rdmsrl_safe() function returns 0 on success.
The current code was failing to detect the RAPL PMU
on real hardware  (missing /sys/devices/power) because
the return value of rdmsrl_safe() was misinterpreted.

Signed-off-by: Stephane Eranian <eranian@google.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Venkatesh Srinivas <venkateshs@google.com>
Cc: peterz@infradead.org
Cc: zheng.z.yan@intel.com
Link: http://lkml.kernel.org/r/20140423170418.GA12767@quad
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-24 08:12:41 +02:00
Linus Torvalds
39bfe90706 Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 vdso fix from Peter Anvin:
 "This is a single build fix for building with gold as opposed to GNU
  ld.  It got queued up separately and was expected to be pushed during
  the merge window, but it got left behind"

* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, vdso: Make the vdso linker script compatible with Gold
2014-04-22 09:09:06 -07:00
Behan Webster
8f2dd677be x86: LLVMLinux: Wrap -mno-80387 with cc-option
Wrap -mno-80387 gcc options with cc-option so they don't break
clang.

Signed-off-by: Behan Webster <behanw@converseincode.com>
Cc: torvalds@linux-foundation.org
Cc: dwmw2@infradead.org
Cc: pageexec@freemail.hu
Link: http://lkml.kernel.org/r/1398145227-25053-1-git-send-email-behanw@converseincode.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-22 11:41:16 +02:00
Linus Torvalds
6d4596905b Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Ingo Molnar:
 "This fixes the preemption-count imbalance crash reported by Owen
  Kibel"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Fix CMCI preemption bugs
2014-04-19 10:41:43 -07:00
Linus Torvalds
8de3f7a705 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Two kernel side fixes:

   - an Intel uncore PMU driver potential crash fix
   - a kprobes/perf-call-graph interaction fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Use rdmsrl_safe() when initializing RAPL PMU
  kprobes/x86: Fix page-fault handling logic
2014-04-19 10:40:11 -07:00
Yan, Zheng
4a3dc121d3 perf/x86: Export perf_assign_events()
export perf_assign_events to allow building perf Intel uncore driver
as module

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1395133004-23205-3-git-send-email-zheng.z.yan@intel.com
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: eranian@google.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-18 12:54:46 +02:00
Ingo Molnar
1111b680d3 Merge branch 'perf/urgent' into perf/core, to pick up PMU driver fixes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-18 12:14:55 +02:00
Venkatesh Srinivas
2422365780 perf/x86/intel: Use rdmsrl_safe() when initializing RAPL PMU
CPUs which should support the RAPL counters according to
Family/Model/Stepping may still issue #GP when attempting to access
the RAPL MSRs. This may happen when Linux is running under KVM and
we are passing-through host F/M/S data, for example. Use rdmsrl_safe
to first access the RAPL_POWER_UNIT MSR; if this fails, do not
attempt to use this PMU.

Signed-off-by: Venkatesh Srinivas <venkateshs@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1394739386-22260-1-git-send-email-venkateshs@google.com
Cc: zheng.z.yan@intel.com
Cc: eranian@google.com
Cc: ak@linux.intel.com
Cc: linux-kernel@vger.kernel.org
[ The patch also silently fixes another bug: rapl_pmu_init() didn't handle the memory alloc failure case previously. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-18 12:14:26 +02:00
Oleg Nesterov
6cc5e7ff2c uprobes/x86: Emulate relative conditional "near" jmp's
Change branch_setup_xol_ops() to simply use opc1 = OPCODE2(insn) - 0x10
if OPCODE1() == 0x0f; this matches the "short" jmp which checks the same
condition.

Thanks to lib/insn.c, it does the rest correctly. branch->ilen/offs are
correct no matter if this jmp is "near" or "short".

Reported-by: Jonathan Lebon <jlebon@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17 21:58:25 +02:00
Oleg Nesterov
8f95505bc1 uprobes/x86: Emulate relative conditional "short" jmp's
Teach branch_emulate_op() to emulate the conditional "short" jmp's which
check regs->flags.

Note: this doesn't support jcxz/jcexz, loope/loopz, and loopne/loopnz.
They all are rel8 and thus they can't trigger the problem, but perhaps
we will add the support in future just for completeness.

Reported-by: Jonathan Lebon <jlebon@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17 21:58:23 +02:00
Oleg Nesterov
8e89c0be17 uprobes/x86: Emulate relative call's
See the previous "Emulate unconditional relative jmp's" which explains
why we can not execute "jmp" out-of-line, the same applies to "call".

Emulating of rip-relative call is trivial, we only need to additionally
push the ret-address. If this fails, we execute this instruction out of
line and this should trigger the trap, the probed application should die
or the same insn will be restarted if a signal handler expands the stack.
We do not even need ->post_xol() for this case.

But there is a corner (and almost theoretical) case: another thread can
expand the stack right before we execute this insn out of line. In this
case it hit the same problem we are trying to solve. So we simply turn
the probed insn into "call 1f; 1:" and add ->post_xol() which restores
->sp and restarts.

Many thanks to Jonathan who finally found the standalone reproducer,
otherwise I would never resolve the "random SIGSEGV's under systemtap"
bug-report. Now that the problem is clear we can write the simplified
test-case:

	void probe_func(void), callee(void);

	int failed = 1;

	asm (
		".text\n"
		".align 4096\n"
		".globl probe_func\n"
		"probe_func:\n"
		"call callee\n"
		"ret"
	);

	/*
	 * This assumes that:
	 *
	 *	- &probe_func = 0x401000 + a_bit, aligned = 0x402000
	 *
	 *	- xol_vma->vm_start = TASK_SIZE_MAX - PAGE_SIZE = 0x7fffffffe000
	 *	  as xol_add_vma() asks; the 1st slot = 0x7fffffffe080
	 *
	 * so we can target the non-canonical address from xol_vma using
	 * the simple math below, 100 * 4096 is just the random offset
	 */
	asm (".org . + 0x800000000000 - 0x7fffffffe080 - 5 - 1  + 100 * 4096\n");

	void callee(void)
	{
		failed = 0;
	}

	int main(void)
	{
		probe_func();
		return failed;
	}

It SIGSEGV's if you probe "probe_func" (although this is not very reliable,
randomize_va_space/etc can change the placement of xol area).

Note: as Denys Vlasenko pointed out, amd and intel treat "callw" (0x66 0xe8)
differently. This patch relies on lib/insn.c and thus implements the intel's
behaviour: 0x66 is simply ignored. Fortunately nothing sane should ever use
this insn, so we postpone the fix until we decide what should we do; emulate
or not, support or not, etc.

Reported-by: Jonathan Lebon <jlebon@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17 21:58:23 +02:00
Oleg Nesterov
d241006354 uprobes/x86: Emulate nop's using ops->emulate()
Finally we can kill the ugly (and very limited) code in __skip_sstep().
Just change branch_setup_xol_ops() to treat "nop" as jmp to the next insn.

Thanks to lib/insn.c, it is clever enough. OPCODE1() == 0x90 includes
"(rep;)+ nop;" at least, and (afaics) much more.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17 21:58:22 +02:00
Oleg Nesterov
7ba6db2d68 uprobes/x86: Emulate unconditional relative jmp's
Currently we always execute all insns out-of-line, including relative
jmp's and call's. This assumes that even if regs->ip points to nowhere
after the single-step, default_post_xol_op(UPROBE_FIX_IP) logic will
update it correctly.

However, this doesn't work if this regs->ip == xol_vaddr + insn_offset
is not canonical. In this case CPU generates #GP and general_protection()
kills the task which tries to execute this insn out-of-line.

Now that we have uprobe_xol_ops we can teach uprobes to emulate these
insns and solve the problem. This patch adds branch_xol_ops which has
a single branch_emulate_op() hook, so far it can only handle rel8/32
relative jmp's.

TODO: move ->fixup into the union along with rip_rela_target_address.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Jonathan Lebon <jlebon@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17 21:58:22 +02:00
Oleg Nesterov
8faaed1b9f uprobes/x86: Introduce sizeof_long(), cleanup adjust_ret_addr() and arch_uretprobe_hijack_return_addr()
1. Add the trivial sizeof_long() helper and change other callers of
   is_ia32_task() to use it.

   TODO: is_ia32_task() is not what we actually want, TS_COMPAT does
   not necessarily mean 32bit. Fortunately syscall-like insns can't be
   probed so it actually works, but it would be better to rename and
   use is_ia32_frame().

2. As Jim pointed out "ncopied" in arch_uretprobe_hijack_return_addr()
   and adjust_ret_addr() should be named "nleft". And in fact only the
   last copy_to_user() in arch_uretprobe_hijack_return_addr() actually
   needs to inspect the non-zero error code.

TODO: adjust_ret_addr() should die. We can always calculate the value
we need to write into *regs->sp, just UPROBE_FIX_CALL should record
insn->length.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17 21:58:21 +02:00
Oleg Nesterov
75f9ef0b7f uprobes/x86: Teach arch_uprobe_post_xol() to restart if possible
SIGILL after the failed arch_uprobe_post_xol() should only be used as
a last resort, we should try to restart the probed insn if possible.

Currently only adjust_ret_addr() can fail, and this can only happen if
another thread unmapped our stack after we executed "call" out-of-line.
Most probably the application if buggy, but even in this case it can
have a handler for SIGSEGV/etc. And in theory it can be even correct
and do something non-trivial with its memory.

Of course we can't restart unconditionally, so arch_uprobe_post_xol()
does this only if ->post_xol() returns -ERESTART even if currently this
is the only possible error.

default_post_xol_op(UPROBE_FIX_CALL) can always restart, but as Jim
pointed out it should not forget to pop off the return address pushed
by this insn executed out-of-line.

Note: this is not "perfect", we do not want the extra handler_chain()
after restart, but I think this is the best solution we can realistically
do without too much uglifications.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17 21:58:21 +02:00
Oleg Nesterov
014940bad8 uprobes/x86: Send SIGILL if arch_uprobe_post_xol() fails
Currently the error from arch_uprobe_post_xol() is silently ignored.
This doesn't look good and this can lead to the hard-to-debug problems.

1. Change handle_singlestep() to loudly complain and send SIGILL.

   Note: this only affects x86, ppc/arm can't fail.

2. Change arch_uprobe_post_xol() to call arch_uprobe_abort_xol() and
   avoid TF games if it is going to return an error.

   This can help to to analyze the problem, if nothing else we should
   not report ->ip = xol_slot in the core-file.

   Note: this means that handle_riprel_post_xol() can be called twice,
   but this is fine because it is idempotent.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17 21:58:20 +02:00
Oleg Nesterov
e55848a4f8 uprobes/x86: Conditionalize the usage of handle_riprel_insn()
arch_uprobe_analyze_insn() calls handle_riprel_insn() at the start,
but only "0xff" and "default" cases need the UPROBE_FIX_RIP_ logic.
Move the callsite into "default" case and change the "0xff" case to
fall-through.

We are going to add the various hooks to handle the rip-relative
jmp/call instructions (and more), we need this change to enforce the
fact that the new code can not conflict with is_riprel_insn() logic
which, after this change, can only be used by default_xol_ops.

Note: arch_uprobe_abort_xol() still calls handle_riprel_post_xol()
directly. This is fine unless another _xol_ops we may add later will
need to reuse "UPROBE_FIX_RIP_AX|UPROBE_FIX_RIP_CX" bits in ->fixup.
In this case we can add uprobe_xol_ops->abort() hook, which (perhaps)
we will need anyway in the long term.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
2014-04-17 21:58:20 +02:00
Oleg Nesterov
8ad8e9d3fd uprobes/x86: Introduce uprobe_xol_ops and arch_uprobe->ops
Introduce arch_uprobe->ops pointing to the "struct uprobe_xol_ops",
move the current UPROBE_FIX_{RIP*,IP,CALL} code into the default
set of methods and change arch_uprobe_pre/post_xol() accordingly.

This way we can add the new uprobe_xol_ops's to handle the insns
which need the special processing (rip-relative jmp/call at least).

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
2014-04-17 21:58:19 +02:00
Oleg Nesterov
34e7317d6a uprobes/x86: move the UPROBE_FIX_{RIP,IP,CALL} code at the end of pre/post hooks
No functional changes. Preparation to simplify the review of the next
change. Just reorder the code in arch_uprobe_pre/post_xol() functions
so that UPROBE_FIX_{RIP_*,IP,CALL} logic goes to the end.

Also change arch_uprobe_pre_xol() to use utask instead of autask, to
make the code more symmetrical with arch_uprobe_post_xol().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-17 21:58:18 +02:00
Oleg Nesterov
d20737c07a uprobes/x86: Gather "riprel" functions together
Cosmetic. Move pre_xol_rip_insn() and handle_riprel_post_xol() up to
the closely related handle_riprel_insn(). This way it is simpler to
read and understand this code, and this lessens the number of ifdef's.

While at it, update the comment in handle_riprel_post_xol() as Jim
suggested.

TODO: rename them somehow to make the naming consistent.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17 21:58:17 +02:00
Oleg Nesterov
59078d4b96 uprobes/x86: Kill the "ia32_compat" check in handle_riprel_insn(), remove "mm" arg
Kill the "mm->context.ia32_compat" check in handle_riprel_insn(), if
it is true insn_rip_relative() must return false. validate_insn_bits()
passed "ia32_compat" as !x86_64 to insn_init(), and insn_rip_relative()
checks insn->x86_64.

Also, remove the no longer needed "struct mm_struct *mm" argument and
the unnecessary "return" at the end.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-17 21:58:17 +02:00
Oleg Nesterov
ddb69f276c uprobes/x86: Fold prepare_fixups() into arch_uprobe_analyze_insn()
No functional changes, preparation.

Shift the code from prepare_fixups() to arch_uprobe_analyze_insn()
with the following modifications:

	- Do not call insn_get_opcode() again, it was already called
	  by validate_insn_bits().

	- Move "case 0xea" up. This way "case 0xff" can fall through
	  to default case.

	- change "case 0xff" to use the nested "switch (MODRM_REG)",
	  this way the code looks a bit simpler.

	- Make the comments look consistent.

While at it, kill the initialization of rip_rela_target_address and
->fixups, we can rely on kzalloc(). We will add the new members into
arch_uprobe, it would be better to assume that everything is zero by
default.

TODO: cleanup/fix the mess in validate_insn_bits() paths:

	- validate_insn_64bits() and validate_insn_32bits() should be
	  unified.

	- "ifdef" is not used consistently; if good_insns_64 depends
	  on CONFIG_X86_64, then probably good_insns_32 should depend
	  on CONFIG_X86_32/EMULATION

	- the usage of mm->context.ia32_compat looks wrong if the task
	  is TIF_X32.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-17 21:58:16 +02:00
Linus Torvalds
88764e0a3e Merge tag 'stable/for-linus-3.15-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull Xen fixes from David Vrabel:
 "Xen regression and bug fixes for 3.15-rc1:

   - fix completely broken 32-bit PV guests caused by x86 refactoring
     32-bit thread_info.
   - only enable ticketlock slow path on Xen (not bare metal)
   - fix two bugs with PV guests not shutting down when requested
   - fix a minor memory leak in xen-pciback error path"

* tag 'stable/for-linus-3.15-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/manage: Poweroff forcefully if user-space is not yet up.
  xen/xenbus: Avoid synchronous wait on XenBus stalling shutdown/restart.
  xen/spinlock: Don't enable them unconditionally.
  xen-pciback: silence an unwanted debug printk
  xen: fix memory leak in __xen_pcibk_add_pci_dev()
  x86/xen: Fix 32-bit PV guests's usage of kernel_stack
2014-04-17 10:54:07 -07:00
Masami Hiramatsu
6381c24cd6 kprobes/x86: Fix page-fault handling logic
Current kprobes in-kernel page fault handler doesn't
expect that its single-stepping can be interrupted by
an NMI handler which may cause a page fault(e.g. perf
with callback tracing).

In that case, the page-fault handled by kprobes and it
misunderstands the page-fault has been caused by the
single-stepping code and tries to recover IP address
to probed address.

But the truth is the page-fault has been caused by the
NMI handler, and do_page_fault failes to handle real
page fault because the IP address is modified and
causes Kernel BUGs like below.

 ----
 [ 2264.726905] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
 [ 2264.727190] IP: [<ffffffff813c46e0>] copy_user_generic_string+0x0/0x40

To handle this correctly, I fixed the kprobes fault
handler to ensure the faulted ip address is its own
single-step buffer instead of checking current kprobe
state.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Sandeepa Prabhu <sandeepa.prabhu@linaro.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: fche@redhat.com
Cc: systemtap@sourceware.org
Link: http://lkml.kernel.org/r/20140417081644.26341.52351.stgit@ltc230.yrl.intra.hitachi.co.jp
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-17 10:57:02 +02:00
Ingo Molnar
ea431643d6 x86/mce: Fix CMCI preemption bugs
The following commit:

  27f6c573e0 ("x86, CMCI: Add proper detection of end of CMCI storms")

Added two preemption bugs:

 - machine_check_poll() does a get_cpu_var() without a matching
   put_cpu_var(), which causes preemption imbalance and crashes upon
   bootup.

 - it does percpu ops without disabling preemption. Preemption is not
   disabled due to the mistaken use of a raw spinlock.

To fix these bugs fix the imbalance and change
cmci_discover_lock to a regular spinlock.

Reported-by: Owen Kibel <qmewlo@gmail.com>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Chen, Gong <gong.chen@linux.intel.com>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Todorov <atodorov@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/n/tip-jtjptvgigpfkpvtQxpEk1at2@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
--
 arch/x86/kernel/cpu/mcheck/mce.c       |    4 +---
 arch/x86/kernel/cpu/mcheck/mce_intel.c |   18 +++++++++---------
 2 files changed, 10 insertions(+), 12 deletions(-)
2014-04-17 10:28:42 +02:00
Linus Torvalds
6ca2a88ad8 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Various fixes:

   - reboot regression fix
   - build message spam fix
   - GPU quirk fix
   - 'make kvmconfig' fix

  plus the wire-up of the renameat2() system call on i386"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Remove the PCI reboot method from the default chain
  x86/build: Supress "Nothing to be done for ..." messages
  x86/gpu: Fix sign extension issue in Intel graphics stolen memory quirks
  x86/platform: Fix "make O=dir kvmconfig"
  i386: Wire up the renameat2() syscall
2014-04-16 16:40:18 -07:00
Linus Torvalds
2a83dc7e37 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Tooling fixes, plus a simple hardware-enablement patch for the Intel
  RAPL PMU (energy use measurement) on Haswell CPUs, which I hope is
  still fine at this stage"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf tools: Instead of redirecting flex output, use -o
  perf tools: Fix double free in perf test 21 (code-reading.c)
  perf stat: Initialize statistics correctly
  perf bench: Set more defaults in the 'numa' suite
  perf bench: Fix segfault at the end of an 'all' execution
  perf bench: Update manpage to mention numa and futex
  perf probe: Use dwarf_getcfi_elf() instead of dwarf_getcfi()
  perf probe: Fix to handle errors in line_range searching
  perf probe: Fix --line option behavior
  perf tools: Pick up libdw without explicit LIBDW_DIR
  MAINTAINERS: Change e-mail to kernel.org one
  perf callchains: Disable unwind libraries when libelf isn't found
  tools lib traceevent: Do not call warning() directly
  tools lib traceevent: Print event name when show warning if possible
  perf top: Fix documentation of invalid -s option
  perf/x86: Enable DRAM RAPL support on Intel Haswell
2014-04-16 16:38:57 -07:00
Ingo Molnar
5be44a6fb1 x86: Remove the PCI reboot method from the default chain
Steve reported a reboot hang and bisected it back to this commit:

  a4f1987e4c x86, reboot: Add EFI and CF9 reboot methods into the default list

He heroically tested all reboot methods and found the following:

  reboot=t       # triple fault                  ok
  reboot=k       # keyboard ctrl                 FAIL
  reboot=b       # BIOS                          ok
  reboot=a       # ACPI                          FAIL
  reboot=e       # EFI                           FAIL   [system has no EFI]
  reboot=p       # PCI 0xcf9                     FAIL

And I think it's pretty obvious that we should only try PCI 0xcf9 as a
last resort - if at all.

The other observation is that (on this box) we should never try
the PCI reboot method, but close with either the 'triple fault'
or the 'BIOS' (terminal!) reboot methods.

Thirdly, CF9_COND is a total misnomer - it should be something like
CF9_SAFE or CF9_CAREFUL, and 'CF9' should be 'CF9_FORCE' ...

So this patch fixes the worst problems:

 - it orders the actual reboot logic to follow the reboot ordering
   pattern - it was in a pretty random order before for no good
   reason.

 - it fixes the CF9 misnomers and uses BOOT_CF9_FORCE and
   BOOT_CF9_SAFE flags to make the code more obvious.

 - it tries the BIOS reboot method before the PCI reboot method.
   (Since 'BIOS' is a terminal reboot method resulting in a hang
    if it does not work, this is essentially equivalent to removing
    the PCI reboot method from the default reboot chain.)

 - just for the miraculous possibility of terminal (resulting
   in hang) reboot methods of triple fault or BIOS returning
   without having done their job, there's an ordering between
   them as well.

Reported-and-bisected-and-tested-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Li Aubrey <aubrey.li@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Link: http://lkml.kernel.org/r/20140404064120.GB11877@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-16 08:56:09 +02:00
Konrad Rzeszutek Wilk
e0fc17a936 xen/spinlock: Don't enable them unconditionally.
The git commit a945928ea2
('xen: Do not enable spinlocks before jump_label_init() has executed')
was added to deal with the jump machinery. Earlier the code
that turned on the jump label was only called by Xen specific
functions. But now that it had been moved to the initcall machinery
it gets called on Xen, KVM, and baremetal - ouch!. And the detection
machinery to only call it on Xen wasn't remembered in the heat
of merge window excitement.

This means that the slowpath is enabled on baremetal while it should
not be.

Reported-by: Waiman Long <waiman.long@hp.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
CC: stable@vger.kernel.org
CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-04-15 17:41:28 +01:00
Boris Ostrovsky
4461bbc05b x86/xen: Fix 32-bit PV guests's usage of kernel_stack
Commit 198d208df4 ("x86: Keep
thread_info on thread stack in x86_32") made 32-bit kernels use
kernel_stack to point to thread_info. That change missed a couple of
updates needed by Xen's 32-bit PV guests:

1. kernel_stack needs to be initialized for secondary CPUs

2. GET_THREAD_INFO() now uses %fs register which may not be the
   kernel's version when executing xen_iret().

With respect to the second issue, we don't need GET_THREAD_INFO()
anymore: we used it as an intermediate step to get to per_cpu xen_vcpu
and avoid referencing %fs. Now that we are going to use %fs anyway we
may as well go directly to xen_vcpu.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-04-15 15:00:14 +01:00
Linus Torvalds
55101e2d6c Merge git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Marcelo Tosatti:
 - Fix for guest triggerable BUG_ON (CVE-2014-0155)
 - CR4.SMAP support
 - Spurious WARN_ON() fix

* git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: remove WARN_ON from get_kernel_ns()
  KVM: Rename variable smep to cr4_smep
  KVM: expose SMAP feature to guest
  KVM: Disable SMAP for guests in EPT realmode and EPT unpaging mode
  KVM: Add SMAP support when setting CR4
  KVM: Remove SMAP bit from CR4_RESERVED_BITS
  KVM: ioapic: try to recover if pending_eoi goes out of range
  KVM: ioapic: fix assignment of ioapic->rtc_status.pending_eoi (CVE-2014-0155)
2014-04-14 16:21:28 -07:00
Marcelo Tosatti
b351c39cc9 KVM: x86: remove WARN_ON from get_kernel_ns()
Function and callers can be preempted.

https://bugzilla.kernel.org/show_bug.cgi?id=73721

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2014-04-14 17:50:43 -03:00
Feng Wu
66386ade2a KVM: Rename variable smep to cr4_smep
Rename variable smep to cr4_smep, which can better reflect the
meaning of the variable.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-14 17:50:40 -03:00
Feng Wu
de935ae15b KVM: expose SMAP feature to guest
This patch exposes SMAP feature to guest

Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-14 17:50:37 -03:00
Feng Wu
e1e746b3c5 KVM: Disable SMAP for guests in EPT realmode and EPT unpaging mode
SMAP is disabled if CPU is in non-paging mode in hardware.
However KVM always uses paging mode to emulate guest non-paging
mode with TDP. To emulate this behavior, SMAP needs to be
manually disabled when guest switches to non-paging mode.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-14 17:50:35 -03:00
Feng Wu
97ec8c067d KVM: Add SMAP support when setting CR4
This patch adds SMAP handling logic when setting CR4 for guests

Thanks a lot to Paolo Bonzini for his suggestion to use the branchless
way to detect SMAP violation.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-14 17:50:34 -03:00
Feng Wu
56d6efc2de KVM: Remove SMAP bit from CR4_RESERVED_BITS
This patch removes SMAP bit from CR4_RESERVED_BITS.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-14 17:50:33 -03:00
Ingo Molnar
740c699a8d Merge tag 'v3.15-rc1' into perf/urgent
Pick up the latest fixes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-14 16:44:42 +02:00
Masahiro Yamada
e6bcd1a897 x86/build: Supress "Nothing to be done for ..." messages
When we build an already built kernel again, arch/x86/syscalls/Makefile
and arch/x86/tools/Makefile emits "Nothing to be done for ..."
messages.

Here is the command log:

  $ make defconfig
     [ snip ]
  $ make
     [ snip ]
  $ make
  make[1]: Nothing to be done for `all'.            <-----
  make[1]: Nothing to be done for `relocs'.         <-----
    CHK     include/config/kernel.release
    CHK     include/generated/uapi/linux/version.h

Besides not emitting those, "all" and "relocs" should be added to PHONY as well.

Signed-off-by: Masahiro Yamada <yamada.m@jp.panasonic.com>
Acked-by: Peter Foley <pefoley2@pefoley.com>
Acked-by: Michal Marek <mmarek@suse.cz>
Link: http://lkml.kernel.org/r/1397093742-11144-1-git-send-email-yamada.m@jp.panasonic.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-14 11:44:36 +02:00
Ville Syrjälä
86e587623a x86/gpu: Fix sign extension issue in Intel graphics stolen memory quirks
Have the KB(),MB(),GB() macros produce unsigned longs to avoid
unintended sign extension issues with the gen2 memory size
detection.

What happens is first the uint8_t returned by
read_pci_config_byte() gets promoted to an int which gets
multiplied by another int from the MB() macro, and finally the
result gets sign extended to size_t.

Although this shouldn't be a problem in practice as all affected
gen2 platforms are 32bit AFAIK, so size_t will be 32 bits.

Reported-by: Bjorn Helgaas <bhelgaas@google.com>
Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1397382303-17525-1-git-send-email-ville.syrjala@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-14 08:50:56 +02:00