Commit Graph

23583 Commits

Author SHA1 Message Date
Stephane Eranian
b3e6246336 perf/x86/pebs: Add proper PEBS constraints for Broadwell
This patch adds a Broadwell specific PEBS event constraint table.

Broadwell has a fix for the HT corruption bug erratum HSD29 on
Haswell. Therefore, there is no need to mark events 0xd0, 0xd1, 0xd2,
0xd3 has requiring the exclusive mode across both sibling HT threads.
This holds true for regular counting and sampling (see core.c) and
PEBS (ds.c) which we fix in this patch.

In doing so, we relax evnt scheduling for these events, they can now
be programmed on any 4 counters without impacting what is measured on
the sibling thread.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@redhat.com
Cc: adrian.hunter@intel.com
Cc: jolsa@redhat.com
Cc: kan.liang@intel.com
Cc: namhyung@kernel.org
Link: http://lkml.kernel.org/r/1457034642-21837-4-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08 12:19:12 +01:00
Stephane Eranian
8077eca079 perf/x86/pebs: Add workaround for broken OVFL status on HSW+
This patch fixes an issue with the GLOBAL_OVERFLOW_STATUS bits on
Haswell, Broadwell and Skylake processors when using PEBS.

The SDM stipulates that when the PEBS iterrupt threshold is crossed,
an interrupt is posted and the kernel is interrupted. The kernel will
find GLOBAL_OVF_SATUS bit 62 set indicating there are PEBS records to
drain. But the bits corresponding to the actual counters should NOT be
set. The kernel follows the SDM and assumes that all PEBS events are
processed in the drain_pebs() callback. The kernel then checks for
remaining overflows on any other (non-PEBS) events and processes these
in the for_each_bit_set(&status) loop.

As it turns out, under certain conditions on HSW and later processors,
on PEBS buffer interrupt, bit 62 is set but the counter bits may be
set as well. In that case, the kernel drains PEBS and generates
SAMPLES with the EXACT tag, then it processes the counter bits, and
generates normal (non-EXACT) SAMPLES.

I ran into this problem by trying to understand why on HSW sampling on
a PEBS event was sometimes returning SAMPLES without the EXACT tag.
This should not happen on user level code because HSW has the
eventing_ip which always point to the instruction that caused the
event.

The workaround in this patch simply ensures that the bits for the
counters used for PEBS events are cleared after the PEBS buffer has
been drained. With this fix 100% of the PEBS samples on my user code
report the EXACT tag.

Before:
  $ perf record -e cpu/event=0xd0,umask=0x81/upp ./multichase
  $ perf report -D | fgrep SAMPLES
  PERF_RECORD_SAMPLE(IP, 0x2): 11775/11775: 0x406de5 period: 73469 addr: 0 exact=Y
                           \--- EXACT tag is missing

After:
  $ perf record -e cpu/event=0xd0,umask=0x81/upp ./multichase
  $ perf report -D | fgrep SAMPLES
  PERF_RECORD_SAMPLE(IP, 0x4002): 11775/11775: 0x406de5 period: 73469 addr: 0 exact=Y
                           \--- EXACT tag is set

The problem tends to appear more often when multiple PEBS events are used.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: adrian.hunter@intel.com
Cc: kan.liang@intel.com
Cc: namhyung@kernel.org
Link: http://lkml.kernel.org/r/1457034642-21837-3-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08 12:18:35 +01:00
Stephane Eranian
5690ae28e4 perf/x86/intel: Add definition for PT PMI bit
This patch adds a definition for GLOBAL_OVFL_STATUS bit 55
which is used with the Processor Trace (PT) feature.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: adrian.hunter@intel.com
Cc: kan.liang@intel.com
Cc: namhyung@kernel.org
Link: http://lkml.kernel.org/r/1457034642-21837-2-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08 12:18:34 +01:00
Kan Liang
c3d266c8a9 perf/x86/intel: Fix PEBS warning by only restoring active PMU in pmi
This patch tries to fix a PEBS warning found in my stress test. The
following perf command can easily trigger the pebs warning or spurious
NMI error on Skylake/Broadwell/Haswell platforms:

  sudo perf record -e 'cpu/umask=0x04,event=0xc4/pp,cycles,branches,ref-cycles,cache-misses,cache-references' --call-graph fp -b -c1000 -a

Also the NMI watchdog must be enabled.

For this case, the events number is larger than counter number. So
perf has to do multiplexing.

In perf_mux_hrtimer_handler, it does perf_pmu_disable(), schedule out
old events, rotate_ctx, schedule in new events and finally
perf_pmu_enable().

If the old events include precise event, the MSR_IA32_PEBS_ENABLE
should be cleared when perf_pmu_disable().  The MSR_IA32_PEBS_ENABLE
should keep 0 until the perf_pmu_enable() is called and the new event is
precise event.

However, there is a corner case which could restore PEBS_ENABLE to
stale value during the above period. In perf_pmu_disable(), GLOBAL_CTRL
will be set to 0 to stop overflow and followed PMI. But there may be
pending PMI from an earlier overflow, which cannot be stopped. So even
GLOBAL_CTRL is cleared, the kernel still be possible to get PMI. At
the end of the PMI handler, __intel_pmu_enable_all() will be called,
which will restore the stale values if old events haven't scheduled
out.

Once the stale pebs value is set, it's impossible to be corrected if
the new events are non-precise. Because the pebs_enabled will be set
to 0. x86_pmu.enable_all() will ignore the MSR_IA32_PEBS_ENABLE
setting. As a result, the following NMI with stale PEBS_ENABLE
trigger pebs warning.

The pending PMI after enabled=0 will become harmless if the NMI handler
does not change the state. This patch checks cpuc->enabled in pmi and
only restore the state when PMU is active.

Here is the dump:

  Call Trace:
   <NMI>  [<ffffffff813c3a2e>] dump_stack+0x63/0x85
   [<ffffffff810a46f2>] warn_slowpath_common+0x82/0xc0
   [<ffffffff810a483a>] warn_slowpath_null+0x1a/0x20
   [<ffffffff8100fe2e>] intel_pmu_drain_pebs_nhm+0x2be/0x320
   [<ffffffff8100caa9>] intel_pmu_handle_irq+0x279/0x460
   [<ffffffff810639b6>] ? native_write_msr_safe+0x6/0x40
   [<ffffffff811f290d>] ? vunmap_page_range+0x20d/0x330
   [<ffffffff811f2f11>] ?  unmap_kernel_range_noflush+0x11/0x20
   [<ffffffff8148379f>] ? ghes_copy_tofrom_phys+0x10f/0x2a0
   [<ffffffff814839c8>] ? ghes_read_estatus+0x98/0x170
   [<ffffffff81005a7d>] perf_event_nmi_handler+0x2d/0x50
   [<ffffffff810310b9>] nmi_handle+0x69/0x120
   [<ffffffff810316f6>] default_do_nmi+0xe6/0x100
   [<ffffffff810317f2>] do_nmi+0xe2/0x130
   [<ffffffff817aea71>] end_repeat_nmi+0x1a/0x1e
   [<ffffffff810639b6>] ? native_write_msr_safe+0x6/0x40
   [<ffffffff810639b6>] ? native_write_msr_safe+0x6/0x40
   [<ffffffff810639b6>] ? native_write_msr_safe+0x6/0x40
   <<EOE>>  <IRQ>  [<ffffffff81006df8>] ?  x86_perf_event_set_period+0xd8/0x180
   [<ffffffff81006eec>] x86_pmu_start+0x4c/0x100
   [<ffffffff8100722d>] x86_pmu_enable+0x28d/0x300
   [<ffffffff811994d7>] perf_pmu_enable.part.81+0x7/0x10
   [<ffffffff8119cb70>] perf_mux_hrtimer_handler+0x200/0x280
   [<ffffffff8119c970>] ?  __perf_install_in_context+0xc0/0xc0
   [<ffffffff8110f92d>] __hrtimer_run_queues+0xfd/0x280
   [<ffffffff811100d8>] hrtimer_interrupt+0xa8/0x190
   [<ffffffff81199080>] ?  __perf_read_group_add.part.61+0x1a0/0x1a0
   [<ffffffff81051bd8>] local_apic_timer_interrupt+0x38/0x60
   [<ffffffff817af01d>] smp_apic_timer_interrupt+0x3d/0x50
   [<ffffffff817ad15c>] apic_timer_interrupt+0x8c/0xa0
   <EOI>  [<ffffffff81199080>] ?  __perf_read_group_add.part.61+0x1a0/0x1a0
   [<ffffffff81123de5>] ?  smp_call_function_single+0xd5/0x130
   [<ffffffff81123ddb>] ?  smp_call_function_single+0xcb/0x130
   [<ffffffff81199080>] ?  __perf_read_group_add.part.61+0x1a0/0x1a0
   [<ffffffff8119765a>] event_function_call+0x10a/0x120
   [<ffffffff8119c660>] ? ctx_resched+0x90/0x90
   [<ffffffff811971e0>] ? cpu_clock_event_read+0x30/0x30
   [<ffffffff811976d0>] ? _perf_event_disable+0x60/0x60
   [<ffffffff8119772b>] _perf_event_enable+0x5b/0x70
   [<ffffffff81197388>] perf_event_for_each_child+0x38/0xa0
   [<ffffffff811976d0>] ? _perf_event_disable+0x60/0x60
   [<ffffffff811a0ffd>] perf_ioctl+0x12d/0x3c0
   [<ffffffff8134d855>] ? selinux_file_ioctl+0x95/0x1e0
   [<ffffffff8124a3a1>] do_vfs_ioctl+0xa1/0x5a0
   [<ffffffff81036d29>] ? sched_clock+0x9/0x10
   [<ffffffff8124a919>] SyS_ioctl+0x79/0x90
   [<ffffffff817ac4b2>] entry_SYSCALL_64_fastpath+0x1a/0xa4
  ---[ end trace aef202839fe9a71d ]---
  Uhhuh. NMI received for unknown reason 2d on CPU 2.
  Do you have a strange power saving mode enabled?

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1457046448-6184-1-git-send-email-kan.liang@intel.com
[ Fixed various typos and other small details. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08 12:18:32 +01:00
Jiri Olsa
e72daf3f4d perf/x86/intel: Use PAGE_SIZE for PEBS buffer size on Core2
Using PAGE_SIZE buffers makes the WRMSR to PERF_GLOBAL_CTRL in
intel_pmu_enable_all() mysteriously hang on Core2. As a workaround, we
don't do this.

The hard lockup is easily triggered by running 'perf test attr'
repeatedly. Most of the time it gets stuck on sample session with
small periods.

  # perf test attr -vv
  14: struct perf_event_attr setup                             :
  --- start ---
  ...
    'PERF_TEST_ATTR=/tmp/tmpuEKz3B /usr/bin/perf record -o /tmp/tmpuEKz3B/perf.data -c 123 kill >/dev/null 2>&1' ret 1

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: <stable@vger.kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20160301190352.GA8355@krava.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08 12:18:32 +01:00
Ingo Molnar
6f6e151692 perf/x86/uncore: Fix build on UP-IOAPIC configs
Commit:

  cf6d445f68 ("perf/x86/uncore: Track packages, not per CPU data")

reorganized the uncore code to track packages, and introduced a dependency
on MAX_APIC_ID. This constant is not available on UP-IOAPIC builds:

  arch/x86/events/intel/uncore.c:1350:44: error: 'MAX_LOCAL_APIC' undeclared here (not in a function)

Include asm/apicdef.h explicitly to pick it up.

Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-03 09:12:27 +01:00
Thomas Gleixner
9de8d68695 perf/x86/intel/rapl: Convert it to a per package facility
RAPL is a per package facility and we already have a mechanism for a dedicated
per package reader. So there is no point to have multiple CPUs doing the
same. The current implementation actually starts two timers on two CPUs if one
does:

	perf stat -C1,2 -e -e power/energy-pkg ....

which makes the whole concept of 1 reader per package moot.

What's worse is that the above returns the double of the actual energy
consumption, but that's a different problem to address and cannot be solved by
removing the pointless per cpuness of that mechanism.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20160222221012.845369524@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:35:26 +01:00
Thomas Gleixner
8a6d2f8f73 perf/x86/intel/rapl: Utilize event->pmu_private
Store the PMU pointer in event->pmu_private and use it instead of the per CPU
data. Preparatory step to get rid of the per CPU allocations. The usage sites
are the perf fast path, so we keep that even after the conversion to per
package storage as a CPU to package lookup involves 3 loads versus 1 with the
pmu_private pointer.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20160222221012.748151799@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:35:25 +01:00
Thomas Gleixner
a208749c64 perf/x86/intel/rapl: Make PMU lock raw
This lock is taken in hard interrupt context even on Preempt-RT. Make it raw
so RT does not have to patch it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20160222221012.669411833@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:35:25 +01:00
Thomas Gleixner
7162b8fea6 perf/x86/intel/rapl: Refactor the code some more
Split out code from init into seperate functions. Tidy up the code and get rid
of pointless comments. I wish there would be comments for code which is not
obvious....

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20160222221012.588544679@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:35:24 +01:00
Thomas Gleixner
512089d984 perf/x86/intel/rapl: Clean up the printk output
The output is inconsistent. Use a proper pr_fmt prefix and split out the
advertisement into a seperate function.

Remove the WARN_ON() in the failure case. It's pointless as we already know
where it failed.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20160222221012.504551295@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:35:23 +01:00
Thomas Gleixner
75c7003fbf perf/x86/intel/rapl: Calculate timing once
No point in doing the same calculation over and over. Do it once in
rapl_check_hw_unit().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20160222221012.409238136@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:35:23 +01:00
Thomas Gleixner
b8b3319a47 perf/x86/intel/rapl: Sanitize the quirk handling
There is no point in having a quirk machinery for a single possible
function. Get rid of it and move the quirk to a place where it actually
makes sense.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20160222221012.311639465@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:35:22 +01:00
Thomas Gleixner
55f2890f07 perf/x86/intel/rapl: Add proper error handling
Like uncore the rapl driver lacks error handling. It leaks memory and leaves
the hotplug notifier registered.

Add the proper error checks, cleanup the memory and register the hotplug
notifier only on success.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20160222221012.231222076@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:35:22 +01:00
Thomas Gleixner
4d120c535d perf/x86/intel/rapl: Make Knights Landings support functional
The Knights Landings support added the events and the detection case, but then
returns 0 without actually initializing the driver.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Fixes: 3a2a779732 "perf/x86/intel/rapl: Add support for Knights Landing (KNL)"
Link: http://lkml.kernel.org/r/20160222221012.149331888@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:35:21 +01:00
Thomas Gleixner
827db839cd perf/x86/intel/cqm: Get rid of the silly for_each_cpu() lookups
CQM is a strict per package facility. Use the proper cpumasks to lookup the
readers.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20160222221012.054916179@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:35:21 +01:00
Thomas Gleixner
5485592c17 perf/x86/intel/uncore: Make PCI and MSR uncore independent
Andi wanted to do this before, but the patch fell down the cracks. Implement
it with the proper error handling.

Requested-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20160222221011.799159968@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:35:19 +01:00
Thomas Gleixner
7b672d6433 perf/x86/intel/uncore: Clear all hardware state on exit
The only missing bit is to completely clear the hardware state on failure
exit. This is now a pretty simple exercise.

Undo the box->init_box() setup on all packages which have been initialized so
far.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20160222221011.702452407@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:35:19 +01:00
Thomas Gleixner
cf6d445f68 perf/x86/uncore: Track packages, not per CPU data
Uncore is a per package facility, but the code tries to mimick a per CPU
facility with completely convoluted constructs.

Simplify the whole machinery by tracking per package information. While at it,
avoid the kfree/alloc dance when a CPU goes offline and online again. There is
no point in freeing the box after it was allocated. We just keep proper
refcounting and the first CPU which comes online in a package does the
initialization/activation of the box.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20160222221011.622258933@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:35:18 +01:00
Thomas Gleixner
1f12e32f4c x86/topology: Create logical package id
For per package oriented services we must be able to rely on the number of CPU
packages to be within bounds. Create a tracking facility, which

- calculates the number of possible packages depending on nr_cpu_ids after boot

- makes sure that the package id is within the number of possible packages. If
  the apic id is outside we map it to a logical package id if there is enough
  space available.

Provide interfaces for drivers to query the mapping and do translations from
physcial to logical ids.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20160222221011.541071755@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:35:18 +01:00
Thomas Gleixner
1f2569fac6 perf/x86/intel/uncore: Store box in event->pmu_private
Store the PMU pointer in event->pmu_private, so we can get rid of the
per CPU data storage.

We keep it after converting to per package data, because a CPU to
package lookup will be 3 loads versus one and these usage sites are
in the perf fast path.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20160222221011.460851335@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:35:17 +01:00
Thomas Gleixner
1384c70442 perf/x86/uncore: Make uncore_pcibus_to_physid() static
No users outside of this file.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20160222221011.285504825@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:35:16 +01:00
Thomas Gleixner
1229735b29 perf/x86/intel/uncore: Make code more readable
Clean up the code a bit before reworking it completely.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20160222221011.204771538@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:35:15 +01:00
Thomas Gleixner
a46195f178 perf/x86/intel/uncore: Clean up hardware on exit
When tearing down the boxes nothing undoes the hardware state which was setup
by box->init_box(). Add a box->exit_box() callback and implement it for the
uncores which have an init_box() callback.

This misses the cleanup in the error exit pathes, but I cannot be bothered to
implement it before cleaning up the rest of the driver, which makes that task
way simpler.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20160222221011.023930023@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:35:15 +01:00
Thomas Gleixner
83f8ebd2eb perf/x86/intel/uncore: Add sanity checks for PCI dev package id
The storage array is size limited, but misses a sanity check

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20160222221010.929967806@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:35:14 +01:00
Thomas Gleixner
4f089678d0 perf/x86/intel/uncore: Fix error handling
This driver lacks any form of proper error handling. If initialization fails
or hotplug prepare fails, it lets the facility with half initialized stuff
around.

Fix the state and memory leaks in a first step. As a second step we need to
undo the hardware state which is set via uncore_box_init() on some of the
uncore implementations.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20160222221010.848880559@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:35:14 +01:00
Thomas Gleixner
ffeda00380 perf/x86/intel/uncore: Simplify error rollback
No point in doing partial rollbacks. Robustify uncore_exit_type() so it does
not dereference type->pmus unconditionally and remove all the partial rollback
hackery.

Preparatory patch for proper error handling.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20160222221010.751077467@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:35:13 +01:00
Thomas Gleixner
3ccca9eca6 perf/x86/intel/uncore: Remove pointless mask check
uncore_cpumask_init() is only ever called from intel_uncore_init() where the
mask is guaranteed to be empty.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20160222221010.657326866@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:35:12 +01:00
Kan Liang
6cb2f1d9af perf/x86/intel/uncore: Remove SBOX support for BDX-DE
BDX-DE and BDX-EP share the same uncore code path. But there is no sbox
in BDX-DE. This patch remove SBOX support for BDX-DE.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: <tonyb@cybernetics.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Battersby <tonyb@cybernetics.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/37D7C6CF3E00A74B8858931C1DB2F0770589D336@SHSMSX103.ccr.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:35:11 +01:00
Ingo Molnar
0a7348925f Linux 4.5-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJW0yM6AAoJEHm+PkMAQRiGeUwIAJRTHFPJTFpJcJjeZEV4/EL1
 7Pl0WSHs/CWBkXIevAg2HgkECSQ9NI9FAUFvoGxCldDpFAnL1U2QV8+Ur2qhiXMG
 5v0jILJuiw57qT/NfhEudZolerlRoHILmB3JRTb+DUV4GHZuWpTkJfUSI9j5aTEl
 w83XUgtK4bKeIyFbHdWQk6xqfzfFBSuEITuSXreOMwkFfMmeScE0WXOPLBZWyhPa
 v0rARJLYgM+vmRAnJjnG8unH+SgnqiNcn2oOFpevKwmpVcOjcEmeuxh/HdeZf7HM
 /R8F86OwdmXsO+z8dQxfcucLg+I9YmKfFr8b6hopu1sRztss2+Uk6H1j2J7IFIg=
 =tvkh
 -----END PGP SIGNATURE-----

Merge tag 'v4.5-rc6' into perf/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:04:01 +01:00
Linus Torvalds
4b696dcb1a Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "This update contains:

   - Hopefully the last ASM CLAC fixups

   - A fix for the Quark family related to the IMR lock which makes
     kexec work again

   - A off-by-one fix in the MPX code.  Ironic, isn't it?

   - A fix for X86_PAE which addresses once more an unsigned long vs
     phys_addr_t hickup"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mpx: Fix off-by-one comparison with nr_registers
  x86/mm: Fix slow_virt_to_phys() for X86_PAE again
  x86/entry/compat: Add missing CLAC to entry_INT80_32
  x86/entry/32: Add an ASM_CLAC to entry_SYSENTER_32
  x86/platform/intel/quark: Change the kernel's IMR lock bit to false
2016-02-28 07:49:23 -08:00
Linus Torvalds
691429e13d Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
 "10 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  dax: move writeback calls into the filesystems
  dax: give DAX clearing code correct bdev
  ext4: online defrag not supported with DAX
  ext2, ext4: only set S_DAX for regular inodes
  block: disable block device DAX by default
  ocfs2: unlock inode if deleting inode from orphan fails
  mm: ASLR: use get_random_long()
  drivers: char: random: add get_random_long()
  mm: numa: quickly fail allocations for NUMA balancing on full nodes
  mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED
2016-02-27 12:46:16 -08:00
Linus Torvalds
a9f8094aae PCI updates for v4.5:
Enumeration
     Revert x86 pcibios_alloc_irq() to fix regression (Bjorn Helgaas)
 
   Marvell MVEBU host bridge driver
     Restrict build to 32-bit ARM (Thierry Reding)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW0gP1AAoJEFmIoMA60/r84qkQAJXOFW20cie2yepQXIk7f5aN
 M2/+iFte8YHf4ZFgZWA/oS+mZAp1OqctSTjWg1KTPZsPHAiB6DkL7WOV6fK+uXr9
 fX8D7Ec2eLgeIFl78iSQaAht4kfmfz8f5LlU6Oi9kvQOt+35gp4lP834HClx7Jep
 XT2qZy/zUQy8GylTzRqueMBpXBCnBQR8iyaD8j4rmklQB3yLXaEMTs7HzwJKBmhM
 ZDnH1xrV5cWYb7niSCBkq4IomCmezJZCvxcDjh/Z8gjDKbVl7TLYOdU8Jh4wNO++
 ng0J8WDSKQJ9Hfv6H+5dgPzoqgrIrWb/Oz5GXd8i6cqv00szG5S/w8nHcO8LPSJv
 dJxxfTlz4KRxdv/sqOVW4cDFUmScODMkDMh+hAeEVYKl9ty5fQ4O2iNwNehzrdNj
 FRrgN1980amYN2n09NZNF863dvVN+DMJ4Ll2VT01rOIUH3bwt4cO6rVWrEUlEKCn
 DiSvJlXHm5nLLCQpkkGKAeq5hYl25DFtYVwLopIbUSHFXCASHPtQewDvgzfn9zYi
 M7J8bDa/uTscSqJsGsb4/gHLEblCfju7Pj2gEHoiK4XtbCuuamFA3nsA7lzcAG9j
 W5pVDQTqctdgHq/UMLKIeoBJ592fhYzKipY8vELOKwkieDR9F3g3u8nWt4ZAUIXE
 /oS5F1eWMkDMvdjyZO4C
 =yQ41
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.5-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:
 "Enumeration:
    Revert x86 pcibios_alloc_irq() to fix regression (Bjorn Helgaas)

  Marvell MVEBU host bridge driver:
    Restrict build to 32-bit ARM (Thierry Reding)"

* tag 'pci-v4.5-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: mvebu: Restrict build to 32-bit ARM
  Revert "PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()"
  Revert "PCI: Add helpers to manage pci_dev->irq and pci_dev->irq_managed"
  Revert "x86/PCI: Don't alloc pcibios-irq when MSI is enabled"
2016-02-27 12:33:42 -08:00
Daniel Cashman
5ef11c35ce mm: ASLR: use get_random_long()
Replace calls to get_random_int() followed by a cast to (unsigned long)
with calls to get_random_long().  Also address shifting bug which, in
case of x86 removed entropy mask for mmap_rnd_bits values > 31 bits.

Signed-off-by: Daniel Cashman <dcashman@android.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: David S. Miller <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nick Kralevich <nnk@google.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Mark Salyzyn <salyzyn@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27 10:28:52 -08:00
Bjorn Helgaas
6c777e8799 Revert "PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()"
991de2e590 ("PCI, x86: Implement pcibios_alloc_irq() and
pcibios_free_irq()") appeared in v4.3 and helps support IOAPIC hotplug.

Олег reported that the Elcus-1553 TA1-PCI driver worked in v4.2 but not
v4.3 and bisected it to 991de2e590.  Sunjin reported that the RocketRAID
272x driver worked in v4.2 but not v4.3.  In both cases booting with
"pci=routirq" is a workaround.

I think the problem is that after 991de2e590, we no longer call
pcibios_enable_irq() for upstream bridges.  Prior to 991de2e590, when a
driver called pci_enable_device(), we recursively called
pcibios_enable_irq() for upstream bridges via pci_enable_bridge().

After 991de2e590, we call pcibios_enable_irq() from pci_device_probe()
instead of the pci_enable_device() path, which does *not* call
pcibios_enable_irq() for upstream bridges.

Revert 991de2e590 to fix these driver regressions.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=111211
Fixes: 991de2e590 ("PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()")
Reported-and-tested-by: Олег Мороз <oleg.moroz@mcc.vniiem.ru>
Reported-by: Sunjin Yang <fan4326@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
CC: Jiang Liu <jiang.liu@linux.intel.com>
2016-02-27 08:52:20 -06:00
Colin Ian King
9bf148cb08 x86/mpx: Fix off-by-one comparison with nr_registers
In the unlikely event that regno == nr_registers then we get an array
overrun on regoff because the invalid register check is currently
off-by-one. Fix this with a check that regno is >= nr_registers instead.

Detected with static analysis using CoverityScan.

Fixes: fcc7ffd679 "x86, mpx: Decode MPX instruction to get bound violation information"
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1456512931-3388-1-git-send-email-colin.king@canonical.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-02-26 22:12:47 +01:00
Linus Torvalds
73056bbc68 KVM/ARM fixes:
- Fix per-vcpu vgic bitmap allocation
 - Do not give copy random memory on MMIO read
 - Fix GICv3 APR register restore order
 
 KVM/x86 fixes:
 - Fix ubsan warning
 - Fix hardware breakpoints in a guest vs. preempt notifiers
 - Fix Hurd
 
 Generic:
 - use __GFP_NOWARN together with GFP_NOWAIT
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJWzsReAAoJEL/70l94x66DT6cH/3K/X/eciQIQTjLWKQ9BUhsN
 +4WN+PX51GCvRZgoGgXXxTUzWVpSHNE7iD5FR/yqiUpC6lq+GWYKyQYBU6S2tw7N
 QrzVFUAOIAExfzw4ztLz8pvIIwsF6EC2sA0DRZO85FWApO4P3BJN/1nBa+THJchH
 6RamguztCjVSfboFwpulPzmgzJwIQ1ai+KoO1z/1ifrxjOHLytF5wn6UegPXIkc6
 PAWG0b6w2ZnSwTNhEdsjzlcEANd/otwOoTlcft//KLuBkSS0GgU3vgxv7OXeSn67
 +Wa9wWT/rU6M4Ol0noXcyr/kiF5629bQ4IyLK7YFgOUPFt4Tmg+A1ABGc92WJa4=
 =/9Sf
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "KVM/ARM fixes:
   - Fix per-vcpu vgic bitmap allocation
   - Do not give copy random memory on MMIO read
   - Fix GICv3 APR register restore order

  KVM/x86 fixes:
   - Fix ubsan warning
   - Fix hardware breakpoints in a guest vs. preempt notifiers
   - Fix Hurd

  Generic:
   - use __GFP_NOWARN together with GFP_NOWAIT"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: MMU: fix ubsan index-out-of-range warning
  arm64: KVM: vgic-v3: Restore ICH_APR0Rn_EL2 before ICH_APR1Rn_EL2
  KVM: async_pf: do not warn on page allocation failures
  KVM: x86: fix conversion of addresses to linear in 32-bit protected mode
  KVM: x86: fix missed hardware breakpoints
  arm/arm64: KVM: Feed initialized memory to MMIO accesses
  KVM: arm/arm64: vgic: Ensure bitmaps are long enough
2016-02-25 19:53:54 -08:00
Dexuan Cui
bf70e5513d x86/mm: Fix slow_virt_to_phys() for X86_PAE again
"d1cd12108346: x86, pageattr: Prevent overflow in slow_virt_to_phys() for
X86_PAE" was unintentionally removed by the recent "34437e67a672: x86/mm: Fix
slow_virt_to_phys() to handle large PAT bit".

And, the variable 'phys_addr' was defined as "unsigned long" by mistake -- it should
be "phys_addr_t".

As a result, Hyper-V network driver in 32-PAE Linux guest can't work again.

Fixes: commit 34437e67a6: "x86/mm: Fix slow_virt_to_phys() to handle large PAT bit"
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Cc: olaf@aepfle.de
Cc: gregkh@linuxfoundation.org
Cc: jasowang@redhat.com
Cc: driverdev-devel@linuxdriverproject.org
Cc: linux-mm@kvack.org
Cc: apw@canonical.com
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Link: http://lkml.kernel.org/r/1456394292-9030-1-git-send-email-decui@microsoft.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-02-25 19:53:15 +01:00
Mike Krinkin
17e4bce0ae KVM: x86: MMU: fix ubsan index-out-of-range warning
Ubsan reports the following warning due to a typo in
update_accessed_dirty_bits template, the patch fixes
the typo:

[  168.791851] ================================================================================
[  168.791862] UBSAN: Undefined behaviour in arch/x86/kvm/paging_tmpl.h:252:15
[  168.791866] index 4 is out of range for type 'u64 [4]'
[  168.791871] CPU: 0 PID: 2950 Comm: qemu-system-x86 Tainted: G           O L  4.5.0-rc5-next-20160222 #7
[  168.791873] Hardware name: LENOVO 23205NG/23205NG, BIOS G2ET95WW (2.55 ) 07/09/2013
[  168.791876]  0000000000000000 ffff8801cfcaf208 ffffffff81c9f780 0000000041b58ab3
[  168.791882]  ffffffff82eb2cc1 ffffffff81c9f6b4 ffff8801cfcaf230 ffff8801cfcaf1e0
[  168.791886]  0000000000000004 0000000000000001 0000000000000000 ffffffffa1981600
[  168.791891] Call Trace:
[  168.791899]  [<ffffffff81c9f780>] dump_stack+0xcc/0x12c
[  168.791904]  [<ffffffff81c9f6b4>] ? _atomic_dec_and_lock+0xc4/0xc4
[  168.791910]  [<ffffffff81da9e81>] ubsan_epilogue+0xd/0x8a
[  168.791914]  [<ffffffff81daafa2>] __ubsan_handle_out_of_bounds+0x15c/0x1a3
[  168.791918]  [<ffffffff81daae46>] ? __ubsan_handle_shift_out_of_bounds+0x2bd/0x2bd
[  168.791922]  [<ffffffff811287ef>] ? get_user_pages_fast+0x2bf/0x360
[  168.791954]  [<ffffffffa1794050>] ? kvm_largepages_enabled+0x30/0x30 [kvm]
[  168.791958]  [<ffffffff81128530>] ? __get_user_pages_fast+0x360/0x360
[  168.791987]  [<ffffffffa181b818>] paging64_walk_addr_generic+0x1b28/0x2600 [kvm]
[  168.792014]  [<ffffffffa1819cf0>] ? init_kvm_mmu+0x1100/0x1100 [kvm]
[  168.792019]  [<ffffffff8129e350>] ? debug_check_no_locks_freed+0x350/0x350
[  168.792044]  [<ffffffffa1819cf0>] ? init_kvm_mmu+0x1100/0x1100 [kvm]
[  168.792076]  [<ffffffffa181c36d>] paging64_gva_to_gpa+0x7d/0x110 [kvm]
[  168.792121]  [<ffffffffa181c2f0>] ? paging64_walk_addr_generic+0x2600/0x2600 [kvm]
[  168.792130]  [<ffffffff812e848b>] ? debug_lockdep_rcu_enabled+0x7b/0x90
[  168.792178]  [<ffffffffa17d9a4a>] emulator_read_write_onepage+0x27a/0x1150 [kvm]
[  168.792208]  [<ffffffffa1794d44>] ? __kvm_read_guest_page+0x54/0x70 [kvm]
[  168.792234]  [<ffffffffa17d97d0>] ? kvm_task_switch+0x160/0x160 [kvm]
[  168.792238]  [<ffffffff812e848b>] ? debug_lockdep_rcu_enabled+0x7b/0x90
[  168.792263]  [<ffffffffa17daa07>] emulator_read_write+0xe7/0x6d0 [kvm]
[  168.792290]  [<ffffffffa183b620>] ? em_cr_write+0x230/0x230 [kvm]
[  168.792314]  [<ffffffffa17db005>] emulator_write_emulated+0x15/0x20 [kvm]
[  168.792340]  [<ffffffffa18465f8>] segmented_write+0xf8/0x130 [kvm]
[  168.792367]  [<ffffffffa1846500>] ? em_lgdt+0x20/0x20 [kvm]
[  168.792374]  [<ffffffffa14db512>] ? vmx_read_guest_seg_ar+0x42/0x1e0 [kvm_intel]
[  168.792400]  [<ffffffffa1846d82>] writeback+0x3f2/0x700 [kvm]
[  168.792424]  [<ffffffffa1846990>] ? em_sidt+0xa0/0xa0 [kvm]
[  168.792449]  [<ffffffffa185554d>] ? x86_decode_insn+0x1b3d/0x4f70 [kvm]
[  168.792474]  [<ffffffffa1859032>] x86_emulate_insn+0x572/0x3010 [kvm]
[  168.792499]  [<ffffffffa17e71dd>] x86_emulate_instruction+0x3bd/0x2110 [kvm]
[  168.792524]  [<ffffffffa17e6e20>] ? reexecute_instruction.part.110+0x2e0/0x2e0 [kvm]
[  168.792532]  [<ffffffffa14e9a81>] handle_ept_misconfig+0x61/0x460 [kvm_intel]
[  168.792539]  [<ffffffffa14e9a20>] ? handle_pause+0x450/0x450 [kvm_intel]
[  168.792546]  [<ffffffffa15130ea>] vmx_handle_exit+0xd6a/0x1ad0 [kvm_intel]
[  168.792572]  [<ffffffffa17f6a6c>] ? kvm_arch_vcpu_ioctl_run+0xbdc/0x6090 [kvm]
[  168.792597]  [<ffffffffa17f6bcd>] kvm_arch_vcpu_ioctl_run+0xd3d/0x6090 [kvm]
[  168.792621]  [<ffffffffa17f6a6c>] ? kvm_arch_vcpu_ioctl_run+0xbdc/0x6090 [kvm]
[  168.792627]  [<ffffffff8293b530>] ? __ww_mutex_lock_interruptible+0x1630/0x1630
[  168.792651]  [<ffffffffa17f5e90>] ? kvm_arch_vcpu_runnable+0x4f0/0x4f0 [kvm]
[  168.792656]  [<ffffffff811eeb30>] ? preempt_notifier_unregister+0x190/0x190
[  168.792681]  [<ffffffffa17e0447>] ? kvm_arch_vcpu_load+0x127/0x650 [kvm]
[  168.792704]  [<ffffffffa178e9a3>] kvm_vcpu_ioctl+0x553/0xda0 [kvm]
[  168.792727]  [<ffffffffa178e450>] ? vcpu_put+0x40/0x40 [kvm]
[  168.792732]  [<ffffffff8129e350>] ? debug_check_no_locks_freed+0x350/0x350
[  168.792735]  [<ffffffff82946087>] ? _raw_spin_unlock+0x27/0x40
[  168.792740]  [<ffffffff8163a943>] ? handle_mm_fault+0x1673/0x2e40
[  168.792744]  [<ffffffff8129daa8>] ? trace_hardirqs_on_caller+0x478/0x6c0
[  168.792747]  [<ffffffff8129dcfd>] ? trace_hardirqs_on+0xd/0x10
[  168.792751]  [<ffffffff812e848b>] ? debug_lockdep_rcu_enabled+0x7b/0x90
[  168.792756]  [<ffffffff81725a80>] do_vfs_ioctl+0x1b0/0x12b0
[  168.792759]  [<ffffffff817258d0>] ? ioctl_preallocate+0x210/0x210
[  168.792763]  [<ffffffff8174aef3>] ? __fget+0x273/0x4a0
[  168.792766]  [<ffffffff8174acd0>] ? __fget+0x50/0x4a0
[  168.792770]  [<ffffffff8174b1f6>] ? __fget_light+0x96/0x2b0
[  168.792773]  [<ffffffff81726bf9>] SyS_ioctl+0x79/0x90
[  168.792777]  [<ffffffff82946880>] entry_SYSCALL_64_fastpath+0x23/0xc1
[  168.792780] ================================================================================

Signed-off-by: Mike Krinkin <krinkin.m.u@gmail.com>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-25 09:50:35 +01:00
Andy Lutomirski
3d44d51bd3 x86/entry/compat: Add missing CLAC to entry_INT80_32
This doesn't seem to fix a regression -- I don't think the CLAC was
ever there.

I double-checked in a debugger: entries through the int80 gate do
not automatically clear AC.

Stable maintainers: I can provide a backport to 4.3 and earlier if
needed.  This needs to be backported all the way to 3.10.

Reported-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org> # v3.10 and later
Fixes: 63bcff2a30 ("x86, smap: Add STAC and CLAC instructions to control user space access")
Link: http://lkml.kernel.org/r/b02b7e71ae54074be01fc171cbd4b72517055c0e.1456345086.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-25 08:31:20 +01:00
Paolo Bonzini
0c1d77f4ba KVM: x86: fix conversion of addresses to linear in 32-bit protected mode
Commit e8dd2d2d64 ("Silence compiler warning in arch/x86/kvm/emulate.c",
2015-09-06) broke boot of the Hurd.  The bug is that the "default:"
case actually could modify "la", but after the patch this change is
not reflected in *linear.

The bug is visible whenever a non-zero segment base causes the linear
address to wrap around the 4GB mark.

Fixes: e8dd2d2d64
Cc: stable@vger.kernel.org
Reported-by: Aurelien Jarno <aurelien@aurel32.net>
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-24 14:47:45 +01:00
Paolo Bonzini
172b2386ed KVM: x86: fix missed hardware breakpoints
Sometimes when setting a breakpoint a process doesn't stop on it.
This is because the debug registers are not loaded correctly on
VCPU load.

The following simple reproducer from Oleg Nesterov tries using debug
registers in two threads.  To see the bug, run a 2-VCPU guest with
"taskset -c 0" and run "./bp 0 1" inside the guest.

    #include <unistd.h>
    #include <signal.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <sys/ptrace.h>
    #include <sys/user.h>
    #include <asm/debugreg.h>
    #include <assert.h>

    #define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)

    unsigned long encode_dr7(int drnum, int enable, unsigned int type, unsigned int len)
    {
        unsigned long dr7;

        dr7 = ((len | type) & 0xf)
            << (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE);
        if (enable)
            dr7 |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE));

        return dr7;
    }

    int write_dr(int pid, int dr, unsigned long val)
    {
        return ptrace(PTRACE_POKEUSER, pid,
                offsetof (struct user, u_debugreg[dr]),
                val);
    }

    void set_bp(pid_t pid, void *addr)
    {
        unsigned long dr7;
        assert(write_dr(pid, 0, (long)addr) == 0);
        dr7 = encode_dr7(0, 1, DR_RW_EXECUTE, DR_LEN_1);
        assert(write_dr(pid, 7, dr7) == 0);
    }

    void *get_rip(int pid)
    {
        return (void*)ptrace(PTRACE_PEEKUSER, pid,
                offsetof(struct user, regs.rip), 0);
    }

    void test(int nr)
    {
        void *bp_addr = &&label + nr, *bp_hit;
        int pid;

        printf("test bp %d\n", nr);
        assert(nr < 16); // see 16 asm nops below

        pid = fork();
        if (!pid) {
            assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
            kill(getpid(), SIGSTOP);
            for (;;) {
                label: asm (
                    "nop; nop; nop; nop;"
                    "nop; nop; nop; nop;"
                    "nop; nop; nop; nop;"
                    "nop; nop; nop; nop;"
                );
            }
        }

        assert(pid == wait(NULL));
        set_bp(pid, bp_addr);

        for (;;) {
            assert(ptrace(PTRACE_CONT, pid, 0, 0) == 0);
            assert(pid == wait(NULL));

            bp_hit = get_rip(pid);
            if (bp_hit != bp_addr)
                fprintf(stderr, "ERR!! hit wrong bp %ld != %d\n",
                    bp_hit - &&label, nr);
        }
    }

    int main(int argc, const char *argv[])
    {
        while (--argc) {
            int nr = atoi(*++argv);
            if (!fork())
                test(nr);
        }

        while (wait(NULL) > 0)
            ;
        return 0;
    }

Cc: stable@vger.kernel.org
Suggested-by: Nadav Amit <namit@cs.technion.ac.il>
Reported-by: Andrey Wagin <avagin@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-24 14:47:39 +01:00
Andy Lutomirski
04d1d281dc x86/entry/32: Add an ASM_CLAC to entry_SYSENTER_32
Both before and after 5f310f739b ("x86/entry/32: Re-implement
SYSENTER using the new C path"), we relied on a uaccess very early
in the SYSENTER path to clear AC.  After that change, though, we can
potentially make it all the way into C code with AC set, which
enlarges the attack surface for SMAP bypass by doing SYSENTER with
AC set.

Strengthen the SMAP protection by addding the missing ASM_CLAC right
at the beginning.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/3e36be110724896e32a4a1fe73bacb349d3cba94.1456262295.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-24 08:43:04 +01:00
Linus Torvalds
de9e478b9d x86: fix SMAP in 32-bit environments
In commit 11f1a4b975 ("x86: reorganize SMAP handling in user space
accesses") I changed how the stac/clac instructions were generated
around the user space accesses, which then made it possible to do
batched accesses efficiently for user string copies etc.

However, in doing so, I completely spaced out, and didn't even think
about the 32-bit case.  And nobody really even seemed to notice, because
SMAP doesn't even exist until modern Skylake processors, and you'd have
to be crazy to run 32-bit kernels on a modern CPU.

Which brings us to Andy Lutomirski.

He actually tested the 32-bit kernel on new hardware, and noticed that
it doesn't work.  My bad.  The trivial fix is to add the required
uaccess begin/end markers around the raw accesses in <asm/uaccess_32.h>.

I feel a bit bad about this patch, just because that header file really
should be cleaned up to avoid all the duplicated code in it, and this
commit just expands on the problem.  But this just fixes the bug without
any bigger cleanup surgery.

Reported-and-tested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-23 16:25:20 -08:00
Bryan O'Donoghue
dd71a17b11 x86/platform/intel/quark: Change the kernel's IMR lock bit to false
Currently when setting up an IMR around the kernel's .text section we lock
that IMR, preventing further modification. While superficially this appears
to be the right thing to do, in fact this doesn't account for a legitimate
change in the memory map such as when executing a new kernel via kexec.

In such a scenario a second kernel can have a different size and location
to it's predecessor and can view some of the memory occupied by it's
predecessor as legitimately usable DMA RAM. If this RAM were then
subsequently allocated to DMA agents within the system it could conceivably
trigger an IMR violation.

This patch fixes the this potential situation by keeping the kernel's .text
section IMR lock bit false by default.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boon.leong.ong@intel.com
Cc: paul.gortmaker@windriver.com
Link: http://lkml.kernel.org/r/1456190999-12685-2-git-send-email-pure.logic@nexus-software.ie
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-23 07:35:53 +01:00
Linus Torvalds
692b8c663c Xen bug fixes for 4.5-rc5
- Two scsiback fixes (resource leak and spurious warning).
 - Fix DMA mapping of compound pages on arm/arm64.
 - Fix some pciback regressions in MSI-X handling.
 - Fix a pcifront crash due to some uninitialize state.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWyvatAAoJEFxbo/MsZsTRBFcH+wWnv0/N+gKib3cKCI4lwmTg
 n8iVgf8dNWwD36M2s/OlzCAglAIt8Xr6ySNvPqTerpm7lT9yXlIVQxGXTbIGuTAA
 h8Kt8WiC0BNLHHlLxBuCz62KR47DvMhsr84lFURE8FmpUiulFjXmRcbrZkHIMYRS
 l/X+xJWO1vxwrSYho0P9n3ksTWHm488DTPvZz3ICNI2G2sndDfbT3gv3tMDaQhcX
 ZaQR93vtIoldqk29Ga59vaVtksbgxHZIbasY9PQ8rqOxHJpDQbPzpjocoLxAzf50
 cioQVyKQ7i9vUvZ+B3TTAOhxisA2hDwNhLGQzmjgxe2TXeKdo3yjYwO6m1dDBzY=
 =VY/S
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.5-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen bug fixes from David Vrabel:

 - Two scsiback fixes (resource leak and spurious warning).

 - Fix DMA mapping of compound pages on arm/arm64.

 - Fix some pciback regressions in MSI-X handling.

 - Fix a pcifront crash due to some uninitialize state.

* tag 'for-linus-4.5-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/pcifront: Fix mysterious crashes when NUMA locality information was extracted.
  xen/pcifront: Report the errors better.
  xen/pciback: Save the number of MSI-X entries to be copied later.
  xen/pciback: Check PF instead of VF for PCI_COMMAND_MEMORY
  xen: fix potential integer overflow in queue_reply
  xen/arm: correctly handle DMA mapping of compound pages
  xen/scsiback: avoid warnings when adding multiple LUNs to a domain
  xen/scsiback: correct frontend counting
2016-02-22 13:57:01 -08:00
Linus Torvalds
0389075ecf Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "This is unusually large, partly due to the EFI fixes that prevent
  accidental deletion of EFI variables through efivarfs that may brick
  machines.  These fixes are somewhat involved to maintain compatibility
  with existing install methods and other usage modes, while trying to
  turn off the 'rm -rf' bricking vector.

  Other fixes are for large page ioremap()s and for non-temporal
  user-memcpy()s"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Fix vmalloc_fault() to handle large pages properly
  hpet: Drop stale URLs
  x86/uaccess/64: Handle the caching of 4-byte nocache copies properly in __copy_user_nocache()
  x86/uaccess/64: Make the __copy_user_nocache() assembly code more readable
  lib/ucs2_string: Correct ucs2 -> utf8 conversion
  efi: Add pstore variables to the deletion whitelist
  efi: Make efivarfs entries immutable by default
  efi: Make our variable validation list include the guid
  efi: Do variable name validation tests in utf8
  efi: Use ucs2_as_utf8 in efivarfs instead of open coding a bad version
  lib/ucs2_string: Add ucs2 -> utf8 helper functions
2016-02-20 09:32:40 -08:00
Linus Torvalds
06b74c658c Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "A handful of CPU hotplug related fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Plug potential memory leak in CPU_UP_PREPARE
  perf/core: Remove the bogus and dangerous CPU_DOWN_FAILED hotplug state
  perf/core: Remove bogus UP_CANCELED hotplug state
  perf/x86/amd/uncore: Plug reference leak
2016-02-20 09:30:42 -08:00
Linus Torvalds
87d9ac712b Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
 "10 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: slab: free kmem_cache_node after destroy sysfs file
  ipc/shm: handle removed segments gracefully in shm_mmap()
  MAINTAINERS: update Kselftest Framework mailing list
  devm_memremap_release(): fix memremap'd addr handling
  mm/hugetlb.c: fix incorrect proc nr_hugepages value
  mm, x86: fix pte_page() crash in gup_pte_range()
  fsnotify: turn fsnotify reaper thread into a workqueue job
  Revert "fsnotify: destroy marks with call_srcu instead of dedicated thread"
  mm: fix regression in remap_file_pages() emulation
  thp, dax: do not try to withdraw pgtable from non-anon VMA
2016-02-19 13:36:00 -08:00
Linus Torvalds
705d43dbe1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching
Pull livepatching fixes from Jiri Kosina:

 - regression (from 4.4) fix for ordering issue, introduced by an
   earlier ftrace change, that broke live patching of modules.

   The fix replaces the ftrace module notifier by direct call in order
   to make the ordering guaranteed and well-defined.  The patch, from
   Jessica Yu, has been acked both by Steven and Rusty

 - error message fix from Miroslav Benes

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
  ftrace/module: remove ftrace module notifier
  livepatch: change the error message in asm/livepatch.h header files
2016-02-18 16:34:15 -08:00