linux

Author	SHA1	Message	Date
Stephane Eranian	c02cdbf60b	perf/x86/intel: Limit to half counters when the HT workaround is enabled, to avoid exclusive mode starvation This patch limits the number of counters available to each CPU when the HT bug workaround is enabled. This is necessary to avoid situation of counter starvation. Such can arise from configuration where one HT thread, HT0, is using all 4 counters with corrupting events which require exclusion the the sibling HT, HT1. In such case, HT1 would not be able to schedule any event until HT0 is done. To mitigate this problem, this patch artificially limits the number of counters to 2. That way, we can gurantee that at least 2 counters are not in exclusive mode and therefore allow the sibling thread to schedule events of the same type (system vs. per-thread). The 2 counters are not determined in advance. We simply set the limit to two events per HT. This helps mitigate starvation in case of events with specific counter constraints such a PREC_DIST. Note that this does not elimintate the starvation is all cases. But it is better than not having it. (Solution suggested by Peter Zjilstra.) Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Cc: maria.n.dimakopoulou@gmail.com Link: http://lkml.kernel.org/r/1416251225-17721-11-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 17:33:14 +02:00
Stephane Eranian	a90738c2cb	perf/x86/intel: Fix intel_get_event_constraints() for dynamic constraints With dynamic constraint, we need to restart from the static constraints each time the intel_get_event_constraints() is called. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-10-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 17:33:14 +02:00
Maria Dimakopoulou	b63b4b459a	perf/x86/intel: Enforce HT bug workaround with PEBS for SNB/IVB/HSW This patch modifies the PEBS constraint tables for SNB/IVB/HSW such that corrupting events supporting PEBS activate the HT workaround. Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-9-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 17:33:13 +02:00
Maria Dimakopoulou	93fcf72cc0	perf/x86/intel: Enforce HT bug workaround for SNB/IVB/HSW This patches activates the HT bug workaround for the SNB/IVB/HSW processors. This covers non-PEBS mode. Activation is done thru the constraint tables. Both client and server processors needs this workaround. Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-8-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 17:33:12 +02:00
Maria Dimakopoulou	e979121b1b	perf/x86/intel: Implement cross-HT corruption bug workaround This patch implements a software workaround for a HW erratum on Intel SandyBridge, IvyBridge and Haswell processors with Hyperthreading enabled. The errata are documented for each processor in their respective specification update documents: - SandyBridge: BJ122 - IvyBridge: BV98 - Haswell: HSD29 The bug causes silent counter corruption across hyperthreads only when measuring certain memory events (0xd0, 0xd1, 0xd2, 0xd3). Counters measuring those events may leak counts to the sibling counter. For instance, counter 0, thread 0 measuring event 0xd0, may leak to counter 0, thread 1, regardless of the event measured there. The size of the leak is not predictible. It all depends on the workload and the state of each sibling hyper-thread. The corrupting events do undercount as a consequence of the leak. The leak is compensated automatically only when the sibling counter measures the exact same corrupting event AND the workload is on the two threads is the same. Given, there is no way to guarantee this, a work-around is necessary. Furthermore, there is a serious problem if the leaked count is added to a low-occurrence event. In that case the corruption on the low occurrence event can be very large, e.g., orders of magnitude. There is no HW or FW workaround for this problem. The bug is very easy to reproduce on a loaded system. Here is an example on a Haswell client, where CPU0, CPU4 are siblings. We load the CPUs with a simple triad app streaming large floating-point vector. We use 0x81d0 corrupting event (MEM_UOPS_RETIRED:ALL_LOADS) and 0x20cc (ROB_MISC_EVENTS:LBR_INSERTS). Given we are not using the LBR, the 0x20cc event should be zero. $ taskset -c 0 triad & $ taskset -c 4 triad & $ perf stat -a -C 0 -e r81d0 sleep 100 & $ perf stat -a -C 4 -r20cc sleep 10 Performance counter stats for 'system wide': 139 277 291 r20cc 10,000969126 seconds time elapsed In this example, 0x81d0 and r20cc ar eusing sinling counters on CPU0 and CPU4. 0x81d0 leaks into 0x20cc and corrupts it from 0 to 139 millions occurrences. This patch provides a software workaround to this problem by modifying the way events are scheduled onto counters by the kernel. The patch forces cross-thread mutual exclusion between counters in case a corrupting event is measured by one of the hyper-threads. If thread 0, counter 0 is measuring event 0xd0, then nothing can be measured on counter 0, thread 1. If no corrupting event is measured on any hyper-thread, event scheduling proceeds as before. The same example run with the workaround enabled, yield the correct answer: $ taskset -c 0 triad & $ taskset -c 4 triad & $ perf stat -a -C 0 -e r81d0 sleep 100 & $ perf stat -a -C 4 -r20cc sleep 10 Performance counter stats for 'system wide': 0 r20cc 10,000969126 seconds time elapsed The patch does provide correctness for all non-corrupting events. It does not "repatriate" the leaked counts back to the leaking counter. This is planned for a second patch series. This patch series makes this repatriation more easy by guaranteeing the sibling counter is not measuring any useful event. The patch introduces dynamic constraints for events. That means that events which did not have constraints, i.e., could be measured on any counters, may now be constrained to a subset of the counters depending on what is going on the sibling thread. The algorithm is similar to a cache coherency protocol. We call it XSU in reference to Exclusive, Shared, Unused, the 3 possible states of a PMU counter. As a consequence of the workaround, users may see an increased amount of event multiplexing, even in situtations where there are fewer events than counters measured on a CPU. Patch has been tested on all three impacted processors. Note that when HT is off, there is no corruption. However, the workaround is still enabled, yet not costing too much. Adding a dynamic detection of HT on turned out to be complex are requiring too much to code to be justified. This patch addresses the issue when PEBS is not used. A subsequent patch fixes the problem when PEBS is used. Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> [spinlock_t -> raw_spinlock_t] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-7-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 17:33:12 +02:00
Maria Dimakopoulou	6f6539cad9	perf/x86/intel: Add cross-HT counter exclusion infrastructure This patch adds a new shared_regs style structure to the per-cpu x86 state (cpuc). It is used to coordinate access between counters which must be used with exclusion across HyperThreads on Intel processors. This new struct is not needed on each PMU, thus is is allocated on demand. Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> [peterz: spinlock_t -> raw_spinlock_t] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-6-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 17:33:11 +02:00
Stephane Eranian	79cba82244	perf/x86: Add 'index' param to get_event_constraint() callback This patch adds an index parameter to the get_event_constraint() x86_pmu callback. It is expected to represent the index of the event in the cpuc->event_list[] array. When the callback is used for fake_cpuc (evnet validation), then the index must be -1. The motivation for passing the index is to use it to index into another cpuc array. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Cc: maria.n.dimakopoulou@gmail.com Link: http://lkml.kernel.org/r/1416251225-17721-5-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 17:33:10 +02:00
Maria Dimakopoulou	c5362c0c37	perf/x86: Add 3 new scheduling callbacks This patch adds 3 new PMU model specific callbacks during the event scheduling done by x86_schedule_events(). ->start_scheduling(): invoked when entering the schedule routine. ->stop_scheduling(): invoked at the end of the schedule routine ->commit_scheduling(): invoked for each committed event To be used optionally by model-specific code. Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-4-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 17:33:09 +02:00
Stephane Eranian	9041346431	perf/x86: Vectorize cpuc->kfree_on_online Make the cpuc->kfree_on_online a vector to accommodate more than one entry and add the second entry to be used by a later patch. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-3-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 17:33:08 +02:00
Stephane Eranian	9a5e3fb52a	perf/x86: Rename x86_pmu::er_flags to 'flags' Because it will be used for more than just tracking the presence of extra registers. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Cc: maria.n.dimakopoulou@gmail.com Link: http://lkml.kernel.org/r/1416251225-17721-2-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 17:33:08 +02:00
Ingo Molnar	c2b078e78a	Merge branch 'perf/urgent' into perf/core, before applying dependent patches Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 17:17:46 +02:00
Alexander Shishkin	8062382c8d	perf/x86/intel/bts: Add BTS PMU driver Add support for Branch Trace Store (BTS) via kernel perf event infrastructure. The difference with the existing implementation of BTS support is that this one is a separate PMU that exports events' trace buffers to userspace by means of AUX area of the perf buffer, which is zero-copy mapped into userspace. The immediate benefit is that the buffer size can be much bigger, resulting in fewer interrupts and no kernel side copying is involved and little to no trace data loss. Also, kernel code can be traced with this driver. The old way of collecting BTS traces still works. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kaixu Xia <kaixu.xia@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Robert Richter <rric@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: adrian.hunter@intel.com Cc: kan.liang@intel.com Cc: markus.t.metzger@intel.com Cc: mathieu.poirier@linaro.org Link: http://lkml.kernel.org/r/1422614435-114702-1-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 17:14:21 +02:00
Alexander Shishkin	52ca9ced3f	perf/x86/intel/pt: Add Intel PT PMU driver Add support for Intel Processor Trace (PT) to kernel's perf events. PT is an extension of Intel Architecture that collects information about software execuction such as control flow, execution modes and timings and formats it into highly compressed binary packets. Even being compressed, these packets are generated at hundreds of megabytes per second per core, which makes it impractical to decode them on the fly in the kernel. This driver exports trace data by through AUX space in the perf ring buffer, which is zero-copy mapped into userspace for faster data retrieval. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kaixu Xia <kaixu.xia@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Robert Richter <rric@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: adrian.hunter@intel.com Cc: kan.liang@intel.com Cc: markus.t.metzger@intel.com Cc: mathieu.poirier@linaro.org Link: http://lkml.kernel.org/r/1422614392-114498-1-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 17:14:20 +02:00
Alexander Shishkin	4807034248	perf/x86: Mark Intel PT and LBR/BTS as mutually exclusive Intel PT cannot be used at the same time as LBR or BTS and will cause a general protection fault if they are used together. In order to avoid fixing up GPs in the fast path, instead we disallow creating LBR/BTS events when PT events are present and vice versa. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kaixu Xia <kaixu.xia@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Robert Richter <rric@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: adrian.hunter@intel.com Cc: kan.liang@intel.com Cc: markus.t.metzger@intel.com Cc: mathieu.poirier@linaro.org Link: http://lkml.kernel.org/r/1421237903-181015-12-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 17:14:19 +02:00
Alexander Shishkin	ed69628b3b	x86: Add Intel Processor Trace (INTEL_PT) cpu feature detection Intel Processor Trace is an architecture extension that allows for program flow tracing. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kaixu Xia <kaixu.xia@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Robert Richter <rric@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: adrian.hunter@intel.com Cc: kan.liang@intel.com Cc: markus.t.metzger@intel.com Cc: mathieu.poirier@linaro.org Link: http://lkml.kernel.org/r/1421237903-181015-11-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 17:14:18 +02:00
Andi Kleen	c420f19b9c	perf/x86/intel: Fix Haswell CYCLE_ACTIVITY.* counter constraints Some of the CYCLE_ACTIVITY.* events can only be scheduled on counter 2. Due to a typo Haswell matched those with INTEL_EVENT_CONSTRAINT, which lead to the events never matching as the comparison does not expect anything in the umask too. Fix the typo. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1425925222-32361-1-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 17:07:43 +02:00
Kan Liang	687805e4a6	perf/x86/intel: Filter branches for PEBS event For supporting Intel LBR branches filtering, Intel LBR sharing logic mechanism is introduced from commit `b36817e886` ("perf/x86: Add Intel LBR sharing logic"). It modifies __intel_shared_reg_get_constraints() to config lbr_sel, which is finally used to set LBR_SELECT. However, the intel_shared_regs_constraints() function is called after intel_pebs_constraints(). The PEBS event will return immediately after intel_pebs_constraints(). So it's impossible to filter branches for PEBS events. This patch moves intel_shared_regs_constraints() ahead of intel_pebs_constraints(). We can safely do that because the intel_shared_regs_constraints() function only returns empty constraint if its rejecting the event, otherwise it returns NULL such that we continue calling intel_pebs_constraints() and x86_get_event_constraint(). Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1427467105-9260-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 17:07:42 +02:00
Linus Torvalds	7fc377ecf4	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Ingo Molnar: "Fix x86 syscall exit code bug that resulted in spurious non-execution of TIF-driven user-return worklets, causing big trouble for things like KVM that rely on user notifiers for correctness of their vcpu model, causing crashes like double faults" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/asm/entry: Check for syscall exit work with IRQs disabled	2015-03-28 11:25:04 -07:00
Peter Zijlstra	34f439278c	perf: Add per event clockid support While thinking on the whole clock discussion it occurred to me we have two distinct uses of time: 1) the tracking of event/ctx/cgroup enabled/running/stopped times which includes the self-monitoring support in struct perf_event_mmap_page. 2) the actual timestamps visible in the data records. And we've been conflating them. The first is all about tracking time deltas, nobody should really care in what time base that happens, its all relative information, as long as its internally consistent it works. The second however is what people are worried about when having to merge their data with external sources. And here we have the discussion on MONOTONIC vs MONOTONIC_RAW etc.. Where MONOTONIC is good for correlating between machines (static offset), MONOTNIC_RAW is required for correlating against a fixed rate hardware clock. This means configurability; now 1) makes that hard because it needs to be internally consistent across groups of unrelated events; which is why we had to have a global perf_clock(). However, for 2) it doesn't really matter, perf itself doesn't care what it writes into the buffer. The below patch makes the distinction between these two cases by adding perf_event_clock() which is used for the second case. It further makes this configurable on a per-event basis, but adds a few sanity checks such that we cannot combine events with different clocks in confusing ways. And since we then have per-event configurability we might as well retain the 'legacy' behaviour as a default. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-27 10:13:22 +01:00
Ingo Molnar	b381e63b48	Merge branch 'perf/core' into perf/timer, before applying new changes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-27 10:10:47 +01:00
Ingo Molnar	4e6d7c2aa9	Merge branch 'timers/core' into perf/timer, to apply dependent patch An upcoming patch will depend on tai_ns() and NMI-safe ktime_get_raw_fast(), so merge timers/core here in a separate topic branch until it's all cooked and timers/core is merged upstream. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-27 10:09:21 +01:00
David Ahern	9332d250b4	perf/x86: Remove redundant calls to perf_pmu_{dis\|en}able() perf_pmu_disable() is called before pmu->add() and perf_pmu_enable() is called afterwards. No need to call these inside of x86_pmu_add() as well. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1424281543-67335-1-git-send-email-dsahern@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-27 09:49:44 +01:00
Ingo Molnar	936c663aed	Merge branch 'perf/x86' into perf/core, because it's ready Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-27 09:46:19 +01:00
Ingo Molnar	072e5a1cfa	Merge branch 'perf/urgent' into perf/core, to pick up fixes and to refresh the tree Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-27 09:46:03 +01:00
Peter Zijlstra	876e78818d	time: Rename timekeeper::tkr to timekeeper::tkr_mono In preparation of adding another tkr field, rename this one to tkr_mono. Also rename tk_read_base::base_mono to tk_read_base::base, since the structure is not specific to CLOCK_MONOTONIC and the mono name got added to the tk_read_base instance. Lots of trivial churn. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150319093400.344679419@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-27 09:45:06 +01:00
Andi Kleen	294fe0f52a	perf/x86/intel: Add INST_RETIRED.ALL workarounds On Broadwell INST_RETIRED.ALL cannot be used with any period that doesn't have the lowest 6 bits cleared. And the period should not be smaller than 128. This is erratum BDM11 and BDM55: http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/5th-gen-core-family-spec-update.pdf BDM11: When using a period < 100; we may get incorrect PEBS/PMI interrupts and/or an invalid counter state. BDM55: When bit0-5 of the period are !0 we may get redundant PEBS records on overflow. Add a new callback to enforce this, and set it for Broadwell. How does this handle the case when an app requests a specific period with some of the bottom bits set? Short answer: Any useful instruction sampling period needs to be 4-6 orders of magnitude larger than 128, as an PMI every 128 instructions would instantly overwhelm the system and be throttled. So the +-64 error from this is really small compared to the period, much smaller than normal system jitter. Long answer (by Peterz): IFF we guarantee perf_event_attr::sample_period >= 128. Suppose we start out with sample_period=192; then we'll set period_left to 192, we'll end up with left = 128 (we truncate the lower bits). We get an interrupt, find that period_left = 64 (>0 so we return 0 and don't get an overflow handler), up that to 128. Then we trigger again, at n=256. Then we find period_left = -64 (<=0 so we return 1 and do get an overflow). We increment with sample_period so we get left = 128. We fire again, at n=384, period_left = 0 (<=0 so we return 1 and get an overflow). And on and on. So while the individual interrupts are 'wrong' we get then with interval=256,128 in exactly the right ratio to average out at 192. And this works for everything >=128. So the num_samples*fixed_period thing is still entirely correct +- 127, which is good enough I'd say, as you already have that error anyhow. So no need to 'fix' the tools, al we need to do is refuse to create INST_RETIRED:ALL events with sample_period < 128. Signed-off-by: Andi Kleen <ak@linux.intel.com> [ Updated comments and changelog a bit. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1424225886-18652-3-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-27 09:14:03 +01:00
Andi Kleen	91f1b70582	perf/x86/intel: Add Broadwell core support Add Broadwell support for Broadwell to perf. The basic support is very similar to Haswell. We use the new cache event list added for Haswell earlier. The only differences are a few bits related to remote nodes. To avoid an extra, mostly identical, table these are patched up in the initialization code. The constraint list has one new event that needs to be handled over Haswell. Includes code and testing from Kan Liang. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1424225886-18652-2-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-27 09:14:02 +01:00
Andi Kleen	0f1b5ca240	perf/x86/intel: Add new cache events table for Haswell Haswell offcore events are quite different from Sandy Bridge. Add a new table to handle Haswell properly. Note that the offcore bits listed in the SDM are not quite correct (this is currently being fixed). An uptodate list of bits is in the patch. The basic setup is similar to Sandy Bridge. The prefetch columns have been removed, as prefetch counting is not very reliable on Haswell. One L1 event that is not in the event list anymore has been also removed. - data reads do not include code reads (comparable to earlier Sandy Bridge tables) - data counts include speculative execution (except L1 write, dtlb, bpu) - remote node access includes both remote memory, remote cache, remote mmio. - prefetches are not included in the counts for consistency (different from Sandy Bridge, which includes prefetches in the remote node) Signed-off-by: Andi Kleen <ak@linux.intel.com> [ Removed the HSM30 comments; we don't have them for SNB/IVB either. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1424225886-18652-1-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-27 09:14:01 +01:00
Linus Torvalds	0d33cd0afb	Merge git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Marcelo Tosatti: "Fix for higher-order page allocation failures, fix Xen-on-KVM with x2apic, L1 crash with unrestricted guest mode (nested VMX)" * git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: avoid page allocation failure in kvm_set_memory_region() KVM: x86: call irq notifiers with directed EOI KVM: nVMX: mask unrestricted_guest if disabled on L0	2015-03-24 17:13:44 -07:00
Andy Lutomirski	b3494a4ab2	x86/asm/entry: Check for syscall exit work with IRQs disabled We currently have a race: if we're preempted during syscall exit, we can fail to process syscall return work that is queued up while we're preempted in ret_from_sys_call after checking ti.flags. Fix it by disabling interrupts before checking ti.flags. Reported-by: Stefan Seyfried <stefan.seyfried@googlemail.com> Reported-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tejun Heo <tj@kernel.org> Fixes: `96b6352c12` ("x86_64, entry: Remove the syscall exit audit") Link: http://lkml.kernel.org/r/189320d42b4d671df78c10555976bb10af1ffc75.1427137498.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-24 21:08:28 +01:00
Radim Krčmář	c806a6ad35	KVM: x86: call irq notifiers with directed EOI kvm_ioapic_update_eoi() wasn't called if directed EOI was enabled. We need to do that for irq notifiers. (Like with edge interrupts.) Fix it by skipping EOI broadcast only. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=82211 Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Tested-by: Bandan Das <bsd@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2015-03-23 20:29:05 -03:00
Peter Zijlstra	50f16a8bf9	perf: Remove type specific target pointers The only reason CQM had to use a hard-coded pmu type was so it could use cqm_target in hw_perf_event. Do away with the {tp,bp,cqm}_target pointers and provide a non type specific one. This allows us to do away with that silly pmu type as well. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Vince Weaver <vince@deater.net> Cc: acme@kernel.org Cc: acme@redhat.com Cc: hpa@zytor.com Cc: jolsa@redhat.com Cc: kanaka.d.juvva@intel.com Cc: matt.fleming@intel.com Cc: tglx@linutronix.de Cc: torvalds@linux-foundation.org Cc: vikas.shivappa@linux.intel.com Link: http://lkml.kernel.org/r/20150305211019.GU21418@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-23 10:58:04 +01:00
Matt Fleming	4e16ed9941	perf/x86/intel: Fix Makefile to actually build the cqm driver Someone fat fingered a merge conflict and lost the Makefile hunk. Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <acme@redhat.com> Cc: <hpa@zytor.com> Cc: <jolsa@redhat.com> Cc: <kanaka.d.juvva@intel.com> Cc: <tglx@linutronix.de> Cc: <torvalds@linux-foundation.org> Cc: <vikas.shivappa@linux.intel.com> Link: http://lkml.kernel.org/r/1424976420.15321.35.camel@mfleming-mobl1.ger.corp.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-23 10:58:03 +01:00
Linus Torvalds	3d7a6db537	Power management and ACPI fixes for v4.0-rc5 - Revert a recent PCI commit related to IRQ resources management that introduced a regression for drivers attempting to bind to devices whose previous drivers did not balance pci_enable_device() and pci_disable_device() as expected (Rafael J Wysocki). - Fix a deadlock in at91_rtc_interrupt() introduced by a typo in a recent commit related to wakeup interrupt handling (Dan Carpenter). - Allow the power capping RAPL (Running-Average Power Limit) driver to use different energy units for domains within one CPU package which is necessary to handle Intel Haswell EP processors correctly (Jacob Pan). - Improve the cpuidle mvebu driver's handling of Armada XP SoCs by updating the target residency and exit latency numbers for those chips (Sebastien Rannou). - Prevent the cpuidle mvebu driver from calling cpu_pm_enter() twice in a row before cpu_pm_exit() is called on the same CPU which breaks the core's assumptions regarding the usage of those functions (Gregory Clement). / -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJVDLwIAAoJEILEb/54YlRxSrMQAKn/DwfNMqVWP5vf/YWauSfL S51+E/emGvy+3fwBsa46KkddRQ0ysxc1wKIHcWXc1UPtA+lKNS3MoCUdD+isUFt7 PUrMsblgjh/e6LiXOBqElAiuugVoH7JCVMKvlv5Tsn3qxY3AJEoxGwV7p4XyP6lJ PumqAvWFtaIFKThJFdKPGC511tYTQWoZ/3u843aEsHtpvmiytgUrvxpuCXlSSKT1 vbOdHAJXi0QyQYWIZ0VNN+MZ2WvaU9t1QCpBJUnzZMi2kuG3HP9rzY40GOnoMn6/ jXaxegeT7UX5JY5NWU9VrrVwKzppIpyKW6yckIRcKD+ovwKdGbMrfMco2iyK1xgV Q6B5h5guYTTynjBoi9XO3d7AWN3gM+8OYCPJgcRG2BMQEunlS0D+i3cRDqeHzW0M W+OaENK9MnxG9KVEq0PIrWomGZL1SlOtHfHm9xu8hpqGx4h1iTSgiAEFQQ+Zmmzh +g1OLgddHkWjkPoZ/Y8d1NpdnTf+kbkm8Wqm9Uyie1/HnUJMnHYNbzZTyF4ZjlV2 MAl2P0zBqWhLEDb4STHWHdnBZVhvGCpg1J2pFaSRjDEn+EP0YBH+LscWi//xONNr 5acBoVzid92co+JwrYn3/MYHctV8bBLdXqeGUiuKD6tk9u+aLke24RTBwm5frPBE SjHr1sLhmzubzXtzQIrp =4Oz9 -----END PGP SIGNATURE----- Merge tag 'pm+acpi-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management and ACPI fixes from Rafael Wysocki: "These are fixes for recent regressions (PCI/ACPI resources and at91 RTC locking), a stable-candidate powercap RAPL driver fix and two ARM cpuidle fixes (one stable-candidate too). Specifics: - Revert a recent PCI commit related to IRQ resources management that introduced a regression for drivers attempting to bind to devices whose previous drivers did not balance pci_enable_device() and pci_disable_device() as expected (Rafael J Wysocki). - Fix a deadlock in at91_rtc_interrupt() introduced by a typo in a recent commit related to wakeup interrupt handling (Dan Carpenter). - Allow the power capping RAPL (Running-Average Power Limit) driver to use different energy units for domains within one CPU package which is necessary to handle Intel Haswell EP processors correctly (Jacob Pan). - Improve the cpuidle mvebu driver's handling of Armada XP SoCs by updating the target residency and exit latency numbers for those chips (Sebastien Rannou). - Prevent the cpuidle mvebu driver from calling cpu_pm_enter() twice in a row before cpu_pm_exit() is called on the same CPU which breaks the core's assumptions regarding the usage of those functions (Gregory Clement)" * tag 'pm+acpi-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: Revert "x86/PCI: Refine the way to release PCI IRQ resources" rtc: at91rm9200: double locking bug in at91_rtc_interrupt() powercap / RAPL: handle domains with different energy units cpuidle: mvebu: Update cpuidle thresholds for Armada XP SOCs cpuidle: mvebu: Fix the CPU PM notifier usage	2015-03-21 12:51:36 -07:00
Rafael J. Wysocki	9e8ce4b96b	Revert "x86/PCI: Refine the way to release PCI IRQ resources" Commit `b4b55cda58` (Refine the way to release PCI IRQ resources) introduced a regression in the PCI IRQ resource management by causing the IRQ resource of a device, established when pci_enabled_device() is called on a fully disabled device, to be released when the driver is unbound from the device, regardless of the enable_cnt. This leads to the situation that an ill-behaved driver can now make a device unusable to subsequent drivers by an imbalance in their use of pci_enable/disable_device(). That is a serious problem for secondary drivers like vfio-pci, which are innocent of the transgressions of the previous driver. Since the solution of this problem is not immediate and requires further discussion, revert commit `b4b55cda58` and the issue it was supposed to address (a bug related to xen-pciback) will be taken care of in a different way going forward. Reported-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-03-20 14:56:19 +01:00
Linus Torvalds	ec3fbff030	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "Fix a bug in the ARM XTS implementation that can cause failures in decrypting encrypted disks, and fix is a memory overwrite bug that can cause a crash which can be triggered from userspace" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: aesni - fix memory usage in GCM decryption crypto: arm/aes update NEON AES module to latest OpenSSL version	2015-03-18 11:10:41 -07:00
Radim Krčmář	0790ec172d	KVM: nVMX: mask unrestricted_guest if disabled on L0 If EPT was enabled, unrestricted_guest was allowed in L1 regardless of L0. L1 triple faulted when running L2 guest that required emulation. Another side effect was 'WARN_ON_ONCE(vmx->nested.nested_run_pending)' in L0's dmesg: WARNING: CPU: 0 PID: 0 at arch/x86/kvm/vmx.c:9190 nested_vmx_vmexit+0x96e/0xb00 [kvm_intel] () Prevent this scenario by masking SECONDARY_EXEC_UNRESTRICTED_GUEST when the host doesn't have it enabled. Fixes: `78051e3b7e` ("KVM: nVMX: Disable unrestricted mode if ept=0") Cc: stable@vger.kernel.org Tested-By: Kashyap Chamarthy <kchamart@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2015-03-17 22:09:17 -03:00
Linus Torvalds	c58616580e	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes from all around the place: - a KASLR related revert where we ran out of time to get a fix - this represents a substantial portion of the diffstat, - two FPU fixes, - two x86 platform fixes: an ACPI reduced-hw fix and a NumaChip fix, - an entry code fix, - and a VDSO build fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "x86/mm/ASLR: Propagate base load address calculation" x86/fpu: Drop_fpu() should not assume that tsk equals current x86/fpu: Avoid math_state_restore() without used_math() in __restore_xstate_sig() x86/apic/numachip: Fix sibling map with NumaChip x86/platform, acpi: Bypass legacy PIC and PIT in ACPI hardware reduced mode x86/asm/entry/32: Fix user_mode() misuses x86/vdso: Fix the build on GCC5	2015-03-17 13:32:17 -07:00
Linus Torvalds	2fc67756e3	Merge git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Marcelo Tosatti: "KVM bug fixes (ARM and x86)" * git://git.kernel.org/pub/scm/virt/kvm/kvm: arm/arm64: KVM: Keep elrsr/aisr in sync with software model KVM: VMX: Set msr bitmap correctly if vcpu is in guest mode arm/arm64: KVM: fix missing unlock on error in kvm_vgic_create() kvm: x86: i8259: return initialized data on invalid-size read arm64: KVM: Fix outdated comment about VTCR_EL2.PS arm64: KVM: Do not use pgd_index to index stage-2 pgd arm64: KVM: Fix stage-2 PGD allocation to have per-page refcounting kvm: move advertising of KVM_CAP_IRQFD to common code	2015-03-17 10:31:36 -07:00
Eugene Shatokhin	c80e5c0c23	kprobes/x86: Return correct length in __copy_instruction() On x86-64, __copy_instruction() always returns 0 (error) if the instruction uses %rip-relative addressing. This is because kernel_insn_init() is called the second time for 'insn' instance in such cases and sets all its fields to 0. Because of this, trying to place a kprobe on such instruction will fail, register_kprobe() will return -EINVAL. This patch fixes the problem. Signed-off-by: Eugene Shatokhin <eugene.shatokhin@rosalab.ru> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Link: http://lkml.kernel.org/r/20150317100918.28349.94654.stgit@localhost.localdomain Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-17 14:00:38 +01:00
Borislav Petkov	69797dafe3	Revert "x86/mm/ASLR: Propagate base load address calculation" This reverts commit: `f47233c2d3` ("x86/mm/ASLR: Propagate base load address calculation") The main reason for the revert is that the new boot flag does not work at all currently, and in order to make this work, we need non-trivial changes to the x86 boot code which we didn't manage to get done in time for merging. And even if we did, they would've been too risky so instead of rushing things and break booting 4.1 on boxes left and right, we will be very strict and conservative and will take our time with this to fix and test it properly. Reported-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Baoquan He <bhe@redhat.com> Cc: H. Peter Anvin <hpa@linux.intel.com Cc: Jiri Kosina <jkosina@suse.cz> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Junjie Mao <eternal.n08@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt.fleming@intel.com> Link: http://lkml.kernel.org/r/20150316100628.GD22995@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-16 11:18:21 +01:00
Linus Torvalds	f47e331042	xen: bug fixes for 4.0-rc3 - Fix a PV regression in 3.19. - Fix a dom0 crash on hosts with large numbers of PIRQs. - Prevent pcifront from disabling memory or I/O port access, which may trigger host crashes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJVAx7tAAoJEFxbo/MsZsTRBFwH/2Uoza52iMRhHkC6kLRSAhTQ HxRbObmweDQCqru25IgDsX+09TqCcWMtqnUTwJ5KPt0ZiwPA4GS0n4InJ9ZbrhBM 9lXSWFfCKPUuhL6tyACQul5W4SDmZD0UHNl5uQYMH/C8UhktrdjF+CdUO3AvBAWU uMfwzNsI0HH0uPHhZv6npUoGgI7Pt2Vw7KOilZKCnRBztizQpLb+KUTTBKJT1YDN TsA10rQcmdVMd0Qjry0O0V2Hn3EWwA/1rMl29/6lf5dTcCdQVW1FK2X7B3DXh71D rZKkZYXkXRIcMRzy7JybumIuXfB21nw2jD32ItLFjYjrj7y0H3zxYuLEyocexkc= =pFjm -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-4.0-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen bug fixes from David Vrabel: - fix a PV regression in 3.19. - fix a dom0 crash on hosts with large numbers of PIRQs. - prevent pcifront from disabling memory or I/O port access, which may trigger host crashes. * tag 'stable/for-linus-4.0-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen-pciback: limit guest control of command register xen/events: avoid NULL pointer dereference in dom0 on large machines xen: Remove trailing semicolon from xenbus_register_frontend() definition x86/xen: correct bug in p2m list initialization	2015-03-13 13:34:38 -07:00
Wincy Van	670125bda1	KVM: VMX: Set msr bitmap correctly if vcpu is in guest mode In commit `3af18d9c5f` ("KVM: nVMX: Prepare for using hardware MSR bitmap"), we are setting MSR_BITMAP in prepare_vmcs02 if we should use hardware. This is not enough since the field will be modified by following vmx_set_efer. Fix this by setting vmx_msr_bitmap_nested in vmx_set_msr_bitmap if vcpu is in guest mode. Signed-off-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2015-03-13 09:24:51 -03:00
Oleg Nesterov	f4c3686386	x86/fpu: Drop_fpu() should not assume that tsk equals current drop_fpu() does clear_used_math() and usually this is correct because tsk == current. However switch_fpu_finish()->restore_fpu_checking() is called before __switch_to() updates the "current_task" variable. If it fails, we will wrongly clear the PF_USED_MATH flag of the previous task. So use clear_stopped_child_used_math() instead. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: <stable@vger.kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pekka Riikonen <priikone@iki.fi> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Suresh Siddha <sbsiddha@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150309171041.GB11388@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-13 12:44:29 +01:00
Oleg Nesterov	a7c80ebcac	x86/fpu: Avoid math_state_restore() without used_math() in __restore_xstate_sig() math_state_restore() assumes it is called with irqs disabled, but this is not true if the caller is __restore_xstate_sig(). This means that if ia32_fxstate == T and __copy_from_user() fails, __restore_xstate_sig() returns with irqs disabled too. This triggers: BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:41 dump_stack ___might_sleep ? _raw_spin_unlock_irqrestore __might_sleep down_read ? _raw_spin_unlock_irqrestore print_vma_addr signal_fault sys32_rt_sigreturn Change __restore_xstate_sig() to call set_used_math() unconditionally. This avoids enabling and disabling interrupts in math_state_restore(). If copy_from_user() fails, we can simply do fpu_finit() by hand. [ Note: this is only the first step. math_state_restore() should not check used_math(), it should set this flag. While init_fpu() should simply die. ] Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pekka Riikonen <priikone@iki.fi> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Suresh Siddha <sbsiddha@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150307153844.GB25954@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-13 12:44:28 +01:00
Stephan Mueller	ccfe8c3f7e	crypto: aesni - fix memory usage in GCM decryption The kernel crypto API logic requires the caller to provide the length of (ciphertext \|\| authentication tag) as cryptlen for the AEAD decryption operation. Thus, the cipher implementation must calculate the size of the plaintext output itself and cannot simply use cryptlen. The RFC4106 GCM decryption operation tries to overwrite cryptlen memory in req->dst. As the destination buffer for decryption only needs to hold the plaintext memory but cryptlen references the input buffer holding (ciphertext \|\| authentication tag), the assumption of the destination buffer length in RFC4106 GCM operation leads to a too large size. This patch simply uses the already calculated plaintext size. In addition, this patch fixes the offset calculation of the AAD buffer pointer: as mentioned before, cryptlen already includes the size of the tag. Thus, the tag does not need to be added. With the addition, the AAD will be written beyond the already allocated buffer. Note, this fixes a kernel crash that can be triggered from user space via AF_ALG(aead) -- simply use the libkcapi test application from [1] and update it to use rfc4106-gcm-aes. Using [1], the changes were tested using CAVS vectors to demonstrate that the crypto operation still delivers the right results. [1] http://www.chronox.de/libkcapi.html CC: Tadeusz Struk <tadeusz.struk@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-03-13 21:32:21 +11:00
Petr Matousek	c1a6bff28c	kvm: x86: i8259: return initialized data on invalid-size read If data is read from PIC with invalid access size, the return data stays uninitialized even though success is returned. Fix this by always initializing the data. Signed-off-by: Petr Matousek <pmatouse@redhat.com> Reported-by: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2015-03-12 22:02:46 -03:00
Daniel J Blueman	c8a470cab0	x86/apic/numachip: Fix sibling map with NumaChip On NumaChip systems, the physical processor ID assignment wasn't accounting for the number of nodes in AMD multi-module processors, giving an incorrect sibling map: $ cd /sys/devices/system/cpu/cpu29/topology $ grep . * core_id:5 core_siblings:00000000,ff000000 core_siblings_list:24-31 physical_package_id:3 thread_siblings:00000000,30000000 thread_siblings_list:28-29 This fixes it: $ cd /sys/devices/system/cpu/cpu29/topology $ grep . * core_id:5 core_siblings:00000000,ffff0000 core_siblings_list:16-31 physical_package_id:1 thread_siblings:00000000,30000000 thread_siblings_list:28-29 Signed-off-by: Daniel J Blueman <daniel@numascale.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Steffen Persvold <sp@numascale.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1426135950-10110-1-git-send-email-daniel@numascale.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-12 16:58:59 +01:00
Li, Aubrey	7486341a98	x86/platform, acpi: Bypass legacy PIC and PIT in ACPI hardware reduced mode On a platform in ACPI Hardware-reduced mode, the legacy PIC and PIT may not be initialized even though they may be present in silicon. Touching these legacy components causes unexpected results on the system. On the Bay Trail-T(ASUS-T100) platform, touching these legacy components blocks platform hardware low idle power state(S0ix) during system suspend. So we should bypass them in ACPI hardware reduced mode. Suggested-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Li Aubrey <aubrey.li@linux.intel.com> Cc: <alan@linux.intel.com> Cc: Alan Cox <alan@linux.intel.com> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Len Brown <len.brown@intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Link: http://lkml.kernel.org/r/54FFF81C.20703@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-12 12:07:13 +01:00
Paolo Bonzini	dc9be0fac7	kvm: move advertising of KVM_CAP_IRQFD to common code POWER supports irqfds but forgot to advertise them. Some userspace does not check for the capability, but others check it---thus they work on x86 and s390 but not POWER. To avoid that other architectures in the future make the same mistake, let common code handle KVM_CAP_IRQFD the same way as KVM_CAP_IRQFD_RESAMPLE. Reported-and-tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Fixes: `297e21053a` Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2015-03-10 21:18:59 -03:00

1 2 3 4 5 ...

20981 Commits