Commit Graph

11820 Commits

Author SHA1 Message Date
Yan, Zheng
156174999d perf/intel/x86: Enlarge the PEBS buffer
Currently the PEBS buffer size is 4k, it can only hold about 21
PEBS records. This patch enlarges the PEBS buffer size to 64k
(the same as the BTS buffer).

64k memory can hold about 330 PEBS records. This will significantly
reduce the number of PMIs when batched PEBS interrupts are enabled.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1430940834-8964-7-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 16:08:57 +02:00
Yan, Zheng
9c964efa43 perf/x86/intel: Drain the PEBS buffer during context switches
Flush the PEBS buffer during context switches if PEBS interrupt threshold
is larger than one. This allows perf to supply TID for sample outputs.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1430940834-8964-6-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 16:08:54 +02:00
Yan, Zheng
3569c0d7c5 perf/x86/intel: Implement batched PEBS interrupt handling (large PEBS interrupt threshold)
PEBS always had the capability to log samples to its buffers without
an interrupt. Traditionally perf has not used this but always set the
PEBS threshold to one.

For frequently occurring events (like cycles or branches or load/store)
this in term requires using a relatively high sampling period to avoid
overloading the system, by only processing PMIs. This in term increases
sampling error.

For the common cases we still need to use the PMI because the PEBS
hardware has various limitations. The biggest one is that it can not
supply a callgraph. It also requires setting a fixed period, as the
hardware does not support adaptive period. Another issue is that it
cannot supply a time stamp and some other options. To supply a TID it
requires flushing on context switch. It can however supply the IP, the
load/store address, TSX information, registers, and some other things.

So we can make PEBS work for some specific cases, basically as long as
you can do without a callgraph and can set the period you can use this
new PEBS mode.

The main benefit is the ability to support much lower sampling period
(down to -c 1000) without extensive overhead.

One use cases is for example to increase the resolution of the c2c tool.
Another is double checking when you suspect the standard sampling has
too much sampling error.

Some numbers on the overhead, using cycle soak, comparing the elapsed
time from "kernbench -M -H" between plain (threshold set to one) and
multi (large threshold).

The test command for plain:
  "perf record --time -e cycles:p -c $period -- kernbench -M -H"

The test command for multi:
  "perf record --no-time -e cycles:p -c $period -- kernbench -M -H"

( The only difference of test command between multi and plain is time
  stamp options. Since time stamp is not supported by large PEBS
  threshold, it can be used as a flag to indicate if large threshold is
  enabled during the test. )

	period    plain(Sec)  multi(Sec)  Delta
	10003     32.7        16.5        16.2
	20003     30.2        16.2        14.0
	40003     18.6        14.1        4.5
	80003     16.8        14.6        2.2
	100003    16.9        14.1        2.8
	800003    15.4        15.7        -0.3
	1000003   15.3        15.2        0.2
	2000003   15.3        15.1        0.1

With periods below 100003, plain (threshold one) cause much more
overhead. With 10003 sampling period, the Elapsed Time for multi is
even 2X faster than plain.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1430940834-8964-5-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 16:08:49 +02:00
Yan, Zheng
21509084f9 perf/x86/intel: Handle multiple records in the PEBS buffer
When the PEBS interrupt threshold is larger than one record and the
machine supports multiple PEBS events, the records of these events are
mixed up and we need to demultiplex them.

Demuxing the records is hard because the hardware is deficient. The
hardware has two issues that, when combined, create impossible
scenarios to demux.

The first issue is that the 'status' field of the PEBS record is a copy
of the GLOBAL_STATUS MSR at PEBS assist time. To see why this is a
problem let us first describe the regular PEBS cycle:

A) the CTRn value reaches 0:
  - the corresponding bit in GLOBAL_STATUS gets set
  - we start arming the hardware assist
  < some unspecified amount of time later -- this could cover multiple
    events of interest >

B) the hardware assist is armed, any next event will trigger it

C) a matching event happens:
  - the hardware assist triggers and generates a PEBS record
    this includes a copy of GLOBAL_STATUS at this moment
  - if we auto-reload we (re)set CTRn
  - we clear the relevant bit in GLOBAL_STATUS

Now consider the following chain of events:

  A0, B0, A1, C0

The event generated for counter 0 will include a status with counter 1
set, even though its not at all related to the record. A similar thing
can happen with a !PEBS event if it just happens to overflow at the
right moment.

The second issue is that the hardware will only emit one record for two
or more counters if the event that triggers the assist is 'close'. The
'close' can be several cycles. In some cases even the complete assist,
if the event is something that doesn't need retirement.

For instance, consider this chain of events:

  A0, B0, A1, B1, C01

Where C01 is an event that triggers both hardware assists, we will
generate but a single record, but again with both counters listed in the
status field.

This time the record pertains to both events.

Note that these two cases are different but undistinguishable with the
data as generated. Therefore demuxing records with multiple PEBS bits
(we can safely ignore status bits for !PEBS counters) is impossible.

Furthermore we cannot emit the record to both events because that might
cause a data leak -- the events might not have the same privileges -- so
what this patch does is discard such events.

The assumption/hope is that such discards will be rare.

Here lists some possible ways you may get high discard rate.

  - when you count the same thing multiple times. But it is not a useful
    configuration.
  - you can be unfortunate if you measure with a userspace only PEBS
    event along with either a kernel or unrestricted PEBS event. Imagine
    the event triggering and setting the overflow flag right before
    entering the kernel. Then all kernel side events will end up with
    multiple bits set.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
[ Changelog improvements. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1430940834-8964-4-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 16:08:45 +02:00
Yan, Zheng
43cf76312f perf/x86/intel: Introduce setup_pebs_sample_data()
Move code that sets up the PEBS sample data to a separate function.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1430940834-8964-3-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 16:08:40 +02:00
Yan, Zheng
851559e35f perf/x86/intel: Use the PEBS auto reload mechanism when possible
When a fixed period is specified, this patch makes perf use the PEBS
auto reload mechanism. This makes normal profiling faster, because
it avoids one costly MSR write in the PMI handler.

However, the reset value will be loaded by hardware assist. There is a
small delay compared to the previous non-auto-reload mechanism. The
delay time is arbitrary, but very small. The assist cost is 400-800
cycles, assuming common cases with everything cached. The minimum period
the patch currently uses is 10000. In that extreme case it can be ~10%
if cycles are used.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1430940834-8964-2-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 16:08:35 +02:00
Stephane Eranian
7b74cfb2ec perf/x86/intel: add support for PERF_SAMPLE_BRANCH_IND_JUMP
This patch enables support for branch sampling filter
for indirect jumps (IND_JUMP). It enables LBR IND_JMP
filtering where available. There is also software filtering
support.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@redhat.com
Cc: dsahern@gmail.com
Cc: jolsa@redhat.com
Cc: kan.liang@intel.com
Cc: namhyung@kernel.org
Link: http://lkml.kernel.org/r/1431637800-31061-3-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 16:08:27 +02:00
Frederic Weisbecker
4eaca0a887 preempt: Use preempt_schedule_context() as the official tracing preemption point
preempt_schedule_context() is a tracing safe preemption point but it's
only used when CONFIG_CONTEXT_TRACKING=y. Other configs have tracing
recursion issues since commit:

  b30f0e3ffe ("sched/preempt: Optimize preemption operations on __schedule() callers")

introduced function based preemp_count_*() ops.

Lets make it available on all configs and give it a more appropriate
name for its new position.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1433432349-1021-3-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 15:57:42 +02:00
Kan Liang
8cf1a3de97 perf/x86/intel/uncore: Fix CBOX bit wide and UBOX reg on Haswell-EP
CBOX counters are increased to 48b on HSX.

Correct the MSR address for HSWEP_U_MSR_PMON_CTR0 and
HSWEP_U_MSR_PMON_CTL0.

See specification in:
http://www.intel.com/content/www/us/en/processors/xeon/
xeon-e5-v3-uncore-performance-monitoring.html

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1432645835-7918-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 15:46:50 +02:00
Andy Shevchenko
7b179b8feb x86/microcode: Correct CPU family related variable types
Change the type of variables and function prototypes to be in
alignment with what the x86_*() / __x86_*() family/model
functions return.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1433436928-31903-21-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 15:38:15 +02:00
Borislav Petkov
ee38a90709 x86/microcode: Disable builtin microcode loading on 32-bit for now
Andy Shevchenko reported machine freezes when booting latest tip
on 32-bit setups. Problem is, the builtin microcode handling cannot
really work that early, when we haven't even enabled paging.

A proper fix would involve handling that case specially as every
other early 32-bit boot case in the microcode loader and would
require much more involved changes for which it is too late now,
more than a week before the upcoming merge window.

So, disable the builtin microcode loading on 32-bit for now.

Reported-and-tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1433436928-31903-20-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 15:38:14 +02:00
Ingo Molnar
c2f9b0af8b Merge branch 'x86/ras' into x86/core, to fix conflicts
Conflicts:
	arch/x86/include/asm/irq_vectors.h

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 15:35:27 +02:00
Borislav Petkov
c8e56d20f2 x86: Kill CONFIG_X86_HT
In talking to Aravind recently about making certain AMD topology
attributes available to the MCE injection module, it seemed like
that CONFIG_X86_HT thing is more or less superfluous. It is
def_bool y, depends on SMP and gets enabled in the majority of
.configs - distro and otherwise - out there.

So let's kill it and make code behind it depend directly on SMP.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Daniel Walter <dwalter@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jacob Shin <jacob.w.shin@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1433436928-31903-18-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 15:33:44 +02:00
Ashok Raj
243d657eaf x86/mce: Handle Local MCE events
Add the necessary changes to do_machine_check() to be able to
process MCEs signaled as local MCEs. Typically, only recoverable
errors (SRAR type) will be Signaled as LMCE. The architecture
does not restrict to only those errors, however.

When errors are signaled as LMCE, there is no need for the MCE
handler to perform rendezvous with other logical processors
unlike earlier processors that would broadcast machine check
errors.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1433436928-31903-17-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 15:33:15 +02:00
Ashok Raj
88d538672e x86/mce: Add infrastructure to support Local MCE
Initialize and prepare for handling LMCEs. Add a boot-time
option to disable LMCEs.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
[ Simplify stuff, align statements for better readability, reflow comments; kill
  unused lmce_clear(); save us an MSR write if LMCE is already enabled. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1433436928-31903-16-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 15:33:14 +02:00
Linus Torvalds
51d0f0cb3a Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes:

   - early_idt_handlers[] fix that fixes the build with bleeding edge
     tooling

   - build warning fix on GCC 5.1

   - vm86 fix plus self-test to make it harder to break it again"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asm/irq: Stop relying on magic JMP behavior for early_idt_handlers
  x86/asm/entry/32, selftests: Add a selftest for kernel entries from VM86 mode
  x86/boot: Add CONFIG_PARAVIRT_SPINLOCKS quirk to arch/x86/boot/compressed/misc.h
  x86/asm/entry/32: Really make user_mode() work correctly for VM86 mode
2015-06-05 10:03:48 -07:00
Linus Torvalds
a0e9c6efa5 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "The biggest chunk of the changes are two regression fixes: a HT
  workaround fix and an event-group scheduling fix.  It's been verified
  with 5 days of fuzzer testing.

  Other fixes:

   - eBPF fix
   - a BIOS breakage detection fix
   - PMU driver fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/pt: Fix a refactoring bug
  perf/x86: Tweak broken BIOS rules during check_hw_exists()
  perf/x86/intel/pt: Untangle pt_buffer_reset_markers()
  perf: Disallow sparse AUX allocations for non-SG PMUs in overwrite mode
  perf/x86: Improve HT workaround GP counter constraint
  perf/x86: Fix event/group validation
  perf: Fix race in BPF program unregister
2015-06-05 10:00:53 -07:00
Wei Yang
f2af7d25b4 x86/boot/setup: Clean up the e820_reserve_setup_data() code
Deobfuscate the 'found' logic, it can be replaced with a simple:

	if (!pa_data)
		return;

and 'found' can be eliminated.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1433398729-8314-1-git-send-email-weiyang@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-05 13:53:22 +02:00
Alexander Shishkin
b44a2b53be perf/x86/intel/pt: Fix a refactoring bug
Commit 066450be41 ("perf/x86/intel/pt: Clean up the control flow
in pt_pmu_hw_init()") changed attribute initialization so that
only the first attribute gets initialized using
sysfs_attr_init(), which upsets lockdep.

This patch fixes the glitch so that all allocated attributes are
properly initialized thus fixing the lockdep warning reported by
Tvrtko and Imre.

Reported-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reported-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: <linux-kernel@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-04 16:07:51 +02:00
Ingo Molnar
00398a0018 x86/asm/entry: Move the vsyscall code to arch/x86/entry/vsyscall/
The vsyscall code is entry code too, so move it to arch/x86/entry/vsyscall/.

Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-04 07:37:37 +02:00
Dave Airlie
a8a50fce60 Linux 4.1-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJVa7zvAAoJEHm+PkMAQRiGtfMIAILs3sxFtrC1hApgcfRLF/7z
 K34bwTRqErzqUO/orTwakEr9kSIpIL0zIPSryTCOTPZLfMGkQjhHXO3KR/DSbbTV
 MZ8y/BM/yelFA/Np+1LjbiYjTNRnTRvCoaQihkIH8Rn02g7ob9HyL4gIGKpuGFcZ
 04GacL2cgChqsRSACdNef948jCoJXKgcuDpe39DXphDWZnBKNZ3HFuJ6bryGJf9A
 1/eCI4is85BNwKPemQUYR0xx83UIzDfrghatZP2mOCDDSA2MNg8HNxLTd12LGoQD
 tfgX4B7aftzW9Y7GSEDfZ0IKm2NRzgPmCVj6PjVR/iI0lIK4Aq0Z/lDJxxEq3XQ=
 =AJM5
 -----END PGP SIGNATURE-----

Merge tag 'v4.1-rc6' into drm-next

Linux 4.1-rc6

backmerge 4.1-rc6 as some of the later pull reqs are based on newer bases
and I'd prefer to do the fixup myself.
2015-06-04 09:23:51 +10:00
Ingo Molnar
905a36a285 x86/asm/entry: Move entry_64.S and entry_32.S to arch/x86/entry/
Create a new directory hierarchy for the low level x86 entry code:

    arch/x86/entry/*

This will host all the low level glue that is currently scattered
all across arch/x86/.

Start with entry_64.S and entry_32.S.

Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-03 18:51:28 +02:00
Stephen Rothwell
d6472302f2 x86/mm: Decouple <linux/vmalloc.h> from <asm/io.h>
Nothing in <asm/io.h> uses anything from <linux/vmalloc.h>, so
remove it from there and fix up the resulting build problems
triggered on x86 {64|32}-bit {def|allmod|allno}configs.

The breakages were triggering in places where x86 builds relied
on vmalloc() facilities but did not include <linux/vmalloc.h>
explicitly and relied on the implicit inclusion via <asm/io.h>.

Also add:

  - <linux/init.h> to <linux/io.h>
  - <asm/pgtable_types> to <asm/io.h>

... which were two other implicit header file dependencies.

Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
[ Tidied up the changelog. ]
Acked-by: David Miller <davem@davemloft.net>
Acked-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Vinod Koul <vinod.koul@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Colin Cross <ccross@android.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: James E.J. Bottomley <JBottomley@odin.com>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Suma Ramars <sramars@cisco.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-03 12:02:00 +02:00
Ingo Molnar
71966f3a0b Merge branch 'locking/core' into x86/core, to prepare for dependent patch
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-03 10:07:35 +02:00
Ingo Molnar
34e7724c07 Merge branches 'x86/mm', 'x86/build', 'x86/apic' and 'x86/platform' into x86/core, to apply dependent patch
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-03 10:05:18 +02:00
Borislav Petkov
ee098e1aed x86/cpu: Trim model ID whitespace
We did try trimming whitespace surrounding the 'model name'
field in /proc/cpuinfo since reportedly some userspace uses it
in string comparisons and there were discrepancies:

  [thetango@prarit ~]# grep "^model name" /proc/cpuinfo | uniq -c | sed 's/\ /_/g'
  ______1_model_name      :_AMD_Opteron(TM)_Processor_6272
  _____63_model_name      :_AMD_Opteron(TM)_Processor_6272_________________

However, there were issues with overlapping buffers, string
sizes and non-byte-sized copies in the previous proposed
solutions; see Link tags below for the whole farce.

So, instead of diddling with this more, let's simply extend what
was there originally with trimming any present trailing
whitespace. Final result is really simple and obvious.

Testing with the most insane model IDs qemu can generate, looks
good:

  .model_id = "            My funny model ID CPU          ",
  ______4_model_name      :_My_funny_model_ID_CPU

  .model_id = "My funny model ID CPU          ",
  ______4_model_name      :_My_funny_model_ID_CPU

  .model_id = "            My funny model ID CPU",
  ______4_model_name      :_My_funny_model_ID_CPU

  .model_id = "            ",
  ______4_model_name      :__

  .model_id = "",
  ______4_model_name      :_15/02

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1432050210-32036-1-git-send-email-prarit@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-02 10:38:11 +02:00
Jan Beulich
2f63b9db72 x86/asm/entry/64: Fold identical code paths
retint_kernel doesn't require %rcx to be pointing to thread info
(anymore?), and the code on the two alternative paths is - not
really surprisingly - identical.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/556C664F020000780007FB64@mail.emea.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-02 10:10:09 +02:00
Andy Lutomirski
425be5679f x86/asm/irq: Stop relying on magic JMP behavior for early_idt_handlers
The early_idt_handlers asm code generates an array of entry
points spaced nine bytes apart.  It's not really clear from that
code or from the places that reference it what's going on, and
the code only works in the first place because GAS never
generates two-byte JMP instructions when jumping to global
labels.

Clean up the code to generate the correct array stride (member size)
explicitly. This should be considerably more robust against
screw-ups, as GAS will warn if a .fill directive has a negative
count.  Using '. =' to advance would have been even more robust
(it would generate an actual error if it tried to move
backwards), but it would pad with nulls, confusing anyone who
tries to disassemble the code.  The new scheme should be much
clearer to future readers.

While we're at it, improve the comments and rename the array and
common code.

Binutils may start relaxing jumps to non-weak labels.  If so,
this change will fix our build, and we may need to backport this
change.

Before, on x86_64:

  0000000000000000 <early_idt_handlers>:
     0:   6a 00                   pushq  $0x0
     2:   6a 00                   pushq  $0x0
     4:   e9 00 00 00 00          jmpq   9 <early_idt_handlers+0x9>
                          5: R_X86_64_PC32        early_idt_handler-0x4
  ...
    48:   66 90                   xchg   %ax,%ax
    4a:   6a 08                   pushq  $0x8
    4c:   e9 00 00 00 00          jmpq   51 <early_idt_handlers+0x51>
                          4d: R_X86_64_PC32       early_idt_handler-0x4
  ...
   117:   6a 00                   pushq  $0x0
   119:   6a 1f                   pushq  $0x1f
   11b:   e9 00 00 00 00          jmpq   120 <early_idt_handler>
                          11c: R_X86_64_PC32      early_idt_handler-0x4

After:

  0000000000000000 <early_idt_handler_array>:
     0:   6a 00                   pushq  $0x0
     2:   6a 00                   pushq  $0x0
     4:   e9 14 01 00 00          jmpq   11d <early_idt_handler_common>
  ...
    48:   6a 08                   pushq  $0x8
    4a:   e9 d1 00 00 00          jmpq   120 <early_idt_handler_common>
    4f:   cc                      int3
    50:   cc                      int3
  ...
   117:   6a 00                   pushq  $0x0
   119:   6a 1f                   pushq  $0x1f
   11b:   eb 03                   jmp    120 <early_idt_handler_common>
   11d:   cc                      int3
   11e:   cc                      int3
   11f:   cc                      int3

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Binutils <binutils@sourceware.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/ac027962af343b0c599cbfcf50b945ad2ef3d7a8.1432336324.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-02 09:39:40 +02:00
Ingo Molnar
085c789783 Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU changes from Paul E. McKenney:

  - Initialization/Kconfig updates: hide most Kconfig options from unsuspecting users.
    There's now a single high level configuration option:

      *
      * RCU Subsystem
      *
      Make expert-level adjustments to RCU configuration (RCU_EXPERT) [N/y/?] (NEW)

    Which if answered in the negative, leaves us with a single interactive
    configuration option:

      Offload RCU callback processing from boot-selected CPUs (RCU_NOCB_CPU) [N/y/?] (NEW)

    All the rest of the RCU options are configured automatically.

  - Remove all uses of RCU-protected array indexes: replace the
    rcu_[access|dereference]_index_check() APIs with READ_ONCE() and rcu_lockdep_assert().

  - RCU CPU-hotplug cleanups.

  - Updates to Tiny RCU: a race fix and further code shrinkage.

  - RCU torture-testing updates: fixes, speedups, cleanups and
    documentation updates.

  - Miscellaneous fixes.

  - Documentation updates.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-02 08:18:34 +02:00
Ingo Molnar
f407a82586 Merge branch 'linus' into sched/core, to resolve conflict
Conflicts:
	arch/sparc/include/asm/topology_64.h

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-02 08:05:42 +02:00
Ingo Molnar
131484c8da x86/debug: Remove perpetually broken, unmaintainable dwarf annotations
So the dwarf2 annotations in low level assembly code have
become an increasing hindrance: unreadable, messy macros
mixed into some of the most security sensitive code paths
of the Linux kernel.

These debug info annotations don't even buy the upstream
kernel anything: dwarf driven stack unwinding has caused
problems in the past so it's out of tree, and the upstream
kernel only uses the much more robust framepointers based
stack unwinding method.

In addition to that there's a steady, slow bitrot going
on with these annotations, requiring frequent fixups.
There's no tooling and no functionality upstream that
keeps it correct.

So burn down the sick forest, allowing new, healthier growth:

   27 files changed, 350 insertions(+), 1101 deletions(-)

Someone who has the willingness and time to do this
properly can attempt to reintroduce dwarf debuginfo in x86
assembly code plus dwarf unwinding from first principles,
with the following conditions:

 - it should be maximally readable, and maximally low-key to
   'ordinary' code reading and maintenance.

 - find a build time method to insert dwarf annotations
   automatically in the most common cases, for pop/push
   instructions that manipulate the stack pointer. This could
   be done for example via a preprocessing step that just
   looks for common patterns - plus special annotations for
   the few cases where we want to depart from the default.
   We have hundreds of CFI annotations, so automating most of
   that makes sense.

 - it should come with build tooling checks that ensure that
   CFI annotations are sensible. We've seen such efforts from
   the framepointer side, and there's no reason it couldn't be
   done on the dwarf side.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-02 07:57:48 +02:00
Luiz Capitulino
0ad83caa21 x86: kvmclock: set scheduler clock stable
If you try to enable NOHZ_FULL on a guest today, you'll get
the following error when the guest tries to deactivate the
scheduler tick:

 WARNING: CPU: 3 PID: 2182 at kernel/time/tick-sched.c:192 can_stop_full_tick+0xb9/0x290()
 NO_HZ FULL will not work with unstable sched clock
 CPU: 3 PID: 2182 Comm: kworker/3:1 Not tainted 4.0.0-10545-gb9bb6fb #204
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 Workqueue: events flush_to_ldisc
  ffffffff8162a0c7 ffff88011f583e88 ffffffff814e6ba0 0000000000000002
  ffff88011f583ed8 ffff88011f583ec8 ffffffff8104d095 ffff88011f583eb8
  0000000000000000 0000000000000003 0000000000000001 0000000000000001
 Call Trace:
  <IRQ>  [<ffffffff814e6ba0>] dump_stack+0x4f/0x7b
  [<ffffffff8104d095>] warn_slowpath_common+0x85/0xc0
  [<ffffffff8104d146>] warn_slowpath_fmt+0x46/0x50
  [<ffffffff810bd2a9>] can_stop_full_tick+0xb9/0x290
  [<ffffffff810bd9ed>] tick_nohz_irq_exit+0x8d/0xb0
  [<ffffffff810511c5>] irq_exit+0xc5/0x130
  [<ffffffff814f180a>] smp_apic_timer_interrupt+0x4a/0x60
  [<ffffffff814eff5e>] apic_timer_interrupt+0x6e/0x80
  <EOI>  [<ffffffff814ee5d1>] ? _raw_spin_unlock_irqrestore+0x31/0x60
  [<ffffffff8108bbc8>] __wake_up+0x48/0x60
  [<ffffffff8134836c>] n_tty_receive_buf_common+0x49c/0xba0
  [<ffffffff8134a6bf>] ? tty_ldisc_ref+0x1f/0x70
  [<ffffffff81348a84>] n_tty_receive_buf2+0x14/0x20
  [<ffffffff8134b390>] flush_to_ldisc+0xe0/0x120
  [<ffffffff81064d05>] process_one_work+0x1d5/0x540
  [<ffffffff81064c81>] ? process_one_work+0x151/0x540
  [<ffffffff81065191>] worker_thread+0x121/0x470
  [<ffffffff81065070>] ? process_one_work+0x540/0x540
  [<ffffffff8106b4df>] kthread+0xef/0x110
  [<ffffffff8106b3f0>] ? __kthread_parkme+0xa0/0xa0
  [<ffffffff814ef4f2>] ret_from_fork+0x42/0x70
  [<ffffffff8106b3f0>] ? __kthread_parkme+0xa0/0xa0
 ---[ end trace 06e3507544a38866 ]---

However, it turns out that kvmclock does provide a stable
sched_clock callback. So, let the scheduler know this which
in turn makes NOHZ_FULL work in the guest.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-29 14:01:52 +02:00
Dan Williams
ad5fb870c4 e820, efi: add ACPI 6.0 persistent memory types
ACPI 6.0 formalizes e820-type-7 and efi-type-14 as persistent memory.
Mark it "reserved" and allow it to be claimed by a persistent memory
device driver.

This definition is in addition to the Linux kernel's existing type-12
definition that was recently added in support of shipping platforms with
NVDIMM support that predate ACPI 6.0 (which now classifies type-12 as
OEM reserved).

Note, /proc/iomem can be consulted for differentiating legacy
"Persistent Memory (legacy)" E820_PRAM vs standard "Persistent Memory"
E820_PMEM.

Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-05-27 21:46:05 -04:00
Paul E. McKenney
29c6820f51 mce: mce_chrdev_write() can be static
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-05-27 12:56:17 -07:00
Paul E. McKenney
e90328b87e mce: Stop using array-index-based RCU primitives
Because mce is arch-specific x86 code, there is little or no
performance benefit of using rcu_dereference_index_check() over using
smp_load_acquire().  It also turns out that mce is the only place that
array-index-based RCU is used, and it would be convenient to drop
this portion of the RCU API.

This patch therefore changes rcu_dereference_index_check() uses to
smp_load_acquire(), but keeping the lockdep diagnostics, and also
changes rcu_access_index() uses to READ_ONCE().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: linux-edac@vger.kernel.org
Cc: Tony Luck <tony.luck@intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
2015-05-27 12:56:16 -07:00
Bartosz Golaszewski
7d79a7bd75 x86: Replace cpu_**_mask() with topology_**_cpumask()
The former duplicate the functionalities of the latter but are
neither documented nor arch-independent.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Benoit Cousson <bcousson@baylibre.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Jean Delvare <jdelvare@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: http://lkml.kernel.org/r/1432645896-12588-9-git-send-email-bgolaszewski@baylibre.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 15:22:17 +02:00
Bartosz Golaszewski
06931e6224 sched/topology: Rename topology_thread_cpumask() to topology_sibling_cpumask()
Rename topology_thread_cpumask() to topology_sibling_cpumask()
for more consistency with scheduler code.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Benoit Cousson <bcousson@baylibre.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Jean Delvare <jdelvare@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: http://lkml.kernel.org/r/1432645896-12588-2-git-send-email-bgolaszewski@baylibre.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 15:22:15 +02:00
Luis R. Rodriguez
cb32edf65b x86/mm/pat: Wrap pat_enabled into a function API
We use pat_enabled in x86-specific code to see if PAT is enabled
or not but we're granting full access to it even though readers
do not need to set it. If, for instance, we granted access to it
to modules later they then could override the variable
setting... no bueno.

This renames pat_enabled to a new static variable __pat_enabled.
Folks are redirected to use pat_enabled() now.

Code that sets this can only be internal to pat.c. Apart from
the early kernel parameter "nopat" to disable PAT, we also have
a few cases that disable it later and make use of a helper
pat_disable(). It is wrapped under an ifdef but since that code
cannot run unless PAT was enabled its not required to wrap it
with ifdefs, unwrap that. Likewise, since "nopat" doesn't really
change non-PAT systems just remove that ifdef as well.

Although we could add and use an early_param_off(), these
helpers don't use __read_mostly but we want to keep
__read_mostly for __pat_enabled as this is a hot path -- upon
boot, for instance, a simple guest may see ~4k accesses to
pat_enabled(). Since __read_mostly early boot params are not
that common we don't add a helper for them just yet.

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Walls <awalls@md.metrocast.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kyle McMartin <kyle@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1430425520-22275-3-git-send-email-mcgrof@do-not-panic.com
Link: http://lkml.kernel.org/r/1432628901-18044-13-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:41:01 +02:00
Luis R. Rodriguez
f9626104a5 x86/mm/mtrr: Generalize runtime disabling of MTRRs
It is possible to enable CONFIG_MTRR and CONFIG_X86_PAT and end
up with a system with MTRR functionality disabled but PAT
functionality enabled. This can happen, for instance, when the
Xen hypervisor is used where MTRRs are not supported but PAT is.
This can happen on Linux as of commit

  47591df505 ("xen: Support Xen pv-domains using PAT")

by Juergen, introduced in v3.19.

Technically, we should assume the proper CPU bits would be set
to disable MTRRs but we can't always rely on this. At least on
the Xen Hypervisor, for instance, only X86_FEATURE_MTRR was
disabled as of Xen 4.4 through Xen commit 586ab6a [0], but not
X86_FEATURE_K6_MTRR, X86_FEATURE_CENTAUR_MCR, or
X86_FEATURE_CYRIX_ARR for instance.

Roger Pau Monné has clarified though that although this is
technically true we will never support PVH on these CPU types so
Xen has no need to disable these bits on those systems. As per
Roger, AMD K6, Centaur and VIA chips don't have the necessary
hardware extensions to allow running PVH guests [1].

As per Toshi it is also possible for the BIOS to disable MTRR
support, in such cases get_mtrr_state() would update the MTRR
state as per the BIOS, we need to propagate this information as
well.

x86 MTRR code relies on quite a bit of checks for mtrr_if being
set to check to see if MTRRs did get set up. Instead, lets
provide a generic getter for that. This also adds a few checks
where they were not before which could potentially safeguard
ourselves against incorrect usage of MTRR where this was not
desirable.

Where possible match error codes as if MTRRs were disabled on
arch/x86/include/asm/mtrr.h.

Lastly, since disabling MTRRs can happen at run time and we
could end up with PAT enabled, best record now in our logs when
MTRRs are disabled.

[0] ~/devel/xen (git::stable-4.5)$ git describe --contains 586ab6a 4.4.0-rc1~18
[1] http://lists.xenproject.org/archives/html/xen-devel/2015-03/msg03460.html

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: Stefan Bader <stefan.bader@canonical.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: bhelgaas@google.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: konrad.wilk@oracle.com
Cc: venkatesh.pallipadi@intel.com
Cc: ville.syrjala@linux.intel.com
Cc: xen-devel@lists.xensource.com
Link: http://lkml.kernel.org/r/1426893517-2511-3-git-send-email-mcgrof@do-not-panic.com
Link: http://lkml.kernel.org/r/1432628901-18044-12-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:41:01 +02:00
Luis R. Rodriguez
7d010fdf29 x86/mm/mtrr: Avoid #ifdeffery with phys_wc_to_mtrr_index()
There is only one user but since we're going to bury MTRR next
out of access to drivers, expose this last piece of API to
drivers in a general fashion only needing io.h for access to
helpers.

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Abhilash Kesavan <a.kesavan@samsung.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Cristian Stoica <cristian.stoica@freescale.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thierry Reding <treding@nvidia.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will.deacon@arm.com>
Cc: dri-devel@lists.freedesktop.org
Link: http://lkml.kernel.org/r/1429722736-4473-1-git-send-email-mcgrof@do-not-panic.com
Link: http://lkml.kernel.org/r/1432628901-18044-11-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:41:00 +02:00
Luis R. Rodriguez
2f9e897353 x86/mm/mtrr, pat: Document Write Combining MTRR type effects on PAT / non-PAT pages
As part of the effort to phase out MTRR use document
write-combining MTRR effects on pages with different non-PAT
page attributes flags and different PAT entry values. Extend
arch_phys_wc_add() documentation to clarify power of two sizes /
boundary requirements as we phase out mtrr_add() use.

Lastly hint towards ioremap_uc() for corner cases on device
drivers working with devices with mixed regions where MTRR size
requirements would otherwise not enable write-combining
effective memory types.

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-fbdev@vger.kernel.org
Link: http://lkml.kernel.org/r/1430343851-967-3-git-send-email-mcgrof@do-not-panic.com
Link: http://lkml.kernel.org/r/1432628901-18044-10-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:40:59 +02:00
Toshi Kani
b73522e0c1 x86/mm/mtrr: Enhance MTRR checks in kernel mapping helpers
This patch adds the argument 'uniform' to mtrr_type_lookup(),
which gets set to 1 when a given range is covered uniformly by
MTRRs, i.e. the range is fully covered by a single MTRR entry or
the default type.

Change pud_set_huge() and pmd_set_huge() to honor the 'uniform'
flag to see if it is safe to create a huge page mapping in the
range.

This allows them to create a huge page mapping in a range
covered by a single MTRR entry of any memory type. It also
detects a non-optimal request properly. They continue to check
with the WB type since it does not effectively change the
uniform mapping even if a request spans multiple MTRR entries.

pmd_set_huge() logs a warning message to a non-optimal request
so that driver writers will be aware of such a case. Drivers
should make a mapping request aligned to a single MTRR entry
when the range is covered by MTRRs.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
[ Realign, flesh out comments, improve warning message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Elliott@hp.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Cc: linux-mm <linux-mm@kvack.org>
Cc: pebolle@tiscali.nl
Link: http://lkml.kernel.org/r/1431714237-880-7-git-send-email-toshi.kani@hp.com
Link: http://lkml.kernel.org/r/1432628901-18044-8-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:40:58 +02:00
Toshi Kani
0cc705f56e x86/mm/mtrr: Clean up mtrr_type_lookup()
MTRRs contain fixed and variable entries. mtrr_type_lookup() may
repeatedly call __mtrr_type_lookup() to handle a request that
overlaps with variable entries.

However, __mtrr_type_lookup() also handles the fixed entries,
which do not have to be repeated. Therefore, this patch creates
separate functions, mtrr_type_lookup_fixed() and
mtrr_type_lookup_variable(), to handle the fixed and variable
ranges respectively.

The patch also updates the function headers to clarify the
return values and output argument. It updates comments to
clarify that the repeating is necessary to handle overlaps with
the default type, since overlaps with multiple entries alone can
be handled without such repeating.

There is no functional change in this patch.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Elliott@hp.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Cc: linux-mm <linux-mm@kvack.org>
Cc: pebolle@tiscali.nl
Link: http://lkml.kernel.org/r/1431714237-880-6-git-send-email-toshi.kani@hp.com
Link: http://lkml.kernel.org/r/1432628901-18044-6-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:40:57 +02:00
Toshi Kani
3d3ca416d9 x86/mm/mtrr: Use symbolic define as a retval for disabled MTRRs
mtrr_type_lookup() returns verbatim 0xFF when MTRRs are
disabled. This patch defines MTRR_TYPE_INVALID to clarify the
meaning of this value, and documents its usage.

Document the return values of the kernel virtual address mapping
helpers pud_set_huge(), pmd_set_huge, pud_clear_huge() and
pmd_clear_huge().

There is no functional change in this patch.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Elliott@hp.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Cc: linux-mm <linux-mm@kvack.org>
Cc: pebolle@tiscali.nl
Link: http://lkml.kernel.org/r/1431714237-880-5-git-send-email-toshi.kani@hp.com
Link: http://lkml.kernel.org/r/1432628901-18044-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:40:57 +02:00
Toshi Kani
9b3aca6208 x86/mm/mtrr: Fix MTRR state checks in mtrr_type_lookup()
'mtrr_state.enabled' contains the FE (fixed MTRRs enabled)
and E (MTRRs enabled) flags in MSR_MTRRdefType.  Intel SDM,
section 11.11.2.1, defines these flags as follows:

 - All MTRRs are disabled when the E flag is clear.
   The FE flag has no affect when the E flag is clear.
 - The default type is enabled when the E flag is set.
 - MTRR variable ranges are enabled when the E flag is set.
 - MTRR fixed ranges are enabled when both E and FE flags
   are set.

MTRR state checks in __mtrr_type_lookup() do not match with SDM.

Hence, this patch makes the following changes:
 - The current code detects MTRRs disabled when both E and
   FE flags are clear in mtrr_state.enabled.  Fix to detect
   MTRRs disabled when the E flag is clear.
 - The current code does not check if the FE bit is set in
   mtrr_state.enabled when looking at the fixed entries.
   Fix to check the FE flag.
 - The current code returns the default type when the E flag
   is clear in mtrr_state.enabled. However, the default type
   is UC when the E flag is clear.  Remove the code as this
   case is handled as MTRR disabled with the 1st change.

In addition, this patch defines the E and FE flags in
mtrr_state.enabled as follows.
 - FE flag: MTRR_STATE_MTRR_FIXED_ENABLED
 - E  flag: MTRR_STATE_MTRR_ENABLED

print_mtrr_state() and x86_get_mtrr_mem_range() are also updated
accordingly.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Elliott@hp.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Cc: linux-mm <linux-mm@kvack.org>
Cc: pebolle@tiscali.nl
Link: http://lkml.kernel.org/r/1431714237-880-4-git-send-email-toshi.kani@hp.com
Link: http://lkml.kernel.org/r/1432628901-18044-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:40:56 +02:00
Toshi Kani
7f0431e3dc x86/mm/mtrr: Fix MTRR lookup to handle an inclusive entry
When an MTRR entry is inclusive to a requested range, i.e. the
start and end of the request are not within the MTRR entry range
but the range contains the MTRR entry entirely:

  range_start ... [mtrr_start ... mtrr_end] ... range_end

__mtrr_type_lookup() ignores such a case because both
start_state and end_state are set to zero.

This bug can cause the following issues:

1) reserve_memtype() tracks an effective memory type in case
   a request type is WB (ex. /dev/mem blindly uses WB). Missing
   to track with its effective type causes a subsequent request
   to map the same range with the effective type to fail.

2) pud_set_huge() and pmd_set_huge() check if a requested range
   has any overlap with MTRRs. Missing to detect an overlap may
   cause a performance penalty or undefined behavior.

This patch fixes the bug by adding a new flag, 'inclusive',
to detect the inclusive case.  This case is then handled in
the same way as end_state:1 since the first region is the same.
With this fix, __mtrr_type_lookup() handles the inclusive case
properly.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Elliott@hp.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Cc: linux-mm <linux-mm@kvack.org>
Cc: pebolle@tiscali.nl
Link: http://lkml.kernel.org/r/1431714237-880-3-git-send-email-toshi.kani@hp.com
Link: http://lkml.kernel.org/r/1432628901-18044-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:40:56 +02:00
Ingo Molnar
d563a6bb3d Linux 4.1-rc5
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJVYnloAAoJEHm+PkMAQRiGCgkH/j3r2djOOm4h83FXrShaHORY
 p8TBI3FNj4fzLk2PfzqbmiDw2T2CwygB+pxb2Ac9CE99epw8qPk2SRvPXBpdKR7t
 lolhhwfzApLJMZbhzNLVywUCDUhFoiEWRhmPqIfA3WXFcIW3t5VNXAoIFjV5HFr6
 sYUlaxSI1XiQ5tldVv8D6YSFHms41pisziBIZmzhIUg10P6Vv3D0FbE74fjAJwx0
 +08zj3EO7yQMv7Aeeq8F8AJ3628142rcZf0NWF5ohlKLRK3gt0cl9jO5U4Co2dDt
 29v03LP5EI6jDKkIbuWlqRMq9YxJz7N3wnkzV0EJiqXucoqPLFDqzbxB4gnS1pI=
 =7vbA
 -----END PGP SIGNATURE-----

Merge tag 'v4.1-rc5' into x86/mm, to refresh the tree before applying new changes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:40:10 +02:00
Xie XiuQi
5c31b2800d x86/mce: Fix monarch timeout setting through the mce= cmdline option
Using "mce=1,10000000" on the kernel cmdline to change the
monarch timeout does not work. The cause is that get_option()
does parse a subsequent comma in the option string and signals
that with a return value. So we don't need to check for a second
comma ourselves.

Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1432120943-25028-1-git-send-email-xiexiuqi@huawei.com
Link: http://lkml.kernel.org/r/1432628901-18044-19-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:39:14 +02:00
Prarit Bhargava
adafb98da6 x86/cpu: Strip any /proc/cpuinfo model name field whitespace
When comparing the 'model name' field of each core in
/proc/cpuinfo it was noticed that there is a whitespace
difference between the cores' model names.

After some quick investigation it was noticed that the model
name fields were actually different -- processor 0's model name
field had trailing whitespace removed, while the other
processors did not.

Another way of seeing this behaviour is to convert spaces into
underscores in the output of /proc/cpuinfo,

  [thetango@prarit ~]# grep "^model name" /proc/cpuinfo | uniq -c | sed 's/\ /_/g'
  ______1_model_name      :_AMD_Opteron(TM)_Processor_6272
  _____63_model_name      :_AMD_Opteron(TM)_Processor_6272_________________

which shows the discrepancy.

This occurs because the kernel calls strim() on cpu 0's
x86_model_id field to output a pretty message to the console in
print_cpu_info(), and as a result strips the whitespace at the
end of the ->x86_model_id field.

But, the ->x86_model_id field should be the same for the all
identical CPUs in the box. Thus, we need to remove both leading
and trailing whitespace.

As a result, the print_cpu_info() output looks like

  smpboot: CPU0: AMD Opteron(TM) Processor 6272 (fam: 15, model: 01, stepping: 02)

and the x86_model_id field is correct on all processors on AMD
platforms:

  _____64_model_name      :_AMD_Opteron(TM)_Processor_6272

Output is still correct on an Intel box:

  ____144_model_name      :_Intel(R)_Xeon(R)_CPU_E7-8890_v3_@_2.50GHz

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1432050210-32036-1-git-send-email-prarit@redhat.com
Link: http://lkml.kernel.org/r/1432628901-18044-15-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:38:24 +02:00
Huang Rui
0fb0328d34 sched/x86: Drop repeated word from mwait_idle() comment
A single "default" is fine.

Signed-off-by: Huang Rui <ray.huang@amd.com>
[ Fix another typo and reflow comment. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1432022472-2224-5-git-send-email-ray.huang@amd.com
Link: http://lkml.kernel.org/r/1432628901-18044-7-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:38:04 +02:00
Ingo Molnar
d65fcd608f x86/fpu: Simplify copy_kernel_to_xregs_booting()
copy_kernel_to_xregs_booting() has a second parameter that is the mask
of xfeatures that should be copied - but this parameter is always -1.

Simplify the call site of this function, this also makes it more
similar to the function call signature of other copy_kernel_to*regs()
functions.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:11:33 +02:00
Ingo Molnar
003e2e8b57 x86/fpu: Standardize the parameter type of copy_kernel_to_fpregs()
Bring the __copy_fpstate_to_fpregs() and copy_fpstate_to_fpregs() functions
in line with the parameter passing convention of other kernel-to-FPU-registers
copying functions: pass around an in-memory FPU register state pointer,
instead of struct fpu *.

NOTE: This patch also changes the assembly constraint of the FXSAVE-leak
      workaround from 'fpu->fpregs_active' to 'fpstate' - but that is fine,
      as we only need a valid memory address there for the FILDL instruction.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:11:32 +02:00
Ingo Molnar
9ccc27a5d2 x86/fpu: Remove error return values from copy_kernel_to_*regs() functions
None of the copy_kernel_to_*regs() FPU register copying functions are
supposed to fail, and all of them have debugging checks that enforce
this.

Remove their return values and simplify their call sites, which have
redundant error checks and error handling code paths.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:11:30 +02:00
Ingo Molnar
3e1bf47e5c x86/fpu: Rename copy_fpstate_to_fpregs() to copy_kernel_to_fpregs()
Bring the __copy_fpstate_to_fpregs() and copy_fpstate_to_fpregs() functions
in line with the naming of other kernel-to-FPU-registers copying functions.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:11:29 +02:00
Ingo Molnar
ce2a1e67f1 x86/fpu: Add debugging check to fpu__restore()
The copy_fpstate_to_fpregs() function is never supposed to fail,
so add a debugging check to its call site in fpu__restore().

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:11:27 +02:00
Ingo Molnar
343763c3b0 x86/fpu: Optimize fpu__activate_fpstate_write()
fpu__activate_fpstate_write() is used before ptrace writes to the fpstate
context. Because it expects the modified registers to be reloaded on the
nexts context switch, it's only valid to call this function for stopped
child tasks.

  - add a debugging check for this assumption

  - remove code that only runs if the current task's FPU state needs
    to be saved, which cannot occur here

  - update comments to match the implementation

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:11:27 +02:00
Ingo Molnar
6a81d7eb33 x86/fpu: Rename fpu__activate_fpstate() to fpu__activate_fpstate_write()
Remaining users of fpu__activate_fpstate() are all places that want to modify
FPU registers, rename the function to fpu__activate_fpstate_write() according
to this usage.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:11:26 +02:00
Ingo Molnar
9ba6b79102 x86/fpu: Optimize fpu__activate_fpstate_read()
fpu__activate_fpstate_read() is used before FPU registers are
read from the fpstate by ptrace and core dumping.

It's not necessary to unlazy non-current child tasks in this case,
since the reading of registers is non-destructive.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:11:25 +02:00
Ingo Molnar
0560281266 x86/fpu: Split out the fpu__activate_fpstate_read() method
Currently fpu__activate_fpstate() is used for two distinct purposes:

  - read access by ptrace and core dumping, where in the core dumping
    case the current task's FPU state may be examined as well.

  - write access by ptrace, which modifies FPU registers and expects
    the modified registers to be reloaded on the next context switch.

Split out the reading side into fpu__activate_fpstate_read().

( Note that this is just a pure duplication of fpu__activate_fpstate()
  for the time being, we'll optimize the new function in the next patch. )

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:11:24 +02:00
Ingo Molnar
47f01e8cc2 x86/fpu: Fix FPU register read access to the current task
Bobby Powers reported the following FPU warning during ELF coredumping:

   WARNING: CPU: 0 PID: 27452 at arch/x86/kernel/fpu/core.c:324 fpu__activate_stopped+0x8a/0xa0()

This warning unearthed an invalid assumption about fpu__activate_stopped()
that I added in:

  67e97fc2ec ("x86/fpu: Rename init_fpu() to fpu__unlazy_stopped() and add debugging check")

the old init_fpu() function had an (intentional but obscure) side effect:
when FPU registers are accessed for the current task, for reading, then
it synchronized live in-register FPU state with the fpstate by saving it.

So fix this bug by saving the FPU if we are the current task. We'll
still warn in fpu__save() if this is called for not yet stopped
child tasks, so the debugging check is still preserved.

Also rename the function to fpu__activate_fpstate(), because it's not
exclusively used for stopped tasks, but for the current task as well.

( Note that this bug calls for a cleaner separation of access-for-read
  and access-for-modification FPU methods, but we'll do that in separate
  patches. )

Reported-by: Bobby Powers <bobbypowers@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 12:40:18 +02:00
Alexander Shishkin
a82d24edfe perf/x86/intel/pt: Remove redundant variable declaration
There is a 'pt' variable in the outer scope of pt_event_stop() with the same
type, we don't really need another one in the inner scope.

This patch removes the redundant variable declaration.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1432308626-18845-8-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:48 +02:00
Alexander Shishkin
0a487aad2d perf/x86/intel/pt: Kill pt_is_running()
Initially, we were trying to guard against scenarios where somebody
attaches to the system with a hardware debugger while PT is enabled
from software and pt_is_running() tries to make sure we handle this
better, but the truth is, there is still a race window no matter what
and people with hardware debuggers should really know what they are
doing anyway.

In other words, there is no point in keeping this one around, and
it's one RDMSR instructions fewer in the fast path.

The case when PT is enabled by the BIOS at boot time is handled
in the driver initialization path and doesn't use pt_is_running().

This patch gets rid of it.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1429622177-22843-6-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:48 +02:00
Alexander Shishkin
5b1dbd17c0 perf/x86/intel/pt: Document pt_buffer_reset_offsets()
Currently, the description of pt_buffer_reset_offsets() lacks information
about its calling constraints and ordering with regards to other buffer
management functions.

Add a clarification about when this function has to be called.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1429622177-22843-5-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:47 +02:00
Alexander Shishkin
cf302bfdf3 perf/x86/intel/pt: Document pt_buffer_reset_markers()
The comments in the driver don't make it absolutely clear as to what
exactly is the calling order and other possible constraints of buffer
management functions.

Document constraints and calling order for the buffer configuration
functions. While at it, replace a redundant check in
pt_buffer_reset_markers() with an explanation why it is not needed.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1429622177-22843-4-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:47 +02:00
Alexander Shishkin
74387bcb71 perf/x86/intel/pt: Kill an unused variable
Currently, there's a set-but-not-used variable in setup_topa_index();
this patch gets rid of it. And while at it, fixes a style issue with
brackets around a one-line block.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1429622177-22843-2-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:46 +02:00
Peter Zijlstra
ba040653b4 perf/x86/intel: Simplify put_exclusive_constraints()
Don't bother with taking locks if we're not actually going to do
anything. Also, drop the _irqsave(), this is very much only called
from IRQ-disabled context.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:46 +02:00
Peter Zijlstra
8736e548db perf/x86: Simplify the x86_schedule_events() logic
!x && y == ! (x || !y)

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:45 +02:00
Peter Zijlstra
43ef205bde perf/x86/intel: Remove intel_excl_states::init_state
For some obscure reason intel_{start,stop}_scheduling() copy the HT
state to an intermediate array. This would make sense if we ever were
to make changes to it which we'd have to discard.

Except we don't. By the time we call intel_commit_scheduling() we're;
as the name implies; committed to them. We'll never back out.

A further hint its pointless is that stop_scheduling() unconditionally
publishes the state.

So the intermediate array is pointless, modify the state in place and
kill the extra array.

And remove the pointless array initialization: INTEL_EXCL_UNUSED == 0.

Note; all is serialized by intel_excl_cntr::lock.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:45 +02:00
Peter Zijlstra
1fe684e349 perf/x86/intel: Remove pointless tests
Both intel_commit_scheduling() and intel_get_excl_contraints() test
for cntr < 0.

The only way that can happen (aside from a bug) is through
validate_event(), however that is already captured by the
cpuc->is_fake test.

So remove these test and simplify the code.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:44 +02:00
Peter Zijlstra
0c41e756b9 perf/x86/intel: Clean up intel_commit_scheduling() placement
Move the code of intel_commit_scheduling() to the right place, which is
in between start() and stop().

No change in functionality.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:44 +02:00
Peter Zijlstra
17186ccda3 perf/x86/intel: Make WARN()ings consistent
The intel_commit_scheduling() callback is pointlessly different from
the start and stop scheduling callback.

Furthermore, the constraint should never be NULL, so remove that test.

Even though we'll never get called (because we NULL the callbacks)
when !is_ht_workaround_enabled() put that test in.

Collapse the (pointless) WARN_ON_ONCE() and bail on !cpuc->excl_cntrs --
this is doubly pointless, because its the same condition as
is_ht_workaround_enabled() which was already pointless because the
whole method won't ever be called.

Furthremore, make all the !excl_cntrs test WARN_ON_ONCE(); they're all
pointless, because the above, either the function
({get,put}_excl_constraint) are already predicated on it existing or
the is_ht_workaround_enabled() thing is the same test.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:43 +02:00
Peter Zijlstra
aaf932e816 perf/x86/intel: Simplify the dynamic constraint code somewhat
We have two 'struct event_constraint' local variables in
intel_get_excl_constraints(): 'cx' and 'c'.

Instead of using 'cx' after the dynamic allocation, put all 'cx' inside
the dynamic allocation block and use 'c' outside of it.

Also use direct assignment to copy the structure; let the compiler
figure it out.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:43 +02:00
Peter Zijlstra
b32ed7f5de perf/x86/intel: Add lockdep assert
Lockdep is very good at finding incorrect IRQ state while locking and
is far better at telling us if we hold a lock than the _is_locked()
API. It also generates less code for !DEBUG kernels.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:42 +02:00
Peter Zijlstra
1c565833ac perf/x86/intel: Correct local vs remote sibling state
For some obscure reason the current code accounts the current SMT
thread's state on the remote thread and reads the remote's state on
the local SMT thread.

While internally consistent, and 'correct' its pointless confusion we
can do without.

Flip them the right way around.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:42 +02:00
Matt Fleming
adafa99960 perf/x86/intel/cqm: Use 'u32' data type for RMIDs
Since we write RMID values to MSRs the correct type to use is 'u32'
because that clearly articulates we're writing a hardware register
value.

Fix up all uses of RMID in this code to consistently use the correct data
type.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Will Auld <will.auld@intel.com>
Link: http://lkml.kernel.org/r/1432285182-17180-1-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:41 +02:00
Thomas Gleixner
bf926731e1 perf/x86/intel/cqm: Add storage for 'closid' and clean up 'struct intel_pqr_state'
'closid' (CLass Of Service ID) is used for the Class based Cache
Allocation Technology (CAT). Add explicit storage to the per cpu cache
for it, so it can be used later with the CAT support (requires to move
the per cpu data).

While at it:

 - Rename the structure to intel_pqr_state which reflects the actual
   purpose of the struct: cache values which go into the PQR MSR

 - Rename 'cnt' to rmid_usecnt which reflects the actual purpose of
   the counter.

 - Document the structure and the struct members.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Matt Fleming <matt.fleming@intel.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Will Auld <will.auld@intel.com>
Link: http://lkml.kernel.org/r/20150518235150.240899319@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:41 +02:00
Thomas Gleixner
43d0c2f6dc perf/x86/intel/cqm: Remove useless wrapper function
intel_cqm_event_del() is a 1:1 wrapper for intel_cqm_event_stop().
Remove the useless indirection.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Matt Fleming <matt.fleming@intel.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Will Auld <will.auld@intel.com>
Link: http://lkml.kernel.org/r/20150518235150.159779847@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:40 +02:00
Thomas Gleixner
0bac237845 perf/x86/intel/cqm: Avoid pointless MSR write
If the usage counter is non-zero there is no point to update the rmid
in the PQR MSR.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Matt Fleming <matt.fleming@intel.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Will Auld <will.auld@intel.com>
Link: http://lkml.kernel.org/r/20150518235150.080844281@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:40 +02:00
Thomas Gleixner
9e7eaac95a perf/x86/intel/cqm: Remove pointless spinlock from state cache
'struct intel_cqm_state' is a strict per CPU cache of the rmid and the
usage counter. It can never be modified from a remote CPU.

The three functions which modify the content: intel_cqm_event[start|stop|del]
(del maps to stop) are called from the perf core with interrupts disabled
which is enough protection for the per CPU state values.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Matt Fleming <matt.fleming@intel.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Will Auld <will.auld@intel.com>
Link: http://lkml.kernel.org/r/20150518235150.001006529@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:39 +02:00
Thomas Gleixner
b3df4ec442 perf/x86/intel/cqm: Use proper data types
'int' is really not a proper data type for an MSR. Use u32 to make it
clear that we are dealing with a 32-bit unsigned hardware value.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Matt Fleming <matt.fleming@intel.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Will Auld <will.auld@intel.com>
Link: http://lkml.kernel.org/r/20150518235149.919350144@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:39 +02:00
Thomas Gleixner
f4d9757ca6 perf/x86/intel/cqm: Document PQR MSR abuse
The CQM code acts like it owns the PQR MSR completely. That's not true
because only the lower 10 bits are used for CQM. The upper 32 bits are
used for the 'CLass Of Service ID' (CLOSID). Document the abuse. Will be
fixed in a later patch.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Matt Fleming <matt.fleming@intel.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Will Auld <will.auld@intel.com>
Link: http://lkml.kernel.org/r/20150518235149.823214798@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:38 +02:00
Ingo Molnar
8d12ded3dd Merge branch 'perf/urgent' into perf/core, before applying dependent patches
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:21 +02:00
Don Zickus
68ab747604 perf/x86: Tweak broken BIOS rules during check_hw_exists()
I stumbled upon an AMD box that had the BIOS using a hardware performance
counter. Instead of printing out a warning and continuing, it failed and
blocked further perf counter usage.

Looking through the history, I found this commit:

  a5ebe0ba3d ("perf/x86: Check all MSRs before passing hw check")

which tweaked the rules for a Xen guest on an almost identical box and now
changed the behaviour.

Unfortunately the rules were tweaked incorrectly and will always lead to
MSR failures even though the MSRs are completely fine.

What happens now is in arch/x86/kernel/cpu/perf_event.c::check_hw_exists():

<snip>
        for (i = 0; i < x86_pmu.num_counters; i++) {
                reg = x86_pmu_config_addr(i);
                ret = rdmsrl_safe(reg, &val);
                if (ret)
                        goto msr_fail;
                if (val & ARCH_PERFMON_EVENTSEL_ENABLE) {
                        bios_fail = 1;
                        val_fail = val;
                        reg_fail = reg;
                }
        }

<snip>
        /*
         * Read the current value, change it and read it back to see if it
         * matches, this is needed to detect certain hardware emulators
         * (qemu/kvm) that don't trap on the MSR access and always return 0s.
         */
        reg = x86_pmu_event_addr(0);
				^^^^

if the first perf counter is enabled, then this routine will always fail
because the counter is running. :-(

        if (rdmsrl_safe(reg, &val))
                goto msr_fail;
        val ^= 0xffffUL;
        ret = wrmsrl_safe(reg, val);
        ret |= rdmsrl_safe(reg, &val_new);
        if (ret || val != val_new)
                goto msr_fail;

The above bios_fail used to be a 'goto' which is why it worked in the past.

Further, most vendors have migrated to using fixed counters to hide their
evilness hence this problem rarely shows up now days except on a few old boxes.

I fixed my problem and kept the spirit of the original Xen fix, by recording a
safe non-enable register to be used safely for the reading/writing check.
Because it is not enabled, this passes on bare metal boxes (like metal), but
should continue to throw an msr_fail on Xen guests because the register isn't
emulated yet.

Now I get a proper bios_fail error message and Xen should still see their
msr_fail message (untested).

Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: george.dunlap@eu.citrix.com
Cc: konrad.wilk@oracle.com
Link: http://lkml.kernel.org/r/1431976608-56970-1-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:16:20 +02:00
Alexander Shishkin
f73ec48c90 perf/x86/intel/pt: Untangle pt_buffer_reset_markers()
Currently, pt_buffer_reset_markers() is a difficult to read knot of
arithmetics with a redundant check for multiple-entry TOPA capability,
a commented out wakeup marker placement and a logical error wrt to
stop marker placement. The latter happens when write head is not page
aligned and results in stop marker being placed one page earlier than
it actually should.

All these problems only affect PT implementations that support
multiple-entry TOPA tables (read: proper scatter-gather).

For single-entry TOPA implementations, there is no functional impact.

This patch deals with all of the above. Tested on both single-entry
and multiple-entry TOPA PT implementations.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1432308626-18845-4-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:16:20 +02:00
Peter Zijlstra
cc1790cf54 perf/x86: Improve HT workaround GP counter constraint
The (SNB/IVB/HSW) HT bug only affects events that can be programmed
onto GP counters, therefore we should only limit the number of GP
counters that can be used per cpu -- iow we should not constrain the
FP counters.

Furthermore, we should only enfore such a limit when there are in fact
exclusive events being scheduled on either sibling.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
[ Fixed build fail for the !CONFIG_CPU_SUP_INTEL case. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:16:03 +02:00
Peter Zijlstra
b371b59431 perf/x86: Fix event/group validation
Commit 43b4578071 ("perf/x86: Reduce stack usage of
x86_schedule_events()") violated the rule that 'fake' scheduling; as
used for event/group validation; should not change the event state.

This went mostly un-noticed because repeated calls of
x86_pmu::get_event_constraints() would give the same result. And
x86_pmu::put_event_constraints() would mostly not do anything.

Commit e979121b1b ("perf/x86/intel: Implement cross-HT corruption
bug workaround") made the situation much worse by actually setting the
event->hw.constraint value to NULL, so when validation and actual
scheduling interact we get NULL ptr derefs.

Fix it by removing the constraint pointer from the event and move it
back to an array, this time in cpuc instead of on the stack.

validate_group()
  x86_schedule_events()
    event->hw.constraint = c; # store

      <context switch>
        perf_task_event_sched_in()
          ...
            x86_schedule_events();
              event->hw.constraint = c2; # store

              ...

              put_event_constraints(event); # assume failure to schedule
                intel_put_event_constraints()
                  event->hw.constraint = NULL;

      <context switch end>

    c = event->hw.constraint; # read -> NULL

    if (!test_bit(hwc->idx, c->idxmsk)) # <- *BOOM* NULL deref

This in particular is possible when the event in question is a
cpu-wide event and group-leader, where the validate_group() tries to
add an event to the group.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Hunter <ahh@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 43b4578071 ("perf/x86: Reduce stack usage of x86_schedule_events()")
Fixes: e979121b1b ("perf/x86/intel: Implement cross-HT corruption bug workaround")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 08:46:44 +02:00
Ingo Molnar
6e5535940f x86/fpu: Fix fpu__init_system_xstate() comments
Remove obsolete comment about __init limitations: in the new code there aren't any.

Also standardize the comment style in the function while at it.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-25 12:49:34 +02:00
Andy Lutomirski
cdeb604894 x86/asm/irq: Stop relying on magic JMP behavior for early_idt_handlers
The early_idt_handlers asm code generates an array of entry
points spaced nine bytes apart.  It's not really clear from that
code or from the places that reference it what's going on, and
the code only works in the first place because GAS never
generates two-byte JMP instructions when jumping to global
labels.

Clean up the code to generate the correct array stride (member size)
explicitly. This should be considerably more robust against
screw-ups, as GAS will warn if a .fill directive has a negative
count.  Using '. =' to advance would have been even more robust
(it would generate an actual error if it tried to move
backwards), but it would pad with nulls, confusing anyone who
tries to disassemble the code.  The new scheme should be much
clearer to future readers.

While we're at it, improve the comments and rename the array and
common code.

Binutils may start relaxing jumps to non-weak labels.  If so,
this change will fix our build, and we may need to backport this
change.

Before, on x86_64:

  0000000000000000 <early_idt_handlers>:
     0:   6a 00                   pushq  $0x0
     2:   6a 00                   pushq  $0x0
     4:   e9 00 00 00 00          jmpq   9 <early_idt_handlers+0x9>
                          5: R_X86_64_PC32        early_idt_handler-0x4
  ...
    48:   66 90                   xchg   %ax,%ax
    4a:   6a 08                   pushq  $0x8
    4c:   e9 00 00 00 00          jmpq   51 <early_idt_handlers+0x51>
                          4d: R_X86_64_PC32       early_idt_handler-0x4
  ...
   117:   6a 00                   pushq  $0x0
   119:   6a 1f                   pushq  $0x1f
   11b:   e9 00 00 00 00          jmpq   120 <early_idt_handler>
                          11c: R_X86_64_PC32      early_idt_handler-0x4

After:

  0000000000000000 <early_idt_handler_array>:
     0:   6a 00                   pushq  $0x0
     2:   6a 00                   pushq  $0x0
     4:   e9 14 01 00 00          jmpq   11d <early_idt_handler_common>
  ...
    48:   6a 08                   pushq  $0x8
    4a:   e9 d1 00 00 00          jmpq   120 <early_idt_handler_common>
    4f:   cc                      int3
    50:   cc                      int3
  ...
   117:   6a 00                   pushq  $0x0
   119:   6a 1f                   pushq  $0x1f
   11b:   eb 03                   jmp    120 <early_idt_handler_common>
   11d:   cc                      int3
   11e:   cc                      int3
   11f:   cc                      int3

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Binutils <binutils@sourceware.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/ac027962af343b0c599cbfcf50b945ad2ef3d7a8.1432336324.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-24 08:35:03 +02:00
Ingo Molnar
6f56a8d024 Merge branch 'x86/urgent' into x86/fpu, to resolve a conflict
Conflicts:
	arch/x86/kernel/i387.c

This commit is conflicting:

  e88221c50c ("x86/fpu: Disable XSAVES* support for now")

These functions changed a lot, move the quirk to arch/x86/kernel/fpu/init.c's
fpu__init_system_xstate_size_legacy().

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-20 12:01:01 +02:00
Ingo Molnar
e88221c50c x86/fpu: Disable XSAVES* support for now
The kernel's handling of 'compacted' xsave state layout is buggy:

    http://marc.info/?l=linux-kernel&m=142967852317199

I don't have such a system, and the description there is vague, but
from extrapolation I guess that there were two kinds of bugs
observed:

  - boot crashes, due to size calculations being wrong and the dynamic
    allocation allocating a too small xstate area. (This is now fixed
    in the new FPU code - but still present in stable kernels.)

  - FPU state corruption and ABI breakage: if signal handlers try to
    change the FPU state in standard format, which then the kernel
    tries to restore in the compacted format.

These breakages are scary, but they only occur on a small number of
systems that have XSAVES* CPU support. Yet we have had XSAVES support
in the upstream kernel for a large number of stable kernel releases,
and the fixes are involved and unproven.

So do the safe resolution first: disable XSAVES* support and only
use the standard xstate format. This makes the code work and is
easy to backport.

On top of this we can work on enabling (and testing!) proper
compacted format support, without backporting pressure, on top of the
new, cleaned up FPU code.

Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-20 11:58:26 +02:00
Nicholas Krause
ed3cf15271 kvm: x86: Make functions that have no external callers static
This makes the functions kvm_guest_cpu_init and  kvm_init_debugfs
static now due to having no external callers outside their
declarations in the file, kvm.c.

Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-20 11:48:21 +02:00
Ingo Molnar
5856afed0c x86/fpu/init: Clean up and comment the __setup() functions
Explain the functions and also standardize their style
and naming.

No change in functionality.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-20 11:39:35 +02:00
Ingo Molnar
7cf82d33b6 x86/fpu/init: Move __setup() functions to fpu/init.c
We had a number of FPU init related boot option handlers
in arch/x86/kernel/cpu/common.c - move them over into
arch/x86/kernel/fpu/init.c to have them all in a
single place.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-20 11:35:42 +02:00
Dave Airlie
bdcddf95e8 Linux 4.1-rc4
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJVWh3TAAoJEHm+PkMAQRiG/kwH/2c9irodp2+M9OUnX2bfsBb6
 LnChiDpvkF5BB8jhP6d/XmvPp4NJzAbTxByhjdfb2E2HkorCUHCOIn2tI1TE2pUs
 2qjkOVH+XCzoV0goGtQjzK1ht8f2IrtlDiEjyRekK5cJHzhggb22QPtWL4npyd0O
 reDmG2jsRaF9POr9uLSFEv4CEnkksmRLUU0vuQX0TZeCJ41O7TXrkN/wKrLZ5mj4
 IWpqXQaSlrffq/T5HnVbXBxk3/T8QmhrIoppiMpV1mUVj0uTqlFRNi5qwT2Nit1h
 FVljWI4+WgOk3bf7fUlp+ahopjkTgu+GuXkiRP/pdgWNQO0cxCWSAzSndAlIIAE=
 =uOoJ
 -----END PGP SIGNATURE-----

Backmerge v4.1-rc4 into into drm-next

We picked up a silent conflict in amdkfd with drm-fixes and drm-next,
backmerge v4.1-rc5 and fix the conflicts

Signed-off-by: Dave Airlie <airlied@redhat.com>

Conflicts:
	drivers/gpu/drm/drm_irq.c
2015-05-20 16:23:53 +10:00
Paolo Bonzini
c35ebbeade Revert "kvmclock: set scheduler clock stable"
This reverts commit ff7bbb9c6a.
Sasha Levin is seeing odd jump in time values during boot of a KVM guest:

[...]
[    0.000000] tsc: Detected 2260.998 MHz processor
[3376355.247558] Calibrating delay loop (skipped) preset value..
[...]

and bisected them to this commit.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19 20:52:37 +02:00
Thomas Gleixner
c3b5d3cea5 Merge branch 'linus' into timers/core
Make sure the upstream fixes are applied before adding further
modifications.
2015-05-19 16:12:32 +02:00
Feng Wu
501b32653e x86/irq: Show statistics information for posted-interrupts
Show the statistics information for notification event
and wakeup event for posted-interrupt in /proc/interrupts.

[ tglx: Named the short identifiers PIN and PIW to match the long
  	identifiers ]

Signed-off-by: Feng Wu <feng.wu@intel.com>
Cc: jiang.liu@linux.intel.com
Link: http://lkml.kernel.org/r/1432026437-16560-5-git-send-email-feng.wu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-19 15:51:17 +02:00
Feng Wu
f6b3c72c23 x86/irq: Define a global vector for VT-d Posted-Interrupts
Currently, we use a global vector as the Posted-Interrupts
Notification Event for all the vCPUs in the system. We need
to introduce another global vector for VT-d Posted-Interrtups,
which will be used to wakeup the sleep vCPU when an external
interrupt from a direct-assigned device happens for that vCPU.

[ tglx: Removed a gazillion of extra newlines ]

Signed-off-by: Feng Wu <feng.wu@intel.com>
Cc: jiang.liu@linux.intel.com
Link: http://lkml.kernel.org/r/1432026437-16560-4-git-send-email-feng.wu@intel.com
Suggested-by: Yang Zhang <yang.z.zhang@intel.com>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-19 15:51:17 +02:00
Feng Wu
a2f1c8bdc0 x86/irq/msi: Implement irq_set_vcpu_affinity for remapped MSI irqs
Implement irq_set_vcpu_affinity for pci_msi_ir_controller.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>
Link: http://lkml.kernel.org/r/1432026437-16560-3-git-send-email-feng.wu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-19 15:51:17 +02:00
Ingo Molnar
e97131a839 x86/fpu: Add CONFIG_X86_DEBUG_FPU=y FPU debugging code
There are various internal FPU state debugging checks that never
trigger in practice, but which are useful for FPU code development.

Separate these out into CONFIG_X86_DEBUG_FPU=y, and also add a
couple of new ones.

The size difference is about 0.5K of code on defconfig:

   text        data     bss          filename
   15028906    2578816  1638400      vmlinux
   15029430    2578816  1638400      vmlinux

( Keep this enabled by default until the new FPU code is debugged. )

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:12 +02:00
Ingo Molnar
d364a7656c x86/fpu: Fix the 'nofxsr' boot parameter to also clear X86_FEATURE_FXSR_OPT
I tried to simulate an ancient CPU via this option, and
found that it still has fxsr_opt enabled, confusing the
FPU code.

Make the 'nofxsr' option also clear FXSR_OPT flag.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:12 +02:00
Ingo Molnar
e1884d69f6 x86/fpu: Pass 'struct fpu' to fpu__restore()
This cleans up the call sites and the function a bit,
and also makes it more symmetric with the other high
level FPU state handling functions.

It's still only valid for the current task, as we copy
to the FPU registers of the current CPU.

No change in functionality.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:11 +02:00
Ingo Molnar
32231879f6 x86/fpu/init: Propagate __init annotations
Now that all the FPU init function call dependencies are
cleaned up we can propagate __init annotations deeper.

This shrinks the runtime size of the kernel a bit, and
also addresses a few section warnings.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:11 +02:00
Ingo Molnar
5fd402dfa7 x86/fpu/xstate: Clean up setup_xstate_comp() call
So call setup_xstate_comp() from the xstate init code, not
from the generic fpu__init_system() code.

This allows us to remove the protytype from xstate.h as well.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:11 +02:00
Ingo Molnar
39f1acd243 x86/fpu/xstate: Don't assume the first zero xfeatures zero bit means the end
The current xstate code in setup_xstate_features() assumes that
the first zero bit means the end of xfeatures - but that is not
so, the SDM clearly states that an arbitrary set of xfeatures
might be enabled - and it is also clear from the description
of the compaction feature that holes are possible:

  "13-6 Vol. 1MANAGING STATE USING THE XSAVE FEATURE SET
  [...]

  Compacted format. Each state component i (i ≥ 2) is located at a byte
  offset from the base address of the XSAVE area based on the XCOMP_BV
  field in the XSAVE header:

  — If XCOMP_BV[i] = 0, state component i is not in the XSAVE area.

  — If XCOMP_BV[i] = 1, the following items apply:

  • If XCOMP_BV[j] = 0 for every j, 2 ≤ j < i, state component i is
    located at a byte offset 576 from the base address of the XSAVE
    area. (This item applies if i is the first bit set in bits 62:2 of
    the XCOMP_BV; it implies that state component i is located at the
    beginning of the extended region.)

  • Otherwise, let j, 2 ≤ j < i, be the greatest value such that
    XCOMP_BV[j] = 1. Then state component i is located at a byte offset
    X from the location of state component j, where X is the number of
    bytes required for state component j as enumerated in
    CPUID.(EAX=0DH,ECX=j):EAX. (This item implies that state component i
    immediately follows the preceding state component whose bit is set
    in XCOMP_BV.)"

So don't assume that the first zero xfeatures bit means the end of
all xfeatures - iterate through all of them.

I'm not aware of hardware that triggers this currently.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:10 +02:00
Ingo Molnar
63c6680cd0 x86/fpu: Move debugging check from kernel_fpu_begin() to __kernel_fpu_begin()
kernel_fpu_begin() is __kernel_fpu_begin() with a preempt_disable().

Move the kernel_fpu_begin() debugging check into __kernel_fpu_begin(),
so that users of __kernel_fpu_begin() may benefit from it as well.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:10 +02:00
Ingo Molnar
aeb997b9f2 x86/fpu: Change fpu->fpregs_active from 'int' to 'char', add lazy switching comments
Improve the memory layout of 'struct fpu':

 - change ->fpregs_active from 'int' to 'char' - it's just a single flag
   and modern x86 CPUs can do efficient byte accesses.

 - pack related fields closer to each other: often 'fpu->state' will not be
   touched, while the other fields will - so pack them into a group.

Also add comments to each field, describing their purpose, and add
some background information about lazy restores.

Also fix an obsolete, lazy switching related comment in fpu_copy()'s description.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:09 +02:00
Ingo Molnar
c47ada305d x86/fpu: Harmonize FPU register state types
Use these consistent names:

    struct fregs_state           # was: i387_fsave_struct
    struct fxregs_state          # was: i387_fxsave_struct
    struct swregs_state          # was: i387_soft_struct
    struct xregs_state           # was: xsave_struct
    union  fpregs_state          # was: thread_xstate

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:09 +02:00
Ingo Molnar
0c306bcfba x86/fpu: Factor out the FPU regset code into fpu/regset.c
So much of fpu/core.c is the regset code, but it just obscures the generic
FPU state machine logic. Factor out the regset code into fpu/regset.c, where
it can be read in isolation.

This affects one API: fpu__activate_stopped() has to be made available
from the core to fpu/regset.c.

No change in functionality.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:09 +02:00
Ingo Molnar
b992c660d3 x86/fpu: Factor out fpu/signal.c
fpu/xstate.c has a lot of generic FPU signal frame handling routines,
move them into a separate file: fpu/signal.c.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:08 +02:00
Ingo Molnar
c681314421 x86/fpu: Rename all the fpregs, xregs, fxregs and fregs handling functions
Standardize the naming of the various functions that copy register
content in specific FPU context formats:

  copy_fxregs_to_kernel()         # was: fpu_fxsave()
  copy_xregs_to_kernel()          # was: xsave_state()

  copy_kernel_to_fregs()          # was: frstor_checking()
  copy_kernel_to_fxregs()         # was: fxrstor_checking()
  copy_kernel_to_xregs()          # was: fpu_xrstor_checking()
  copy_kernel_to_xregs_booting()  # was: xrstor_state_booting()

  copy_fregs_to_user()            # was: fsave_user()
  copy_fxregs_to_user()           # was: fxsave_user()
  copy_xregs_to_user()            # was: xsave_user()

  copy_user_to_fregs()            # was: frstor_user()
  copy_user_to_fxregs()           # was: fxrstor_user()
  copy_user_to_xregs()            # was: xrestore_user()
  copy_user_to_fpregs_zeroing()   # was: restore_user_xstate()

Eliminate fpu_xrstor_checking(), because it was just a wrapper.

No change in functionality.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:08 +02:00
Ingo Molnar
815418890e x86/fpu: Move restore_init_xstate() out of fpu/internal.h
Move restore_init_xstate() next to its sole caller.

Also rename it to copy_init_fpstate_to_fpregs() and add
some comments about what it does.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:08 +02:00
Ingo Molnar
6f57502310 x86/fpu: Generalize 'init_xstate_ctx'
So the handling of init_xstate_ctx has a layering violation: both
'struct xsave_struct' and 'union thread_xstate' have a
'struct i387_fxsave_struct' member:

   xsave_struct::i387
   thread_xstate::fxsave

The handling of init_xstate_ctx is generic, it is used on all
CPUs, with or without XSAVE instruction. So it's confusing how
the generic code passes around and handles an XSAVE specific
format.

What we really want is for init_xstate_ctx to be a proper
fpstate and we use its ::fxsave and ::xsave members, as
appropriate.

Since the xsave_struct::i387 and thread_xstate::fxsave aliases
each other this is not a functional problem.

So implement this, and move init_xstate_ctx to the generic FPU
code in the process.

Also, since init_xstate_ctx is not XSAVE specific anymore,
rename it to init_fpstate, and mark it __read_mostly,
because it's only modified once during bootup, and used
as a reference fpstate later on.

There's no change in functionality.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:07 +02:00
Ingo Molnar
bf935b0b52 x86/fpu: Create 'union thread_xstate' helper for fpstate_init()
fpstate_init() only uses fpu->state, so pass that in to it.

This enables the cleanup we will do in the next patch.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:07 +02:00
Ingo Molnar
0aba697894 x86/fpu: Harmonize the names of the fpstate_init() helper functions
Harmonize the inconsistent naming of these related functions:

                          fpstate_init()
  finit_soft_fpu()   =>   fpstate_init_fsoft()
  fx_finit()         =>   fpstate_init_fxstate()
  fx_finit()         =>   fpstate_init_fstate()       # split out

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:07 +02:00
Ingo Molnar
e1cebad49c x86/fpu: Factor out the exception error code handling code
Factor out the FPU error code handling code from traps.c and fpu/internal.h
and move them close to each other.

Also convert the helper functions to 'struct fpu *', which further simplifies
them.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:06 +02:00
Ingo Molnar
acd58a3ad0 x86/fpu: Remove run-once init quirks
Remove various boot quirks that came from the old code.

The new code is cleanly split up into per-system and per-cpu
init sequences, and system init functions are only called once.

Remove the run-once quirks.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:06 +02:00
Ingo Molnar
59a36d16be x86/fpu: Factor out fpu/regset.h from fpu/internal.h
Only a few places use the regset definitions, so factor them out.

Also fix related header dependency assumptions.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:06 +02:00
Ingo Molnar
fcbc99c403 x86/fpu: Split out fpu/signal.h from fpu/internal.h for signal frame handling functions
Most of the FPU does not use them, so split it out and include
them in signal.c and ia32_signal.c

Also fix header file dependency assumption in fpu/core.c.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:05 +02:00
Ingo Molnar
05012c13f6 x86/fpu: Move is_ia32*frame() helpers out of fpu/internal.h
Move them to their only user. This makes the code easier to read,
the header is less cluttered, and it also speeds up the build a bit.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:05 +02:00
Ingo Molnar
fbce778246 x86/fpu: Merge fpu__reset() and fpu__clear()
With recent cleanups and fixes the fpu__reset() and fpu__clear()
functions have become almost identical in functionality: the only
difference is that fpu__reset() assumed that the fpstate
was already active in the eagerfpu case, while fpu__clear()
activated it if it was inactive.

This distinction almost never matters, the only case where such
fpstate activation happens if if the init thread (PID 1) gets exec()-ed
for the first time.

So keep fpu__clear() and change all fpu__reset() uses to
fpu__clear() to simpify the logic.

( In a later patch we'll further simplify fpu__clear() by making
  sure that all contexts it is called on are already active. )

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:05 +02:00
Ingo Molnar
82c0e45eb5 x86/fpu: Move the signal frame handling code closer to each other
Consolidate more signal frame related functions:

   text      data    bss     dec       filename
   14108070  2575280 1634304 18317654  vmlinux.before
   14107944  2575344 1634304 18317592  vmlinux.after

Also, while moving it, rename alloc_mathframe() to fpu__alloc_mathframe().

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:04 +02:00
Ingo Molnar
9dfe99b755 x86/fpu: Rename restore_xstate_sig() to fpu__restore_sig()
restore_xstate_sig() is a misnomer: it's not limited to 'xstate' at all,
it is the high level 'restore FPU state from a signal frame' function
that works with all legacy FPU formats as well.

Rename it (and its helper) accordingly, and also move it to the
fpu__*() namespace.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:04 +02:00
Ingo Molnar
04c8e01d50 x86/fpu: Move fpu__clear() to 'struct fpu *' parameter passing
Do it like all other high level FPU state handling functions: they
only know about struct fpu, not about the task.

(Also remove a dead prototype while at it.)

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:04 +02:00
Ingo Molnar
6ffc152e46 x86/fpu: Move all the fpu__*() high level methods closer to each other
The fpu__*() methods are closely related, but they are defined
in scattered places within the FPU code.

Concentrate them, and also uninline fpu__save(), fpu__drop()
and fpu__reset() to save about 5K of kernel text on 64-bit kernels:

   text            data    bss     dec        filename
   14113063        2575280 1634304 18322647   vmlinux.before
   14108070        2575280 1634304 18317654   vmlinux.after

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:04 +02:00
Ingo Molnar
0e75c54f17 x86/fpu: Rename restore_fpu_checking() to copy_fpstate_to_fpregs()
fpu_restore_checking() is a helper function of restore_fpu_checking(),
but this is not apparent from the naming.

Both copy fpstate contents to fpregs, while the fuller variant does
a full copy without leaking information.

So rename them to:

    copy_fpstate_to_fpregs()
  __copy_fpstate_to_fpregs()

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:03 +02:00
Ingo Molnar
5033861575 x86/fpu: Synchronize the naming of drop_fpu() and fpu_reset_state()
drop_fpu() and fpu_reset_state() are similar in functionality
and in scope, yet this is not apparent from their names.

drop_fpu() deactivates FPU contents (both the fpregs and the fpstate),
but leaves register contents intact in the eager-FPU case, mostly as an
optimization. It disables fpregs in the lazy FPU case. The drop_fpu()
method can be used to destroy FPU state in an optimized way, when we
know that a new state will be loaded before user-space might see
any remains of the old FPU state:

     - such as in sys_exit()'s exit_thread() where we know this task
       won't execute any user-space instructions anymore and the
       next context switch cleans up the FPU. The old FPU state
       might still be around in the eagerfpu case but won't be
       saved.

     - in __restore_xstate_sig(), where we use drop_fpu() before
       copying a new state into the fpstate and activating that one.
       No user-pace instructions can execute between those steps.

     - in sys_execve()'s fpu__clear(): there we use drop_fpu() in
       the !eagerfpu case, where it's equivalent to a full reinit.

fpu_reset_state() is a stronger version of drop_fpu(): both in
the eagerfpu and the lazy-FPU case it guarantees that fpregs
are reinitialized to init state. This method is used in cases
where we need a full reset:

     - handle_signal() uses fpu_reset_state() to reset the FPU state
       to init before executing a user-space signal handler. While we
       have already saved the original FPU state at this point, and
       always restore the original state, the signal handling code
       still has to do this reinit, because signals may interrupt
       any user-space instruction, and the FPU might be in various
       intermediate states (such as an unbalanced x87 stack) that is
       not immediately usable for general C signal handler code.

     - __restore_xstate_sig() uses fpu_reset_state() when the signal
       frame has no FP context. Since the signal handler may have
       modified the FPU state, it gets reset back to init state.

     - in another branch __restore_xstate_sig() uses fpu_reset_state()
       to handle a restoration error: when restore_user_xstate() fails
       to restore FPU state and we might have inconsistent FPU data,
       fpu_reset_state() is used to reset it back to a known good
       state.

     - __kernel_fpu_end() uses fpu_reset_state() in an error branch.
       This is in a 'must not trigger' error branch, so on bug-free
       kernels this never triggers.

     - fpu__restore() uses fpu_reset_state() in an error path
       as well: if the fpstate was set up with invalid FPU state
       (via ptrace or via a signal handler), then it's reset back
       to init state.

     - likewise, the scheduler's switch_fpu_finish() uses it in a
       restoration error path too.

Move both drop_fpu() and fpu_reset_state() to the fpu__*() namespace
and harmonize their naming with their function:

    fpu__drop()
    fpu__reset()

This clearly shows that both methods operate on the full state of the
FPU, just like fpu__restore().

Also add comments to explain what each function does.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:03 +02:00
Ingo Molnar
5e907bb045 x86/alternatives, x86/fpu: Add 'alternatives_patched' debug flag and use it in xsave_state()
We'd like to use xsave_state() earlier, but its SYSTEM_BOOTING check
is too imprecise.

The real condition that xsave_state() would like to check is whether
alternative XSAVE instructions were patched into the kernel image
already.

Add such a (read-mostly) debug flag and use it in xsave_state().

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:03 +02:00
Ingo Molnar
2e85591a6c x86/fpu: Better document fpu__clear() state handling
So prior to this fix:

  c88d47480d ("x86/fpu: Always restore_xinit_state() when use_eager_cpu()")

we leaked FPU state across execve() boundaries on eagerfpu systems:

	$ /host/home/mingo/dump-xmm-regs-exec
	# XMM state before execve():
	XMM0 : 000000000000dede
	XMM1 : 000000000000dedf
	XMM2 : 000000000000dee0
	XMM3 : 000000000000dee1
	XMM4 : 000000000000dee2
	XMM5 : 000000000000dee3
	XMM6 : 000000000000dee4
	XMM7 : 000000000000dee5
	XMM8 : 000000000000dee6
	XMM9 : 000000000000dee7
	XMM10: 000000000000dee8
	XMM11: 000000000000dee9
	XMM12: 000000000000deea
	XMM13: 000000000000deeb
	XMM14: 000000000000deec
	XMM15: 000000000000deed

	# XMM state after execve(), in the new task context:
	XMM0 : 0000000000000000
	XMM1 : 2f2f2f2f2f2f2f2f
	XMM2 : 0000000000000000
	XMM3 : 0000000000000000
	XMM4 : 00000000000000ff
	XMM5 : 00000000ff000000
	XMM6 : 000000000000dee4
	XMM7 : 000000000000dee5
	XMM8 : 0000000000000000
	XMM9 : 0000000000000000
	XMM10: 0000000000000000
	XMM11: 0000000000000000
	XMM12: 0000000000000000
	XMM13: 000000000000deeb
	XMM14: 000000000000deec
	XMM15: 000000000000deed

Better explain what this function is supposed to do and why.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:02 +02:00
Ingo Molnar
b1276c48e9 x86/fpu: Initialize fpregs in fpu__init_cpu_generic()
FPU fpregs do not get initialized during bootup on secondary CPUs,
on non-xsave capable CPUs.

For example on one of my systems, the secondary CPU has this FPU
state on bootup:

	x86: Booting SMP configuration:
	.... node  #0, CPUs:      #1
	x86/fpu ######################
	x86/fpu # FPU register dump on CPU#1:
	x86/fpu # ... CWD: ffff0040
	x86/fpu # ... SWD: ffff0000
	x86/fpu # ... TWD: ffff555a
	x86/fpu # ... FIP: 00000000
	x86/fpu # ... FCS: 00000000
	x86/fpu # ... FOO: 00000000
	x86/fpu # ... FOS: ffff0000
	x86/fpu # ... FP0: 02 57 00 00 00 00 00 00 ff ff
	x86/fpu # ... FP1: 1b e2 00 00 00 00 00 00 ff ff
	x86/fpu # ... FP2: 00 00 00 00 00 00 00 00 00 00
	x86/fpu # ... FP3: 00 00 00 00 00 00 00 00 00 00
	x86/fpu # ... FP4: 00 00 00 00 00 00 00 00 00 00
	x86/fpu # ... FP5: 00 00 00 00 00 00 00 00 00 00
	x86/fpu # ... FP6: 00 00 00 00 00 00 00 00 00 00
	x86/fpu # ... FP7: 00 00 00 00 00 00 00 00 00 00
	x86/fpu # ...  SW: dadadada
	x86/fpu ######################

Note how CWD and TWD are off their usual init state (0x037f and 0xffff),
and how FP0 and FP1 has non-zero content.

This is normally not a problem, because any user-space FPU state
is initalized properly - but it can complicate the use of FPU
instructions in kernel code via kernel_fpu_begin()/end(): if
the FPU using code does not initialize registers itself, it
might generate spurious exceptions depending on which CPU it
executes on.

Fix this by initializing the x87 state via the FNINIT instruction.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:02 +02:00
Ingo Molnar
3c6dffa93b x86/fpu: Rename user_has_fpu() to fpregs_active()
Rename this function in line with the new FPU nomenclature.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:02 +02:00
Ingo Molnar
be7436d519 x86/fpu: Clarify ancient comments in fpu__restore()
So this function still had ancient language about 'saving current
math information' - but we haven't been doing lazy FPU saves for
quite some time, we are doing lazy FPU restores.

Also remove IRQ13 related comment, which we don't support anymore
either.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:01 +02:00
Ingo Molnar
2a52af8b8a x86/fpu: Rename save_user_xstate() to copy_fpregs_to_sigframe()
Move the naming in line with existing names, so that we now have:

  copy_fpregs_to_fpstate()
  copy_fpstate_to_sigframe()
  copy_fpregs_to_sigframe()

... where each function does what its name suggests.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:01 +02:00
Ingo Molnar
c8e1404120 x86/fpu: Rename save_xstate_sig() to copy_fpstate_to_sigframe()
Standardize the naming of save_xstate_sig() by renaming it to
copy_fpstate_to_sigframe(): this tells us at a glance that
the function copies an FPU fpstate to a signal frame.

This naming also follows the naming of copy_fpregs_to_fpstate().

Don't put 'xstate' into the name: since this is a generic name,
it's expected that the function is able to handle xstate frames
as well, beyond legacy frames.

xstate used to be the odd case in the x86 FPU code - now it's the
common case.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:01 +02:00
Ingo Molnar
36e49e7f2e x86/fpu: Pass 'struct fpu' to fpstate_sanitize_xstate()
Currently fpstate_sanitize_xstate() has a task_struct input parameter,
but it only uses the fpu structure from it - so pass in a 'struct fpu'
pointer only and update all call sites.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:00 +02:00
Ingo Molnar
1ac91a767f x86/fpu: Simplify fpstate_sanitize_xstate() calls
Remove the extra layer of __fpstate_sanitize_xstate():

	if (!use_xsaveopt())
		return;
	__fpstate_sanitize_xstate(tsk);

and move the check for use_xsaveopt() into fpstate_sanitize_xstate().

In general we optimize for the presence of CPU features, not for
the absence of them. Furthermore there's little point in this inlining,
as the call sites are not super hot code paths.

Doing this uninlining shrinks the code a bit:

   text    data     bss     dec     hex filename
   14108751        2573624 1634304 18316679        1177d87 vmlinux.before
   14108627        2573624 1634304 18316555        1177d0b vmlinux.after

Also remove a pointless '!fx' check from fpstate_sanitize_xstate().

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:00 +02:00
Ingo Molnar
d090319312 x86/fpu: Rename sanitize_i387_state() to fpstate_sanitize_xstate()
So the sanitize_i387_state() function has the following purpose:
on CPUs that support optimized xstate saving instructions, an
FPU fpstate might end up having partially uninitialized data.

This function initializes that data.

Note that the function name is a misnomer and confusing on two levels,
not only is it not i387 specific at all, but it is the exact opposite:
it only matters on xstate CPUs.

So rename sanitize_i387_state() and __sanitize_i387_state() to
fpstate_sanitize_xstate() and __fpstate_sanitize_xstate(),
to clearly express the purpose and usage of the function.

We'll further clean up this function in the next patch.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:00 +02:00
Ingo Molnar
befc61ad3c x86/fpu: Move asm/xcr.h to asm/fpu/internal.h
Now that all FPU internals using drivers are converted to public APIs,
move xcr.h's definitions into fpu/internal.h and remove xcr.h.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:00 +02:00
Ingo Molnar
33588b5222 x86/fpu: Simplify print_xstate_features()
We do a boot time printout of xfeatures in print_xstate_features(),
simplify this code to make use of the recently introduced cpu_has_xfeature()
method.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:55 +02:00
Ingo Molnar
5b07343034 x86/fpu: Introduce cpu_has_xfeatures(xfeatures_mask, feature_name)
A lot of FPU using driver code is querying complex CPU features to be
able to figure out whether a given set of xstate features is supported
by the CPU or not.

Introduce a simplified API function that can be used on any CPU type
to get this information. Also add an error string return pointer,
so that the driver can print a meaningful error message with a
standardized feature name.

Also mark xfeatures_mask as __read_only.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:55 +02:00
Ingo Molnar
6278485450 x86/fpu: Rename fpu/xsave.c to fpu/xstate.c
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:55 +02:00
Ingo Molnar
b16529004f x86/fpu: Optimize fpu_copy() some more on lazy switching systems
The current fpu_copy() code on lazy switching CPUs always saves
into the current fpstate and then copies it over into the child
context:

		preempt_disable();
		if (!copy_fpregs_to_fpstate(src_fpu))
			fpregs_deactivate(src_fpu);
		preempt_enable();
		memcpy(&dst_fpu->state, &src_fpu->state, xstate_size);

That memcpy() can be avoided on all lazy switching setups except
really old FNSAVE-only systems: change fpu_copy() to directly save
into the child context, for both the lazy and the eager context
switching case.

Note that we still have to do a memcpy() back into the parent
context in the FNSAVE case, but this won't be executed on the
majority of x86 systems that got built in the last 10 years or so.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:54 +02:00
Ingo Molnar
68271c6ae7 x86/fpu: Optimize fpu_copy()
Optimize fpu_copy() a bit by expanding the ->fpstate_active == 1
portion of fpu__save() into it.

( The main purpose of this change is to enable another, larger
  optimization that will be done in the next patch. )

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:54 +02:00
Ingo Molnar
48c4717f30 x86/fpu: Optimize fpu__save()
So fpu__save() does this currently:

		copy_fpregs_to_fpstate(fpu);
		if (!use_eager_fpu())
			fpregs_deactivate(fpu);

... which deactivates the FPU on lazy switching systems unconditionally.

Both usecases of fpu__save() use this function to save the
FPU state into a fpstate: fork()/clone() and math error signal handling.

The unconditional disabling of FPU registers in the lazy switching
case is probably a mistaken conversion of old FNSAVE code (that had
to disable FPU registers).

So speed up this code by only disabling FPU registers when absolutely
necessary: when indicated by the copy_fpregs_to_fpstate() return
code:

		if (!copy_fpregs_to_fpstate(fpu))
			fpregs_deactivate(fpu);

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:53 +02:00
Ingo Molnar
fea435a202 x86/fpu: Simplify fpu__save()
Factor out a common call.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:53 +02:00
Ingo Molnar
9f876d6766 x86/fpu: Eliminate __save_fpu()
The current implementation of __save_fpu():

	if (use_xsave()) {
		xsave_state(&fpu->state.xsave);
	} else {
		fpu_fxsave(fpu);
	}

Is actually a simplified version of copy_fpregs_to_fpstate(),
if use_eager_fpu() is true.

But all call sites of __save_fpu() call it only it when use_eager_fpu()
is true.

So we can eliminate __save_fpu() altogether and use the standard
copy_fpregs_to_fpstate() function. This cleans up the code
by making it use fewer variants of FPU register saving.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:53 +02:00
Ingo Molnar
72ee6f87ad x86/fpu: Simplify __save_fpu()
__save_fpu() has this pattern:

		if (unlikely(system_state == SYSTEM_BOOTING))
			xsave_state_booting(&fpu->state.xsave);
		else
			xsave_state(&fpu->state.xsave);

... but it does not actually get called during system bootup.

So remove the complication and always call xsave_state().

To make sure this assumption is correct, add a WARN_ONCE()
debug check to xsave_state().

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:53 +02:00
Ingo Molnar
32b49b3c83 x86/fpu: Factor out FPU hw activation/deactivation
We have repeat patterns of:

	if (!use_eager_fpu())
		clts();

... to activate FPU registers, and:

	if (!use_eager_fpu())
		stts();

... to deactivate them.

Encapsulate these in:

	__fpregs_activate_hw();
	__fpregs_activate_hw();

and use them accordingly.

Doing this synchronizes the idiom with the fpu->fpregs_active
software-flag's handling functions, creating clear patterns of:

	__fpregs_activate_hw();
	__fpregs_activate(fpu);

etc., which improves readability.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:52 +02:00
Ingo Molnar
67ee658e6f x86/fpu: Rename fpu__unlazy_stopped() to fpu__activate_stopped()
In line with the fpstate_activate() change, name
fpu__unlazy_stopped() in a similar fashion as well: its purpose
is to make the fpstate of a stopped task the current and active FPU
context, which may require unlazying and initialization.

The unlazying is just part of the job, the main concept is to make
the fpstate active.

Also clarify the function's description to clarify its exact
usage and the background behind it all.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:52 +02:00
Ingo Molnar
c4d72e2db3 x86/fpu: Simplify fpstate_init_curr() usage
Now that fpstate_init_curr() is not doing implicit allocations
anymore, almost all uses of it involve a very simple pattern:

	if (!fpu->fpstate_active)
		fpstate_init_curr(fpu);

which is basically activating the FPU fpstate if it was not active
before.

So propagate the check into the function itself, and rename the
function according to its new purpose:

	fpu__activate_curr(fpu);

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:51 +02:00
Ingo Molnar
2fb29fc7c6 x86/fpu: Simplify fpu__unlazy_stopped() error handling
Now that FPU contexts are always allocated, fpu__unlazy_stopped()
cannot fail. Remove its error return and propagate the changes to
the callers.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:51 +02:00
Ingo Molnar
e62bb3d894 x86/fpu: Rename fpstate_alloc_init() to fpstate_init_curr()
Now that there are no FPU context allocations, rename fpstate_alloc_init()
to fpstate_init_curr(), to signal that it initializes the fpstate and
marks it active, for the current task.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:50 +02:00
Ingo Molnar
91d93d0e20 x86/fpu: Remove failure return from fpstate_alloc_init()
Remove the failure code and propagate this down to callers.

Note that this function still has an 'init' aspect, which must be
called.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:50 +02:00
Ingo Molnar
c4d6ee6e2e x86/fpu: Remove failure paths from fpstate-alloc low level functions
Now that we always allocate the FPU context as part of task_struct there's
no need for separate allocations - remove them and their primary failure
handling code.

( Note that there's still secondary error codes that have become superfluous,
  those will be removed in separate patches. )

Move the somewhat misplaced setup_xstate_comp() call to the core.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:50 +02:00
Ingo Molnar
7366ed771f x86/fpu: Simplify FPU handling by embedding the fpstate in task_struct (again)
So 6 years ago we made the FPU fpstate dynamically allocated:

  aa283f4927 ("x86, fpu: lazy allocation of FPU area - v5")
  61c4628b53 ("x86, fpu: split FPU state from task struct - v5")

In hindsight this was a mistake:

   - it complicated context allocation failure handling, such as:

		/* kthread execs. TODO: cleanup this horror. */
		if (WARN_ON(fpstate_alloc_init(fpu)))
			force_sig(SIGKILL, tsk);

   - it caused us to enable irqs in fpu__restore():

                local_irq_enable();
                /*
                 * does a slab alloc which can sleep
                 */
                if (fpstate_alloc_init(fpu)) {
                        /*
                         * ran out of memory!
                         */
                        do_group_exit(SIGKILL);
                        return;
                }
                local_irq_disable();

   - it (slightly) slowed down task creation/destruction by adding
     slab allocation/free pattens.

   - it made access to context contents (slightly) slower by adding
     one more pointer dereference.

The motivation for the dynamic allocation was two-fold:

   - reduce memory consumption by non-FPU tasks

   - allocate and handle only the necessary amount of context for
     various XSAVE processors that have varying hardware frame
     sizes.

These days, with glibc using SSE memcpy by default and GCC optimizing
for SSE/AVX by default, the scope of FPU using apps on an x86 system is
much larger than it was 6 years ago.

For example on a freshly installed Fedora 21 desktop system, with a
recent kernel, all non-kthread tasks have used the FPU shortly after
bootup.

Also, even modern embedded x86 CPUs try to support the latest vector
instruction set - so they'll too often use the larger xstate frame
sizes.

So remove the dynamic allocation complication by embedding the FPU
fpstate in task_struct again. This should make the FPU a lot more
accessible to all sorts of atomic contexts.

We could still optimize for the xstate frame size in the future,
by moving the state structure to the last element of task_struct,
and allocating only a part of that.

This change is kept minimal by still keeping the ctx_alloc()/free()
routines (that now do nothing substantial) - we'll remove them in
the following patches.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:49 +02:00
Ingo Molnar
4f83634710 x86/fpu: Rename fpu_save_init() to copy_fpregs_to_fpstate()
So fpu_save_init() is a historic name that got its name when the only
way the FPU state was FNSAVE, which cleared (well, destroyed) the FPU
state after saving it.

Nowadays the name is misleading, because ever since the introduction of
FXSAVE (and more modern FPU saving instructions) the 'we need to reload
the FPU state' part is only true if there's a pending FPU exception [*],
which is almost never the case.

So rename it to copy_fpregs_to_fpstate() to make it clear what's
happening. Also add a few comments about why we cannot keep registers
in certain cases.

Also clean up the control flow a bit, to make it more apparent when
we are dropping/keeping FP registers, and to optimize the common
case (of keeping fpregs) some more.

[*] Probably not true anymore, modern instructions always leave the FPU
    state intact, even if exceptions are pending: because pending FP
    exceptions are posted on the next FP instruction, not asynchronously.

    They were truly asynchronous back in the IRQ13 case, and we had to
    synchronize with them, but that code is not working anymore: we don't
    have IRQ13 mapped in the IDT anymore.

    But a cleanup patch is obviously not the place to change subtle behavior.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:49 +02:00
Ingo Molnar
910665882f x86/fpu: Uninline the irq_ts_save()/restore() functions
Especially the irq_ts_save() function is pretty bloaty, generating
over a dozen instructions, so uninline them.

Even though the API is used rarely, the space savings are measurable:

   text    data     bss     dec     hex filename
   13331995        2572920 1634304 17539219        10ba093 vmlinux.before
   13331739        2572920 1634304 17538963        10b9f93 vmlinux.after

( This also allows the removal of an include file inclusion from fpu/api.h,
  speeding up the kernel build slightly. )

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:48 +02:00
Ingo Molnar
952f07ecbd x86/fpu: Move various internal function prototypes to fpu/internal.h
There are a number of FPU internal function prototypes and an inline function
in fpu/api.h, mostly placed so historically as the code grew over the years.

Move them over into fpu/internal.h where they belong. (Add sched.h include
to stackprotector.h which incorrectly relied on getting it from fpu/api.h.)

fpu/api.h is now a pure file that only contains FPU APIs intended for driver
use.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:48 +02:00
Ingo Molnar
d63e79b114 x86/fpu: Uninline kernel_fpu_begin()/end()
Both inline functions call an inline function unconditionally, so we
already pay the function call based clobbering cost. Uninline them.

This saves quite a bit of code in various performance sensitive
code paths:

   text            data    bss     dec             hex     filename
   13321334        2569888 1634304 17525526        10b6b16 vmlinux.before
   13320246        2569888 1634304 17524438        10b66d6 vmlinux.after

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:48 +02:00
Ingo Molnar
ae02679c56 x86/fpu: Add more comments to the FPU init code
Extend the comments of the FPU init code, and fix old ones.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:47 +02:00
Ingo Molnar
41e78410d8 x86/fpu: Reorder init methods
Reorder init methods in order of their relationship and usage, to
form coherent blocks throughout the whole file.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:47 +02:00
Ingo Molnar
7638b74b56 x86/fpu: Rename fpstate_xstate_init_size() to fpu__init_system_xstate_size_legacy()
To bring it in line with the other init_system*() methods.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:47 +02:00
Ingo Molnar
c66e3f2823 x86/fpu: Remove the extra fpu__detect() layer
Now that fpu__detect() has become an empty layer around
fpu__init_system(), eliminate it and make fpu__init_system()
the main system initialization routine.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:46 +02:00
Ingo Molnar
dd863880ac x86/fpu: Move fpu__init_system_early_generic() out of fpu__detect()
Move the fpu__init_system_early_generic() call into fpu__init_system(),
which hosts all the system init calls.

Expose fpu__init_system() to other modules - this will be our main and only
system init function.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:46 +02:00
Ingo Molnar
71eb3c6d15 x86/fpu: Make check_fpu() init ordering independent
check_fpu() currently relies on being called early in the init sequence,
when CR0::TS has not been set up yet.

Save/restore CR0::TS across this function, to make it invariant to
init ordering. This way we'll be able to move the generic FPU setup
routines earlier in the init sequence.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:46 +02:00
Ingo Molnar
0bf23f3d6c x86/fpu: Factor out FPU bug checks into fpu/bugs.c
Create separate fpu/bugs.c code so that if we read generic FPU code
we don't have to wade through all the bugcheck related code first.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:45 +02:00
Ingo Molnar
e83ab9ad97 x86/fpu: Move !FPU check ingo fpu__init_system_early_generic()
There's a !FPU related sanity check in fpu__init_cpu_generic(),
which is executed on every CPU onlining - even though we should do
this only once, and during system init.

Move this check to fpu__init_system_early_generic().

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:45 +02:00
Ingo Molnar
2e2f3da771 x86/fpu: Factor out fpu__init_system_early_generic()
Move the generic bits of fpu__detect() into fpu__init_system_early_generic().

We'll move some other code here too in a followup patch.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:45 +02:00
Ingo Molnar
7218e8b723 x86/fpu: Factor out fpu__init_system_generic()
Factor out the generic bits from fpu__init_system().

Rename mxcsr_feature_mask_init() to fpu__init_system_mxcsr()
to bring it in line with the rest of the nomenclature.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:45 +02:00
Ingo Molnar
b11316ed9e x86/fpu: Factor out fpu__init_cpu_generic()
Factor out the generic bits from fpu__init_cpu(), to create
a flat sequence of per CPU initialization function calls:

	fpu__init_cpu_generic();
	fpu__init_cpu_xstate();
	fpu__init_cpu_ctx_switch();

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:44 +02:00
Ingo Molnar
21c4cd108a x86/fpu: Simplify fpu__cpu_init()
After the latest round of cleanups, fpu__cpu_init() has become
a simple call to fpu__init_cpu().

Rename fpu__init_cpu() to fpu__cpu_init() and remove the
extra layer.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:44 +02:00
Ingo Molnar
7202ab46f7 x86/fpu: Remove fpu__init_cpu_ctx_switch() call from fpu__init_system()
We are now doing the fpu__init_cpu_ctx_switch() call from fpu__init_cpu(),
so there's no need to call it from fpu__init_system() anymore.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:44 +02:00
Ingo Molnar
067051ccd2 x86/fpu: Do system-wide setup from fpu__detect()
fpu__cpu_init() is called on every CPU, so it is the wrong place
to call fpu__init_system() from. Call it from fpu__detect():
this is early CPU init code, but we already have CPU features detected,
so we can call the system-wide FPU init code from here.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:43 +02:00
Ingo Molnar
3960fccf2e x86/fpu: Call fpu__init_cpu_ctx_switch() from fpu__init_cpu()
fpu__init_cpu() is currently called from fpu__init_system(),
which is the wrong place for it: call it from the proper high level
per CPU init function, fpu__init_cpu().

Note, we still keep the old call site as well, because it depends
on having proper CR0::TS setup. We'll fix this in the next patch.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:43 +02:00
Ingo Molnar
997578b14c x86/fpu: Move the fpstate_xstate_init_size() call into fpu__init_system()
The fpstate_xstate_init_size() function sets up a basic xstate_size, called
during fpu__detect() currently.

Its real dependency is to be called before fpu__init_system_xstate().

So move the function call site into fpu__init_system(), to right before the
fpu__init_system_xstate() call.

Also add a once-per-boot flag to fpstate_xstate_init_size(), we'll remove
this quirk later once we've cleaned up the init dependencies.

This moves the two related functions closer to each other and makes them
both part of the _init_system() functionality.

Currently we do the fpstate_xstate_init_size()
Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:43 +02:00
Ingo Molnar
530b37e43c x86/fpu: Do CLTS fpu__init_system()
mxcsr_feature_mask_init() depends on TS being cleared, as it executes
an FXSAVE instruction.

After later changes we will move the TS setup into fpu__init_cpu(),
which will interact with this - so clear the TS flag explicitly.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:43 +02:00
Ingo Molnar
011545b570 x86/fpu: Split fpu__ctx_switch_init() into _cpu() and _system() portions
So fpu__ctx_switch_init() has two aspects: a once per bootup functionality
that sets up a capability flag, and a per CPU functionality that sets CR0::TS.

Split the function.

Note that at this stage we still have duplicate calls into these methods, as
both the _system() and the _cpu() methods are run on all CPUs, with lower
level on_boot_cpu flags filtering out the duplicates where needed. So add
TS flag clearing as well, to handle the aftermath of early CPU init sequences
that might call in without having eager-fpu set - don't assume the TS flag
is cleared.

Calling each from its respective init level will happen later on.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:42 +02:00
Ingo Molnar
064e51e3c8 x86/fpu: Clean up eager_fpu_init() and rename it to fpu__ctx_switch_init()
It's not an xsave specific function anymore, so rename it accordingly
and also clean it up a bit:

 - remove the obsolete __init_refok, as the code paths are not
   mixed anymore

 - rename it from eager_fpu_init() to fpu__ctx_switch_init()

 - remove stray 'return;'

 - make it static to its only user

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:42 +02:00
Ingo Molnar
6f5d265aff x86/fpu: Move eager_fpu_init() to fpu/init.c
Move eager_fpu_init() and the 'eagerfpu' boot parameter handling function
to the generic FPU init file: it's generic FPU functionality.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:42 +02:00
Ingo Molnar
89abbe01a4 x86/fpu: Move all eager-fpu setup code to eager_fpu_init()
The FPU context switch type (lazy or eager) setup code is split into
two places currently - move it all to eager_fpu_init().

Note that the code we move will now be executed on non-xstate CPUs
as well, but this should be safe: both xfeatures_mask and
cpu_has_xsaveopt is 0 there.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:41 +02:00
Ingo Molnar
a5cb56e9a6 x86/fpu: Remove setup_init_fpu_buf() call from eager_fpu_init()
It's a pure xstate method now, no need for this duplicate call.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:41 +02:00
Ingo Molnar
2507e1c03f x86/fpu: Set up the legacy FPU init image from fpu__init_system()
The legacy FPU init image is used on older CPUs who don't run xstate init.
But the init code is called within setup_init_fpu_buf(), an xstate method.

Move this legacy init out of the xstate code and put it into fpu/init.c.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:41 +02:00
Ingo Molnar
429ced50a0 x86/fpu: Do fpu__init_system_xstate only from fpu__init_system()
Only call xstate system setup routines from fpu__init_system().

Likewise, don't call fpu__init_cpu_xstate() from fpu__init_system().

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:41 +02:00
Ingo Molnar
c42103b226 x86/fpu: Remove xsave_init()
Expand fpu__init_system_xstate() and fpu__init_cpu_xstate() calls
into xsave_init() calls.

(This will allow us to call the proper versions in higher level FPU init code
later on.)

No change in functionality.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:40 +02:00
Ingo Molnar
62db6871ae x86/fpu: Propagate once per boot quirk into fpu__init_system_xstate()
Linearize the call sequence in xsave_init():

	fpu__init_system_xstate();
	fpu__init_cpu_xstate();

We do this by propagating the boot-once quirk into
fpu__init_system_xstate(). fpu__init_cpu_xstate() is
safe to be called multiple time.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:40 +02:00
Ingo Molnar
e9dbfd673a x86/fpu: Move legacy check to fpu__init_system_xstate()
Now that legacy code can execute fpu__init_cpu_xstate() in
xsave_init(), we can move the once per boot legacy check into
fpu__init_system_xstate(), where it belongs.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:40 +02:00
Ingo Molnar
e84611fc96 x86/fpu: Move CPU capability check into fpu__init_cpu_xstate()
fpu__init_system_xstate() does an FPU capability check that is better
done in fpu__init_cpu_xstate(). This will allow us to call
fpu__init_cpu_xstate() directly on legacy CPUs as well.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:39 +02:00
Ingo Molnar
55cc4678b7 x86/fpu: Make the system/cpu init distinction clear in the xstate code as well
Rename existing xstate init functions along the system/cpu init principles:

	fpu__init_system_xstate(): called once per system bootup
	fpu__init_cpu_xstate():    called per CPU onlining

Also make the fpu__init_cpu_xstate() early code invariant:
if xfeatures_mask is not set yet then don't crash but return.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:39 +02:00
Ingo Molnar
e35f6f1414 x86/fpu: Split fpu__cpu_init() into early-boot and cpu-boot parts
There are two kinds of FPU initialization sequences necessary to bring FPU
functionality up: once per system bootup activities, such as detection,
feature initialization, etc. of attributes that are shared by all CPUs
in the system - and per cpu initialization sequences run when a CPU is
brought online (either during bootup or during CPU hotplug onlining),
such as CR0/CR4 register setting, etc.

The FPU code is mixing these roles together, with no clear distinction.

Start sorting this out by splitting the main FPU detection routine
(fpu__cpu_init()) into two parts: fpu__init_system() for
one per system init activities, and fpu__init_cpu() for the
per CPU onlining init activities.

Note that xstate_init() is called from both variants for the time being,
because it has a dual nature as well. We'll fix that in upcoming patches.

Just do the split and call it as we used to before, don't introduce any
change in initialization behavior yet, beyond duplicate (and harmless)
fpu__init_cpu() and xstate_init() calls - which we'll fix in later
patches.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:39 +02:00
Ingo Molnar
3e5e126774 x86/fpu: Remove 'init_xstate_buf' bootmem allocation
Make init_xstate_buf allocated statically at build time.

This structure's maximum size is around 1KB - and it's allocated even on
most modern embedded x86 CPUs which strive for FPU instruction set parity
with desktop and server CPUs, so it's not like we can save much on smaller
systems.

This removes the last bootmem allocation from the FPU init path, allowing
it to be called earlier in the boot sequence.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:39 +02:00
Ingo Molnar
26b1f5d05a x86/fpu: Make setup_init_fpu_buf() run-once explicitly
Remove the dependency on the init_xstate_buf == NULL check to
implement once-per-bootup logic in eager_fpu_init(), by making
setup_init_fpu_buf() run once per bootup explicitly.

This is in preparation to make init_xstate_buf statically
allocated.

The various boot-once quirks in the FPU init code will be removed
in a later cleanup stage.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:38 +02:00
Ingo Molnar
966ece619e x86/fpu: Remove xsave_init() bootmem allocations
There's only 8 xstate bits at the moment, and it's not like we
can support unknown bits - so put xstate_offsets[] and
xstate_sizes[] into static allocation.

This is in preparation to be able to call the FPU init code
earlier, when there's no bootmem available yet.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:38 +02:00
Ingo Molnar
6a13320758 x86/fpu: Remove fpstate_xstate_init_size() boot quirk
fpstate_xstate_init_size() is called in fpu__cpu_init(), which is
run on every CPU, every time they are brought online.

But we want to call fpstate_xstate_init_size() only once. Move it to
fpu__detect(), which only runs once, on the boot CPU.

Also clean up the flow of fpstate_xstate_init_size() a bit, by
removing a 'return' from the middle of the function.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:38 +02:00
Ingo Molnar
66af8e2764 x86/fpu: Rename __thread_fpu_end() to fpregs_deactivate()
Propagate the 'fpu->fpregs_active' naming to the high level function that
clears it.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:37 +02:00
Ingo Molnar
232f62cdd7 x86/fpu: Rename __thread_fpu_begin() to fpregs_activate()
Propagate the 'fpu->fpregs_active' naming to the high level
function that sets it.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:37 +02:00
Ingo Molnar
d5cea9b0af x86/fpu: Rename fpu->has_fpu to fpu->fpregs_active
So the current code uses fpu->has_cpu to determine whether a given
user FPU context is actively loaded into the FPU's registers [*] and
that those registers represent the task's current FPU state.

But this term is not unambiguous: especially the distinction between
fpu->has_fpu, PF_USED_MATH and fpu_fpregs_owner_ctx is not clear.

Increase clarity by unambigously signalling that it's about
hardware registers being active right now, by renaming it to
fpu->fpregs_active.

( In later patches we'll use more of the 'fpregs' naming, which will
  make it easier to grep for as well. )

[*] There's the kernel_fpu_begin()/end() primitive that also
    activates FPU hw registers as well and uses them, without
    touching the fpu->fpregs_active flag.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:36 +02:00
Ingo Molnar
73a3aeb3ac x86/fpu: Improve the __sanitize_i387_state() documentation
Improve the comments and add new ones, as this code isn't very obvious.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:36 +02:00
Ingo Molnar
678eaf6034 x86/fpu: Rename regset FPU register accessors
Rename regset accessors to prefix them with 'regset_', because we
want to start using the 'fpregs_active' name elsewhere.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:35 +02:00
Ingo Molnar
91a8c2a5b4 x86/fpu: Clean up and fix MXCSR handling
The code has the following problems:

 - it uses a single global 'fx_scratch' area that multiple CPUs could
   write into simultaneously, in theory.

 - it wastes 512 bytes of .data for something that is only rarely used.

Fix this by moving the state buffer to the stack. Note that while
this is 512 bytes, we don't ever call this function in very deep
callchains, so its stack usage should not be a problem.

Also add comments to explain the magic 0x0000ffbf default value.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:35 +02:00
Ingo Molnar
400e4b2091 x86/fpu: Rename xsave.header::xstate_bv to 'xfeatures'
'xsave.header::xstate_bv' is a misnomer - what does 'bv' stand for?

It probably comes from the 'XGETBV' instruction name, but I could
not find in the Intel documentation where that abbreviation comes
from. It could mean 'bit vector' - or something else?

But how about - instead of guessing about a weird name - we named
the field in an obvious and descriptive way that tells us exactly
what it does?

So rename it to 'xfeatures', which is a bitmask of the
xfeatures that are fpstate_active in that context structure.

Eyesore like:

           fpu->state->xsave.xsave_hdr.xstate_bv |= XSTATE_FP;

is now much more readable:

           fpu->state->xsave.header.xfeatures |= XSTATE_FP;

Which form is not just infinitely more readable, but is also
shorter as well.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:35 +02:00
Ingo Molnar
3a54450b5e x86/fpu: Rename 'xsave_hdr' to 'header'
Code like:

           fpu->state->xsave.xsave_hdr.xstate_bv |= XSTATE_FP;

is an eyesore, because not only is the words 'xsave' and 'state'
are repeated twice times (!), but also because of the 'hdr' and 'bv'
abbreviations that are pretty meaningless at a first glance.

Start cleaning this up by renaming 'xsave_hdr' to 'header'.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:34 +02:00
Ingo Molnar
8dcea8db79 x86/fpu: Clean up regset functions
Clean up various regset handlers: use the 'fpu' pointer which
is available in most cases.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:34 +02:00
Ingo Molnar
9254aaa0fe x86/fpu: Move XCR0 manipulation to the FPU code proper
The suspend code accesses FPU state internals, add a helper for
it and isolate it.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:33 +02:00
Ingo Molnar
84246fe4e3 x86/fpu: Rename 'xstate_features' to 'xfeatures_nr'
The name 'xstate_features' does not tell us whether it's a bitmap
or any other value. That it's a count of features is only obvious
if you read the code that calculates it.

Rename it to the more descriptive 'xfeatures_nr' name.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:33 +02:00
Ingo Molnar
614df7fb8a x86/fpu: Rename 'pcntxt_mask' to 'xfeatures_mask'
So the 'pcntxt_mask' is a misnomer, it's essentially meaningless to anyone
who doesn't know what it does exactly.

Name it more descriptively as 'xfeatures_mask'.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:33 +02:00
Ingo Molnar
69496e10f8 x86/fpu: Print supported xstate features in human readable way
Inform the user/admin about which xstate features the kernel supports.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:32 +02:00
Ingo Molnar
32d4d9ccb0 x86/fpu: Improve FPU detection kernel messages
Standardize the various boot time messages printed during FPU detection:

 - Use a common 'x86/fpu: ' prefix for consistency and to make it easy
   to grep boot logs for FPU related messages

 - Correct speling errors

 - Add printout for the legacy FPU case as well

 - Clarify messages

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:32 +02:00
Ingo Molnar
c0841e34fd x86/fpu: Remove xsave_init() __init obfuscation
So this code surprised me - and being surprised when reading FPU code
does not help maintainability of an already overly complex subsystem.

Remove the obfuscation and just don't use __init annotation for now.
Anyone who wants to free these ~600 bytes of xstate_enable_boot_cpu()
should implement it cleanly.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:31 +02:00
Ingo Molnar
78f7f1e54b x86/fpu: Rename fpu-internal.h to fpu/internal.h
This unifies all the FPU related header files under a unified, hiearchical
naming scheme:

 - asm/fpu/types.h:      FPU related data types, needed for 'struct task_struct',
                         widely included in almost all kernel code, and hence kept
                         as small as possible.

 - asm/fpu/api.h:        FPU related 'public' methods exported to other subsystems.

 - asm/fpu/internal.h:   FPU subsystem internal methods

 - asm/fpu/xsave.h:      XSAVE support internal methods

(Also standardize the header guard in asm/fpu/internal.h.)

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:31 +02:00
Ingo Molnar
df6b35f409 x86/fpu: Rename i387.h to fpu/api.h
We already have fpu/types.h, move i387.h to fpu/api.h.

The file name has become a misnomer anyway: it offers generic FPU APIs,
but is not limited to i387 functionality.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:30 +02:00
Ingo Molnar
e11267c13f x86/fpu: Clean up fpu__clear() a bit
Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:30 +02:00
Ingo Molnar
2e8a310266 x86/fpu: Rename fpu__flush_thread() to fpu__clear()
The primary purpose of this function is to clear the current task's
FPU before an exec(), to not leak information from the previous task,
and to allow the new task to start with freshly initialized FPU
registers.

Rename the function to reflect this primary purpose.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:29 +02:00
Ingo Molnar
cc08d54599 x86/fpu: Use 'struct fpu' in fpu__unlazy_stopped()
Migrate this function to pure 'struct fpu' usage.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:29 +02:00
Ingo Molnar
db2b1d3ad1 x86/fpu: Use 'struct fpu' in fpstate_alloc_init()
Migrate this function to pure 'struct fpu' usage.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:29 +02:00
Ingo Molnar
c69e098b1f x86/fpu: Use 'struct fpu' in fpu__copy()
Migrate this function to pure 'struct fpu' usage.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:29 +02:00
Ingo Molnar
f9bc977fe7 x86/fpu: Use 'struct fpu' in fpu_copy()
Migrate this function to pure 'struct fpu' usage.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:28 +02:00
Ingo Molnar
0c070595ce x86/fpu: Use 'struct fpu' in fpu__save()
Migrate this function to pure 'struct fpu' usage.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:28 +02:00
Ingo Molnar
a4d8fc2e06 x86/fpu: Use 'struct fpu' in __fpu_save()
Migrate this function to pure 'struct fpu' usage.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:28 +02:00
Ingo Molnar
2d75bcf314 x86/fpu: Move __save_fpu() into fpu/core.c
This helper function is only used in fpu/core.c, move it there.

This slightly speeds up compilation.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:27 +02:00
Ingo Molnar
384a23f939 x86/fpu: Use 'struct fpu' in switch_fpu_finish()
Migrate this function to pure 'struct fpu' usage.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:27 +02:00
Ingo Molnar
cb8818b6ac x86/fpu: Use 'struct fpu' in switch_fpu_prepare()
Migrate this function to pure 'struct fpu' usage.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:27 +02:00
Ingo Molnar
af2d94fddc x86/fpu: Use 'struct fpu' in fpu_reset_state()
Migrate this function to pure 'struct fpu' usage.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:26 +02:00
Ingo Molnar
11f2d50b10 x86/fpu: Use 'struct fpu' in restore_fpu_checking()
Migrate this function to pure 'struct fpu' usage.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:26 +02:00
Ingo Molnar
eb6a3251bf x86/fpu: Remove task_disable_lazy_fpu_restore()
Replace task_disable_lazy_fpu_restore() with easier to read
open-coded uses: we already update the fpu->last_cpu field
explicitly in other cases.

(This also removes yet another task_struct using FPU method.)

Better explain the fpu::last_cpu field in the structure definition.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:26 +02:00
Ingo Molnar
ca6787ba0f x86/fpu: Remove 'struct task_struct' usage from drop_fpu()
Migrate this function to pure 'struct fpu' usage.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:25 +02:00
Ingo Molnar
c5bedc6847 x86/fpu: Get rid of PF_USED_MATH usage, convert it to fpu->fpstate_active
Introduce a simple fpu->fpstate_active flag in the fpu context data structure
and use that instead of PF_USED_MATH in task->flags.

Testing for this flag byte should be slightly more efficient than
testing a bit in a bitmask, but the main advantage is that most
FPU functions can now be performed on a 'struct fpu' alone, they
don't need access to 'struct task_struct' anymore.

There's a slight linecount increase, mostly due to the 'fpu' local
variables and due to extra comments. The local variables will go away
once we move most of the FPU methods to pure 'struct fpu' parameters.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:25 +02:00
Ingo Molnar
af7f8721f1 x86/fpu: Document fpu__unlazy_stopped()
Explain its usage and also document a TODO item.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:25 +02:00
Ingo Molnar
4c1384100e x86/fpu: Open code PF_USED_MATH usages
PF_USED_MATH is used directly, but also in a handful of helper inlines.

To ease the elimination of PF_USED_MATH, convert all inline helpers
to open-coded PF_USED_MATH usage.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:24 +02:00
Ingo Molnar
4540d3faa7 x86/fpu: Remove 'struct task_struct' usage from __thread_fpu_begin()
Migrate this function to pure 'struct fpu' usage.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:24 +02:00
Ingo Molnar
35191e3f07 x86/fpu: Remove 'struct task_struct' usage from __thread_fpu_end()
Migrate this function to pure 'struct fpu' usage.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:24 +02:00
Ingo Molnar
36b544dcd3 x86/fpu: Change fpu_owner_task to fpu_fpregs_owner_ctx
Track the FPU owner context instead of the owner task: this change,
together with other changes, will allow in subsequent patches the
elimination of 'struct task_struct' usage in various FPU code:
we'll be able to use 'struct fpu' only.

There's no change in code size:

      text           data     bss      dec            hex filename
  13066467        2545248 1626112 17237827        1070743 vmlinux.before
  13066467        2545248 1626112 17237827        1070743 vmlinux.after

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:23 +02:00
Ingo Molnar
b0c050c5ba x86/fpu: Move 'PER_CPU(fpu_owner_task)' to fpu/core.c
Move it closer to other per-cpu FPU data structures.

This also unifies the 32-bit and 64-bit code.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:23 +02:00
Ingo Molnar
276983f808 x86/fpu: Eliminate the __thread_has_fpu() wrapper
Start migrating FPU methods towards using 'struct fpu *fpu'
directly. __thread_has_fpu() is just a trivial wrapper around
fpu->has_fpu, eliminate it.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:22 +02:00
Ingo Molnar
9a89b02918 x86/fpu: Print out whether we are doing lazy/eager FPU context switches
Ever since the kernel started defaulting to eager FPU switches on modern Intel
CPUs it's not been obvious whether a given system is using the lazy or the eager
FPU context switching logic.

So generate a boot message about which mode the FPU code is in:

  x86/fpu: Using 'lazy' FPU context switches.

or:

  x86/fpu: Using 'eager' FPU context switches.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:22 +02:00
Ingo Molnar
bfd6fc0581 x86/fpu: Add debugging check to fpu_copy()
Also add a bit of documentation.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:22 +02:00
Ingo Molnar
e102f30f4e x86/fpu: Move fpu_copy() to fpu/core.c
Move fpu_copy() where its only user is.

Beyond readability this also speeds up compilation, as fpu-internal.h
is included in over a dozen .c files.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:21 +02:00
Ingo Molnar
6522d78377 x86/fpu: Remove __save_init_fpu()
__save_init_fpu() is just a trivial wrapper around fpu_save_init().

Remove the extra layer of obfuscation.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:21 +02:00
Ingo Molnar
085cc281a0 x86/fpu: Add kernel_fpu_disabled()
Instead of open-coded in_kernel_fpu access, Use kernel_fpu_disabled() in
interrupted_kernel_fpu_idle(), matching the other kernel_fpu_*() methods.

Also add some documentation for in_kernel_fpu.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:21 +02:00
Ingo Molnar
3103ae3a6d x86/fpu: Add debug check to kernel_fpu_disable()
We are not supposed to call kernel_fpu_disable() if we have not
previously enabled it.

Also use kernel_fpu_disable()/enable() in the __kernel_fpu_begin/end()
primitives, instead of writing to in_kernel_fpu directly,
so that we get the debugging checks.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:20 +02:00
Ingo Molnar
416d49ac67 x86/fpu: Make kernel_fpu_disable/enable() static
This allows the compiler to inline them and to eliminate them:

   arch/x86/kernel/fpu/core.o:

   text    data     bss     dec     hex filename
   6741       4       8    6753    1a61 core.o.before
   6716       4       8    6728    1a48 core.o.after

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:20 +02:00
Ingo Molnar
f55f88e25e x86/fpu: Make task_xstate_cachep static
It's now local to fpu/core.c, make it static.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:20 +02:00
Ingo Molnar
5a12bf6332 x86/fpu: Uninline fpstate_free() and move it next to the allocation function
Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:20 +02:00
Ingo Molnar
a752b53d9d x86/fpu: Factor out fpu__copy()
Introduce fpu__copy() and use it in arch_dup_task_struct(),
thus moving another chunk of FPU logic to fpu/core.c.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:19 +02:00
Ingo Molnar
8ffb53ab98 x86/fpu: Move task_xstate_cachep handling to core.c
This code was historically in process.c, now we have FPU core internals in
fpu/core.c instead - move it there.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:19 +02:00
Ingo Molnar
3e261c14e4 x86/fpu: Simplify the xsave_state*() methods
These functions (xsave_state() and xsave_state_booting()) have a 'mask'
argument that is always -1.

Propagate this into the functions instead and eliminate the extra argument.

Does not change the generated code, because these were inlined functions.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:18 +02:00
Ingo Molnar
4d1640927b x86/fpu: Factor out the FPU bug detection code into fpu__init_check_bugs()
Move the boot-time FPU bug detection code to the other FPU boot time
init code in fpu/init.c.

No change in code size:

   text    data     bss     dec     hex filename
   13044568        1884440 1130496 16059504         f50c70 vmlinux.before
   13044568        1884440 1130496 16059504         f50c70 vmlinux.after

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:18 +02:00
Ingo Molnar
3a0aee4801 x86/fpu: Rename math_state_restore() to fpu__restore()
Move to the new fpu__*() namespace.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:18 +02:00
Ingo Molnar
93b90712c6 x86/fpu: Move math_state_restore() to fpu/core.c
It's another piece of FPU internals that is better off close to
the other FPU internals.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:17 +02:00
Ingo Molnar
81683cc827 x86/fpu: Factor out fpu__flush_thread() from flush_thread()
flush_thread() open codes a lot of FPU internals - create a separate
function for it in fpu/core.c.

Turns out that this does not hurt performance:

   text    data     bss     dec     hex filename
   11843039        1884440 1130496 14857975         e2b6f7 vmlinux.before
   11843039        1884440 1130496 14857975         e2b6f7 vmlinux.after

and since this is a slowpath clarity comes first anyway.

We can reconsider inlining decisions after the FPU code has been cleaned up.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:17 +02:00
Ingo Molnar
11ad19277e x86/fpu: Remove the free_thread_xstate() complication
Use fpstate_free() directly to manage FPU state.

Only process.c was using this method, so this is a speedup as well,
as it removes the extra function call and related clobbers.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:17 +02:00
Ingo Molnar
146ed598d1 x86/fpu: Move the no_387 handling and FPU detection code into init.c
Both no_387() and fpu__detect() run at boot time, so they belong
into init.c.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:17 +02:00
Ingo Molnar
4445e6e9a5 x86/fpu: Remove unnecessary includes from core.c
fpu/core.c includes a lot of files for mostly historic reasons.

It only needs fpu-internal.h, which already includes all
the required headers.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:16 +02:00
Ingo Molnar
0c86753790 x86/fpu: Split out the boot time FPU init code into fpu/init.c
Move boot time FPU initialization code into init.c, to better
isolate it into its own domain.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:16 +02:00
Ingo Molnar
f89e32e0a3 x86/fpu: Fix header file dependencies of fpu-internal.h
Fix a minor header file dependency bug in asm/fpu-internal.h: it
relies on i387.h but does not include it. All users of fpu-internal.h
included it explicitly.

Also remove unnecessary includes, to reduce compilation time.

This also makes it easier to use it as a standalone header file
for FPU internals, such as an upcoming C module in arch/x86/kernel/fpu/.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:16 +02:00
Ingo Molnar
ce4c4c2624 x86/fpu: Move i387.c and xsave.c to arch/x86/kernel/fpu/
Create a new subdirectory for the FPU support code in arch/x86/kernel/fpu/.

Rename 'i387.c' to 'core.c' - as this really collects the core FPU support
code, nothing i387 specific.

We'll better organize this directory in later patches.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:15 +02:00
Ingo Molnar
c0c2803dee x86/fpu: Move thread_info::fpu_counter into thread_info::fpu.counter
This field is kept separate from the main FPU state structure for
no good reason.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:14 +02:00
Ingo Molnar
3f6a0bce90 x86/fpu: Rename init_thread_xstate() to fpstate_xstate_init_size()
So init_thread_xstate() is a misnomer in that it's not really related to a specific
thread - it determines, once during initial bootup, the size of the xstate context.

Also improve the comments.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:14 +02:00
Ingo Molnar
3a9c4b0d7e x86/fpu: Rename fpu_init() to fpu__cpu_init()
fpu_init() is a bit of a misnomer in that it (falsely) creates the
impression that it's related to the (old) fpu_finit() function,
which initializes FPU ctx state.

Rename it to fpu__cpu_init() to make its boot time initialization
clear, and to move it to the fpu__*() namespace.

Also fix and extend its comment block to point out that it's
called not only on the boot CPU, but on secondary CPUs as well.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:14 +02:00
Ingo Molnar
c0ee2cf61b x86/fpu: Rename fpu_finit() to fpstate_init()
Make it clear that we are initializing the in-memory FPU context area,
no the FPU registers.

Also move it to the fpu__*() namespace.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:13 +02:00
Ingo Molnar
a7c2a83364 x86/fpu: Rename fpu_free() to fpstate_free()
Use the fpu__*() namespace.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:13 +02:00
Ingo Molnar
ed97b08546 x86/fpu: Rename fpu_alloc() to fpstate_alloc()
Use the fpu__*() namespace for fpstate_alloc() as well.

Also add a comment about FPU state alignment.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:13 +02:00
Ingo Molnar
6fbe671248 x86/fpu: Move fpu_alloc() out of line
This is not a small function, and it's used in several places,
one of them a popular module (KVM).

Move the function out of line. This saves a bit of text,
even with the symbol export overhead:

   text    data     bss     dec     hex filename
   12566052        1619504 1089536 15275092         e91454 vmlinux.before
   12566046        1619504 1089536 15275086         e9144e vmlinux.after

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:12 +02:00
Ingo Molnar
071ae621ec x86/fpu: Simplify fpu__unlazy_stopped()
Open code the PF_USED_MATH logic, to make the logic more obvious.

(We'll slowly convert the other users of *_used_math() methods as well.)

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:12 +02:00
Ingo Molnar
8694c3e793 x86/fpu: Optimize fpu__unlazy_stopped()
This function is only called for stopped child tasks, so the
fpu__save() branch will never get called - remove it.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:11 +02:00
Ingo Molnar
67e97fc2ec x86/fpu: Rename init_fpu() to fpu__unlazy_stopped() and add debugging check
This function name is a misnomer now that we've split out all the
other users from it. Rename it accordingly: it's used to save
the FPU state of (ptrace-)stopped child tasks.

Add debugging check to double check this intended usage: that this
function is only called for non-current, stopped child tasks.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:11 +02:00
Ingo Molnar
bda283796b x86/fpu: Make init_fpu() static
Now that the allocation users have been split off into a separate
function, init_fpu() has become local to i387.c: make it static.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:11 +02:00
Ingo Molnar
97185c95f7 x86/fpu: Split an fpstate_alloc_init() function out of init_fpu()
Most init_fpu() users don't want the register-saving aspect of the
function, they are calling it for 'current' and when FPU registers
are not allocated and initialized yet.

Split out a simplified API that does just that (and add debug-checks
for these conditions): fpstate_alloc_init().

Use it where appropriate.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:10 +02:00
Ingo Molnar
1a7dc0db71 x86/fpu: Rename fpu_detect() to fpu__detect()
Use the fpu__*() namespace to organize FPU ops better.

Also document fpu__detect() a bit.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:10 +02:00
Ingo Molnar
87cdb98aff x86/fpu: Add debugging check to fpu__save()
Document the function a bit more and add debugging check that we are only
running this with the current task.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:10 +02:00
Ingo Molnar
4af08f2f47 x86/fpu: Add comments to fpu__save() and restrict its export
Add an explanation to fpu__save() and also don't export it to
random modules - we don't want them to futz around with deep kernel
internals.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:09 +02:00
Ingo Molnar
0a78155154 x86/fpu: Rename unlazy_fpu() to fpu__save()
This function is a misnomer on two levels:

1) it doesn't really manipulate TS on modern CPUs anymore, its
   primary purpose is to save FPU state, used:

      - when executing fork()/clone(): to copy current FPU state
        to the child's FPU state.

      - when handling math exceptions: to generate the math error
        si_code in the signal frame.

2) even on legacy CPUs it doesn't actually 'unlazy', if then
   it lazies the FPU state: as a side effect of the old FNSAVE
   instruction which clears (destroys) FPU state it's necessary
   to set CR0::TS.

So rename it to fpu__save() to better reflect its purpose.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:09 +02:00
Paul Gortmaker
ea6cd25058 x86: Rename eisa_set_level_irq to elcr_set_level_irq
This routine has been around for over a decade, but with EISA
being dead and abandoned for about twice that long, the name can
be kind of confusing.  The function is going at the PIC Edge/Level
Configuration Registers (ELCR), so rename it as such and mentally
decouple it from the long since dead EISA bus.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Reviewed-by: Maciej W. Rozycki <macro@linux-mips.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Len Brown <len.brown@intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/1431217657-934-1-git-send-email-paul.gortmaker@windriver.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-19 11:23:38 +02:00
Ingo Molnar
7cb6859821 x86/smp/boot: Fix legacy SMP bootup slow-boot bug
So while testing kernels using tools/kvm/ (kvmtool) I noticed that it
booted super slow:

[    0.142991] Performance Events: no PMU driver, software events only.
[    0.149265] x86: Booting SMP configuration:
[    0.149765] .... node  #0, CPUs:          #1
[    0.148304] kvm-clock: cpu 1, msr 2:1bfe9041, secondary cpu clock
[   10.158813] KVM setup async PF for cpu 1
[   10.159000]    #2
[   10.159000] kvm-stealtime: cpu 1, msr 211a4d400
[   10.158829] kvm-clock: cpu 2, msr 2:1bfe9081, secondary cpu clock
[   20.167805] KVM setup async PF for cpu 2
[   20.168000]    #3
[   20.168000] kvm-stealtime: cpu 2, msr 211a8d400
[   20.167818] kvm-clock: cpu 3, msr 2:1bfe90c1, secondary cpu clock
[   30.176902] KVM setup async PF for cpu 3
[   30.177000]    #4
[   30.177000] kvm-stealtime: cpu 3, msr 211acd400

One CPU booted up per 10 seconds. With 120 CPUs that takes a while.

Bisection pinpointed this commit:

  853b160aaa ("Revert f5d6a52f51 ("x86/smpboot: Skip delays during SMP initialization similar to Xen")")

But that commit just restores previous behavior, so it cannot cause the
problem. After some head scratching it turns out that these two commits:

  1a744cb356 ("x86/smp/boot: Remove 10ms delay from cpu_up() on modern processors")
  d68921f9bd ("x86/smp/boot: Add cmdline "cpu_init_udelay=N" to specify cpu_up() delay")

added the following code to smpboot.c:

-               mdelay(10);
+               mdelay(init_udelay);

Note the mismatch in the units: the delay is called 'udelay' and is set
to microseconds - while the function used here is actually 'mdelay',
which counts in milliseconds ...

So the delay for legacy systems is off by a factor of 1,000, so instead
of 10 msecs we waited for 10 seconds ...

The reason bisection pointed to 853b160aaa was that 853b160aaa removed
a (broken) boot-time speedup patch, which masked the factor 1,000 bug.

Fix it by using udelay(). This fixes my bootup problems.

Cc: Len Brown <len.brown@intel.com>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan H. Schönherr <jschoenh@amazon.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-18 12:14:25 +02:00
Borislav Petkov
17fea54bf0 x86/mce: Fix MCE severity messages
Derek noticed that a critical MCE gets reported with the wrong
error type description:

  [Hardware Error]: CPU 34: Machine Check Exception: 5 Bank 9: f200003f000100b0
  [Hardware Error]: RIP !INEXACT! 10:<ffffffff812e14c1> {intel_idle+0xb1/0x170}
  [Hardware Error]: TSC 49587b8e321cb
  [Hardware Error]: PROCESSOR 0:306e4 TIME 1431561296 SOCKET 1 APIC 29
  [Hardware Error]: Some CPUs didn't answer in synchronization
  [Hardware Error]: Machine check: Invalid
				   ^^^^^^^

The last line with 'Invalid' should have printed the high level
MCE error type description we get from mce_severity, i.e.
something like:

  [Hardware Error]: Machine check: Action required: data load error in a user process

this happens due to the fact that mce_no_way_out() iterates over
all MCA banks and possibly overwrites the @msg argument which is
used in the panic printing later.

Change behavior to take the message of only and the (last)
critical MCE it detects.

Reported-by: Derek <denc716@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1431936437-25286-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-18 10:31:22 +02:00
Borislav Petkov
e774eaa9f6 x86/microcode/intel: Rename get_matching_sig()
... to find_matching_signature() which is exactly what it does.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1431860101-14847-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-18 09:32:37 +02:00
Borislav Petkov
9e5aed83bb x86/microcode/intel: Simplify get_matching_sig()
Unclutter function, make it a bit more readable, drop local
variables.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1431860101-14847-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-18 09:32:36 +02:00
Borislav Petkov
6b2d469f5b x86/microcode/intel: Simplify update_match_cpu()
Drop unreadable macro, deconstruct compound conditional
statement into single ones and return early if they match. Add
comments.

There should be no functionality change resulting from this
patch.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1431860101-14847-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-18 09:32:36 +02:00
Borislav Petkov
8de3eafc16 x86/microcode/intel: Rename get_matching_microcode
... to has_newer_microcode() as it does exactly that: checks
whether binary data @mc has newer microcode patch than the
applied one. Move @mc to be the first function arg too.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1431860101-14847-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-18 09:32:36 +02:00
Ingo Molnar
cffc32975d Merge branch 'x86/asm' into x86/apic, to resolve conflicts
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-17 07:58:08 +02:00
Denys Vlasenko
adeb553784 x86/asm/entry/64: Use shorter MOVs from segment registers
The "movw %ds,%cx" instruction needs a 0x66 prefix, while
"movl %ds,%ecx" does not.

The difference is that latter form (on 64-bit CPUs)
overwrites the entire %ecx, not only its lower half.

But subsequent code doesn't depend on the value of upper
half of %ecx, so we can safely use the shorter instruction.

The new code is also faster than the old one - now we don't
depend on the old value of %ecx, but this code fragment is
not performance-critical so it does not matter much.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1431722346-26585-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-17 07:57:54 +02:00
Borislav Petkov
e839004b49 x86/asm/head*.S: Change global labels to local
Make the disassembly look less confusing:

  -- head_64.o.before.asm
  ++ head_64.o.after.asm
   0000000000000120 <early_idt_handler>:
    120:	fc                   	cld
    121:	83 3c 24 02          	cmpl   $0x2,(%rsp)
  - 125:	0f 84 9d 00 00 00    	je     1c8 <is_nmi>
  + 125:	0f 84 9d 00 00 00    	je     1c8 <early_idt_handler+0xa8>
    12b:	83 3d 00 00 00 00 02 	cmpl   $0x2,0x0(%rip)        # 132 <early_idt_handler+0x12>
    132:	74 7e                	je     1b2 <early_idt_handler+0x92>
    134:	ff 05 00 00 00 00    	incl   0x0(%rip)        # 13a <early_idt_handler+0x1a>
  @@ -1198,9 +1198,7 @@ Disassembly of section .init.text:
    1bf:	5a                   	pop    %rdx
    1c0:	59                   	pop    %rcx
    1c1:	58                   	pop    %rax
  - 1c2:	ff 0d 00 00 00 00    	decl   0x0(%rip)        # 1c8 <is_nmi>
  -
  -00000000000001c8 <is_nmi>:
  + 1c2:	ff 0d 00 00 00 00    	decl   0x0(%rip)        # 1c8 <early_idt_handler+0xa8>
    1c8:	48 83 c4 10          	add    $0x10,%rsp
    1cc:	48 cf                	iretq

  -- head_32.o.before.asm
  ++ head_32.o.after.asm
   0000016c <early_idt_handler>:
    16c:  fc                      cld
    16d:  83 3c 24 02             cmpl   $0x2,(%esp)
  - 171:  74 73                   je     1e6 <is_nmi>
  + 171:  74 73                   je     1e6 <ex_entry+0xc>
    173:  36 83 3d 00 00 00 00    cmpl   $0x2,%ss:0x0
    17a:  02
    17b:  74 5a                   je     1d7 <hlt_loop>
  @@ -483,8 +483,6 @@ Disassembly of section .init.text:
    1dd:  59                      pop    %ecx
    1de:  58                      pop    %eax
    1df:  36 ff 0d 00 00 00 00    decl   %ss:0x0
  -
  -000001e6 <is_nmi>:
    1e6:  83 c4 08                add    $0x8,%esp
    1e9:  cf                      iret
    1ea:  66 90                   xchg   %ax,%ax

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1431793079-11153-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-17 07:57:53 +02:00
Ingo Molnar
75d95d8488 Merge branch 'linus' into x86/asm, to resolve conflicts
Conflicts:
	tools/testing/selftests/x86/Makefile
	tools/testing/selftests/x86/run_x86_tests.sh
2015-05-17 07:57:31 +02:00
Thomas Gleixner
6dc1787605 x86: Consolidate irq entering inlines
smp.c and irq_work.c implement the same inline helper. Move it to
apic.h and use it everywhere.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
2015-05-15 16:04:49 +02:00
Thomas Gleixner
6af7faf607 x86: Use entering[_ack]_irq() instead of open coding it
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-15 16:03:18 +02:00
Jiang Liu
486ca539ca x86, irq: Allocate CPU vectors from device local CPUs if possible
On NUMA systems, an IO device may be associated with a NUMA node.
It may improve IO performance to allocate resources, such as memory
and interrupts, from device local node.

This patch introduces a mechanism to support CPU vector allocation
policies. It tries to allocate CPU vectors from CPUs on device local
node first, and then fallback to all online(global) CPUs.

This mechanism may be used to support NumaConnect systems to allocate
CPU vectors from device local node.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Daniel J Blueman <daniel@numascale.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/1430967244-28905-1-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-13 09:50:24 +02:00
Sergey Senozhatsky
4a00c95dcd x86/hpet: Pass proper pointer to irq_alloc_info
Fix the following oops:
 hpet_msi_get_hwirq+0x1f/0x27
 msi_domain_alloc+0x35/0xfe
 ? trace_hardirqs_on_caller+0x16c/0x188
 irq_domain_alloc_irqs_recursive+0x51/0x95
 __irq_domain_alloc_irqs+0x151/0x223
 hpet_assign_irq+0x5d/0x68
 hpet_msi_capability_lookup+0x121/0x1cb
 ? hpet_enable+0x2b4/0x2b4
 hpet_late_init+0x5f/0xf2
 ? hpet_enable+0x2b4/0x2b4
 do_one_initcall+0x184/0x199
 kernel_init_freeable+0x1af/0x237
 ? rest_init+0x13a/0x13a
 kernel_init+0xe/0xd4
 ret_from_fork+0x3f/0x70
 ? rest_init+0x13a/0x13a

Since 3cb96f0c97 ('x86/hpet: Enhance HPET IRQ to support
hierarchical irqdomains') hpet_msi_capability_lookup() uses
hpet_assign_irq(). The latter initializes irq_alloc_info on stack, but
passes a NULL pointer to irq_domain_alloc_irqs(), which causes a NULL
pointer dereference later in hpet_msi_get_hwirq().

Pass the pointer to the irq_alloc_info irq_domain_alloc_irqs().

Fixes: 3cb96f0c97 'x86/hpet: Enhance HPET IRQ to support hierarchical irqdomains'
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Link: http://lkml.kernel.org/r/20150512041444.GA1094@swordfish
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-13 09:50:24 +02:00
Ingo Molnar
853b160aaa Revert f5d6a52f51 ("x86/smpboot: Skip delays during SMP initialization similar to Xen")
Huang Ying reported x86 boot hangs due to this commit.

Turns out that the change, despite its changelog, does more
than just change timeouts: it also changes the way we
assert/deassert INIT via the APIC_DM_INIT IPI, in the x2apic
case it skips the deassert step.

This is historically fragile code and the patch did not
improve it, so revert these changes.

This commit:

  1a744cb356 ("x86/smp/boot: Remove 10ms delay from cpu_up() on modern processors")

independently removes the worst of the delays (the 10 msec delay).

The remaining delays can be addressed one by one, combined
with careful testing.

Reported-by: Huang Ying <ying.huang@intel.com>
Cc: Anthony Liguori <aliguori@amazon.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Gang Wei <gang.wei@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan H. Schönherr <jschoenh@amazon.de>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Deegan <tim@xen.org>
Link: http://lkml.kernel.org/r/1430732554-7294-1-git-send-email-jschoenh@amazon.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-13 08:40:49 +02:00
Len Brown
1a744cb356 x86/smp/boot: Remove 10ms delay from cpu_up() on modern processors
Modern processor familes do not require the 10ms delay
in cpu_up() to de-assert INIT.  This speeds up boot
and resume by 10ms per (application) processor.

Signed-off-by: Len Brown <len.brown@intel.com>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan H. Schönherr <jschoenh@amazon.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/021ce30c88f216ad39686646421194dc25671e55.1431379433.git.len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-12 08:54:33 +02:00
Len Brown
d68921f9bd x86/smp/boot: Add cmdline "cpu_init_udelay=N" to specify cpu_up() delay
No change to default behavior.

Replace the hard-coded mdelay(10) in cpu_up() with a variable
udelay, that is set to a defined default -- rather than a magic
number.

Add a boot-time override, "cpu_init_udelay=N"

Signed-off-by: Len Brown <len.brown@intel.com>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan H. Schönherr <jschoenh@amazon.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/2fe8e6c798e8def271122f62df9bbf58dc283e2a.1431379433.git.len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-12 08:54:32 +02:00
Ingo Molnar
191a66353b Merge branch 'x86/asm' into x86/apic, to resolve a conflict
Conflicts:
	arch/x86/kernel/apic/io_apic.c
	arch/x86/kernel/apic/vector.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-11 16:05:09 +02:00
Stephane Eranian
a41f3c8cd4 perf/x86/intel/uncore: Add Broadwell-U uncore IMC PMU support
This patch enables the uncore Memory Controller (IMC) PMU
support for Intel Broadwell-U (Model 61) mobile processors.
The IMC PMU enables measuring memory bandwidth.

To use with perf:
$ perf stat -a -I 1000 -e
uncore_imc/data_reads/,uncore_imc/data_writes/ sleep 10

Tested-by: Sonny Rao <sonnyrao@chromium.org>
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kan.liang@intel.com
Cc: peterz@infradead.org
Link: http://lkml.kernel.org/r/20150423065642.GA4890@thinkpad
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-11 11:57:47 +02:00
Stephane Eranian
44b11fee51 perf/x86/rapl: Enable Broadwell-U RAPL support
This patch enables RAPL counters (energy consumption counters)
support for Intel Broadwell-U processors (Model 61):

To use:

  $ perf stat -a -I 1000 -e power/energy-cores/,power/energy-pkg/,power/energy-ram/ sleep 10

Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: jacob.jun.pan@linux.intel.com
Cc: kan.liang@intel.com
Cc: peterz@infradead.org
Cc: sonnyrao@chromium.org
Link: http://lkml.kernel.org/r/20150423070709.GA4970@thinkpad
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-11 11:52:30 +02:00
Toshi Kani
cd2f6a5a47 x86/mm/mtrr: Remove incorrect address check in __mtrr_type_lookup()
__mtrr_type_lookup() checks MTRR fixed ranges when mtrr_state.have_fixed
is set and start is less than 0x100000.

However, the 'else if (start < 0x1000000)' in the code checks with an
incorrect address as it has an extra-zero in the address.

The code still runs correctly as this check is meaningless, though.

This patch replaces the incorrect address check with 'else' with no
condition.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Elliott@hp.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Cc: linux-mm <linux-mm@kvack.org>
Cc: pebolle@tiscali.nl
Link: http://lkml.kernel.org/r/1427234921-19737-4-git-send-email-toshi.kani@hp.com
Link: http://lkml.kernel.org/r/1431332153-18566-8-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-11 10:38:44 +02:00
Borislav Petkov
6b44e72a1c x86/cpu/microcode: Zap changelog
It is useless at best and git history has it all detailed
anyway. Update copyright while at it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1431332153-18566-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-11 10:27:09 +02:00
Borislav Petkov
f21262b8e0 x86/alternatives: Switch AMD F15h and later to the P6 NOPs
Software optimization guides for both F15h and F16h cite those
NOPs as the optimal ones. A microbenchmark confirms that
actually even older families are better with the single-insn
NOPs so switch to them for the alternatives.

Cycles count below includes the loop overhead of the measurement
but that overhead is the same with all runs.

	F10h, revE:
	-----------
	Running NOP tests, 1000 NOPs x 1000000 repetitions

	K8:
			      90     288.212282 cycles
			   66 90     288.220840 cycles
			66 66 90     288.219447 cycles
		     66 66 66 90     288.223204 cycles
		  66 66 90 66 90     571.393424 cycles
	       66 66 90 66 66 90     571.374919 cycles
	    66 66 66 90 66 66 90     572.249281 cycles
	 66 66 66 90 66 66 66 90     571.388651 cycles

	P6:
			      90     288.214193 cycles
			   66 90     288.225550 cycles
			0f 1f 00     288.224441 cycles
		     0f 1f 40 00     288.225030 cycles
		  0f 1f 44 00 00     288.233558 cycles
	       66 0f 1f 44 00 00     324.792342 cycles
	    0f 1f 80 00 00 00 00     325.657462 cycles
	 0f 1f 84 00 00 00 00 00     430.246643 cycles

	F14h:
	----
	Running NOP tests, 1000 NOPs x 1000000 repetitions

	K8:
			      90     510.404890 cycles
			   66 90     510.432117 cycles
			66 66 90     510.561858 cycles
		     66 66 66 90     510.541865 cycles
		  66 66 90 66 90    1014.192782 cycles
	       66 66 90 66 66 90    1014.226546 cycles
	    66 66 66 90 66 66 90    1014.334299 cycles
	 66 66 66 90 66 66 66 90    1014.381205 cycles

	P6:
			      90     510.436710 cycles
			   66 90     510.448229 cycles
			0f 1f 00     510.545100 cycles
		     0f 1f 40 00     510.502792 cycles
		  0f 1f 44 00 00     510.589517 cycles
	       66 0f 1f 44 00 00     510.611462 cycles
	    0f 1f 80 00 00 00 00     511.166794 cycles
	 0f 1f 84 00 00 00 00 00     511.651641 cycles

	F15h:
	-----
	Running NOP tests, 1000 NOPs x 1000000 repetitions

	K8:
			      90     243.128396 cycles
			   66 90     243.129883 cycles
			66 66 90     243.131631 cycles
		     66 66 66 90     242.499324 cycles
		  66 66 90 66 90     481.829083 cycles
	       66 66 90 66 66 90     481.884413 cycles
	    66 66 66 90 66 66 90     481.851446 cycles
	 66 66 66 90 66 66 66 90     481.409220 cycles

	P6:
			      90     243.127026 cycles
			   66 90     243.130711 cycles
			0f 1f 00     243.122747 cycles
		     0f 1f 40 00     242.497617 cycles
		  0f 1f 44 00 00     245.354461 cycles
	       66 0f 1f 44 00 00     361.930417 cycles
	    0f 1f 80 00 00 00 00     362.844944 cycles
	 0f 1f 84 00 00 00 00 00     480.514948 cycles

	F16h:
	-----
	Running NOP tests, 1000 NOPs x 1000000 repetitions

	K8:
			      90     507.793298 cycles
			   66 90     507.789636 cycles
			66 66 90     507.826490 cycles
		     66 66 66 90     507.859075 cycles
		  66 66 90 66 90    1008.663129 cycles
	       66 66 90 66 66 90    1008.696259 cycles
	    66 66 66 90 66 66 90    1008.692517 cycles
	 66 66 66 90 66 66 66 90    1008.755399 cycles

	P6:
			      90     507.795232 cycles
			   66 90     507.794761 cycles
			0f 1f 00     507.834901 cycles
		     0f 1f 40 00     507.822629 cycles
		  0f 1f 44 00 00     507.838493 cycles
	       66 0f 1f 44 00 00     507.908597 cycles
	    0f 1f 80 00 00 00 00     507.946417 cycles
	 0f 1f 84 00 00 00 00 00     507.954960 cycles

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1431332153-18566-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-11 10:26:05 +02:00
Ingo Molnar
4ddf2a1785 RAS: Add support for deferred errors on AMD (Aravind Gopalakrishnan)
This is an important RAS feature which adds hardware support for
 poisoned data. That means roughly that the hardware marks data which it
 has detected as corrupted but wasn't able to correct, as poisoned data
 and raises an APIC interrupt to signal that in the form of a deferred
 error. It is the OS's responsibility then to take proper recovery action
 and thus prolonge system lifetime as far as possible.
 
 Misc cleanups ontop. (Borislav Petkov)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVUF2sAAoJEBLB8Bhh3lVKXZ4QAJ3UdVM1/TuqAsZ7+jLkb7BZ
 BRyWgv31CcX5fM1D0vV+6K+4GPPsLAtNVYy2G+LauFX1bfE1f9ExWKlMzp45h1sS
 xaNLDhIIP+aE4kD1J7mlNc0WlF0ghlfX+iaGc7lI+j3o2Ydlxm15Pt6Te9hDI7en
 C1NOWrkJ0+BJv48bPeJ835CLu+DZ6xktWdJ1In88PNUA9YiTj12/nhMKkaGbh3zv
 Ep3FCFD/tHcecRK/rVmSTE3cG50SLKtndh/Kl7s1wYhgw6ERyg3x/t8QefZkuU0Q
 6fbetgYS9VvpewViAuNemoCHY5qxBNHPLsn6vwhluzlelW1CcgINU8LHcGZiaLmd
 DYVM9bHfSrKrHhH0M55XPn9RQSZpA+cTep3IyQzCK+jmLBiqrH3bMIRHjNQRUOLy
 DsGLm51tQqaMmnhDma8mMjF7LN+iBqNxXeqvkxQxQBE5NVLXHoaajOgUuj/N59WE
 FEFa65rmTrmsmgjAn9BPBk0zeoyQaYFKCLhENB19Vlt/4YoY/vHvzFYJNEcQT5ZU
 kuM8/hSEqeYZH4ZjJ8i9zKVado7z6pRQqV/lwRJ27tuXy9+9y6pV+ewmk8gCjQe4
 gvySlHbIlfO5geF59GYenp4ll5CdZFvIJuwhybDBZhk3C7M2M7X/xgHnJnprza6j
 YVzOp7Jj2aeHGImqGL49
 =3e5/
 -----END PGP SIGNATURE-----

Merge tag 'ras_for_4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras

Pull RAS updates from Borislav Petkov:

  - RAS: Add support for deferred errors on AMD (Aravind Gopalakrishnan)

    This is an important RAS feature which adds hardware support for
    poisoned data. That means roughly that the hardware marks data which it
    has detected as corrupted but wasn't able to correct, as poisoned data
    and raises an APIC interrupt to signal that in the form of a deferred
    error. It is the OS's responsibility then to take proper recovery action
    and thus prolonge system lifetime as far as possible.

  - Misc cleanups ontop. (Borislav Petkov)"

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-11 10:05:19 +02:00
Ingo Molnar
62c7a1e9ae locking/pvqspinlock: Rename QUEUED_SPINLOCK to QUEUED_SPINLOCKS
Valentin Rothberg reported that we use CONFIG_QUEUED_SPINLOCKS
in arch/x86/kernel/paravirt_patch_32.c, while the symbol is
called CONFIG_QUEUED_SPINLOCK. (Note the extra 'S')

But the typo was natural: the proper English term for such
a generic object would be 'queued spinlocks' - so rename
this and related symbols accordingly to the plural form.

Reported-by: Valentin Rothberg <valentinrothberg@gmail.com>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <Waiman.Long@hp.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-11 09:52:09 +02:00
Brian Gerst
8b455e6577 x86/asm/entry/irq: Clean up IRQn_VECTOR macros
Since the ISA irqs are in a single block, use
ISA_IRQ_VECTOR(irq) instead of individual macros.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1431185813-15413-5-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-10 12:34:28 +02:00
Brian Gerst
51bb92843e x86/asm/entry: Remove SYSCALL_VECTOR
Use IA32_SYSCALL_VECTOR for both compat and native.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1431185813-15413-4-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-10 12:34:28 +02:00
Brian Gerst
c5bde906d2 x86/irq: Merge irq_regs & irq_stat
Move irq_regs and irq_stat definitions to irq.c.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1431185813-15413-2-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-10 12:34:27 +02:00
Denys Vlasenko
3a23208e69 x86/entry: Define 'cpu_current_top_of_stack' for 64-bit code
32-bit code has PER_CPU_VAR(cpu_current_top_of_stack).
64-bit code uses somewhat more obscure: PER_CPU_VAR(cpu_tss + TSS_sp0).

Define the 'cpu_current_top_of_stack' macro on CONFIG_X86_64
as well so that the PER_CPU_VAR(cpu_current_top_of_stack)
expression can be used in both 32-bit and 64-bit code.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1429889495-27850-3-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08 13:50:02 +02:00
Denys Vlasenko
fed7c3f0f7 x86/entry: Remove unused 'kernel_stack' per-cpu variable
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1429889495-27850-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08 13:49:43 +02:00
Denys Vlasenko
63332a8455 x86/entry: Stop using PER_CPU_VAR(kernel_stack)
PER_CPU_VAR(kernel_stack) is redundant:

  - On the 64-bit build, we can use PER_CPU_VAR(cpu_tss + TSS_sp0).
  - On the 32-bit build, we can use PER_CPU_VAR(cpu_current_top_of_stack).

PER_CPU_VAR(kernel_stack) will be deleted by a separate change.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1429889495-27850-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08 13:43:52 +02:00
Ingo Molnar
7ae383be81 Merge branch 'linus' into x86/asm, before applying dependent patch
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08 13:33:33 +02:00
Dave Airlie
e1dee1973c Merge tag 'drm-intel-next-2015-04-23-fixed' of git://anongit.freedesktop.org/drm-intel into drm-next
drm-intel-next-2015-04-23:
- dither support for ns2501 dvo (Thomas Richter)
- some polish for the gtt code and fixes to finally enable the cmd parser on hsw
- first pile of bxt stage 1 enabling (too many different people to list ...)
- more psr fixes from Rodrigo
- skl rotation support from Chandra
- more atomic work from Ander and Matt
- pile of cleanups and micro-ops for execlist from Chris
drm-intel-next-2015-04-10:
- cdclk handling cleanup and fixes from Ville
- more prep patches for olr removal from John Harrison
- gmbus pin naming rework from Jani (prep for bxt)
- remove ->new_config from Ander (more atomic conversion work)
- rps (boost) tuning and unification with byt/bsw from Chris
- cmd parser batch bool tuning from Chris
- gen8 dynamic pte allocation (Michel Thierry, based on work from Ben Widawsky)
- execlist tuning (not yet all of it) from Chris
- add drm_plane_from_index (Chandra)
- various small things all over

* tag 'drm-intel-next-2015-04-23-fixed' of git://anongit.freedesktop.org/drm-intel: (204 commits)
  drm/i915/gtt: Allocate va range only if vma is not bound
  drm/i915: Enable cmd parser to do secure batch promotion for aliasing ppgtt
  drm/i915: fix intel_prepare_ddi
  drm/i915: factor out ddi_get_encoder_port
  drm/i915/hdmi: check port in ibx_infoframe_enabled
  drm/i915/hdmi: fix vlv infoframe port check
  drm/i915: Silence compiler warning in dvo
  drm/i915: Update DRIVER_DATE to 20150423
  drm/i915: Enable dithering on NatSemi DVO2501 for Fujitsu S6010
  rm/i915: Move i915_get_ggtt_vma_pages into ggtt_bind_vma
  drm/i915: Don't try to outsmart gcc in i915_gem_gtt.c
  drm/i915: Unduplicate i915_ggtt_unbind/bind_vma
  drm/i915: Move ppgtt_bind/unbind around
  drm/i915: move i915_gem_restore_gtt_mappings around
  drm/i915: Fix up the vma aliasing ppgtt binding
  drm/i915: Remove misleading comment around bind_to_vm
  drm/i915: Don't use atomics for pg_dirty_rings
  drm/i915: Don't look at pg_dirty_rings for aliasing ppgtt
  drm/i915/skl: Support Y tiling in MMIO flips
  drm/i915: Fixup kerneldoc for struct intel_context
  ...

Conflicts:
	drivers/gpu/drm/i915/i915_drv.c
2015-05-08 20:51:06 +10:00
Ingo Molnar
99e711101c Merge branch 'linus' into x86/cleanups, before applying dependent patch 2015-05-08 12:41:09 +02:00
Waiman Long
bf0c7c34ad locking/pvqspinlock, x86: Enable PV qspinlock for KVM
This patch adds the necessary KVM specific code to allow KVM to
support the CPU halting and kicking operations needed by the queue
spinlock PV code.

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Daniel J Blueman <daniel@numascale.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1429901803-29771-11-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08 12:37:17 +02:00
Peter Zijlstra (Intel)
f233f7f158 locking/pvqspinlock, x86: Implement the paravirt qspinlock call patching
We use the regular paravirt call patching to switch between:

  native_queued_spin_lock_slowpath()	__pv_queued_spin_lock_slowpath()
  native_queued_spin_unlock()		__pv_queued_spin_unlock()

We use a callee saved call for the unlock function which reduces the
i-cache footprint and allows 'inlining' of SPIN_UNLOCK functions
again.

We further optimize the unlock path by patching the direct call with a
"movb $0,%arg1" if we are indeed using the native unlock code. This
makes the unlock code almost as fast as the !PARAVIRT case.

This significantly lowers the overhead of having
CONFIG_PARAVIRT_SPINLOCKS enabled, even for native code.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Daniel J Blueman <daniel@numascale.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1429901803-29771-10-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08 12:37:09 +02:00
Kan Liang
6d37405635 perf/x86/intel: Fix SLM cache event list
iTLB-load-misses and LLC-load-misses count incorrectly on SLM.

There is no ITLB.MISSES support on SLM. Event PAGE_WALKS.I_SIDE_WALK
should be used to count iTLB-load-misses. This event counts when an
instruction (I) page walk is completed or started. Since a page walk
implies a TLB miss, the number of TLB misses can be counted by counting
the number of pagewalks.

DMND_DATA_RD counts both demand and DCU prefetch data reads. However,
LLC-load-misses should only count demand reads. There is no way to not
include prefetches with a single counter on SLM. So the LLC-load-misses
support should be removed on SLM.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1429608881-5055-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08 11:59:41 +02:00
Denys Vlasenko
03335e95e2 x86/asm/entry/64: Clean up usage of TEST insns
By the nature of TEST operation, it is often possible
to test a narrower part of the operand:

    "testl $3, mem"  -> "testb $3, mem"

This results in shorter insns, because TEST insn has no
sign-entending byte-immediate forms unlike other ALU ops.

   text	   data	    bss	    dec	    hex	filename
  11674	      0	      0	  11674	   2d9a	entry_64.o.before
  11658	      0	      0	  11658	   2d8a	entry_64.o

Changes in object code:

-	f7 84 24 88 00 00 00 03 00 00 00 	testl  $0x3,0x88(%rsp)
+	f6 84 24 88 00 00 00 03	         	testb  $0x3,0x88(%rsp)
-	f7 44 24 68 03 00 00 00          	testl  $0x3,0x68(%rsp)
+	f6 44 24 68 03                  	testb  $0x3,0x68(%rsp)
-	f7 84 24 90 00 00 00 03 00 00 00	testl  $0x3,0x90(%rsp)
+	f6 84 24 90 00 00 00 03         	testb  $0x3,0x90(%rsp)

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1430140912-7960-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08 11:07:32 +02:00
Denys Vlasenko
dde74f2e4a x86/asm/entry/64: Tidy up JZ insns after TESTs
After TESTs, use logically correct JZ/JNZ mnemonics instead of
JE/JNE. This doesn't change code.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1430140912-7960-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08 11:07:31 +02:00
Borislav Petkov
3490c0e45f x86/mce/amd: Zap changelog
It is useless and git history has it all detailed anyway. Update
copyright while at it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
2015-05-07 12:06:43 +02:00
Borislav Petkov
8cd161b1f7 x86/traps: Remove superfluous weak definitions and dead code
Those were leftovers of the x86 merge, see

  081f75bbdc ("traps: x86: make traps_32.c and traps_64.c equal")

for example and are not needed now.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-05-07 11:38:08 +02:00
Luiz Capitulino
ff7bbb9c6a kvmclock: set scheduler clock stable
If you try to enable NOHZ_FULL on a guest today, you'll get
the following error when the guest tries to deactivate the
scheduler tick:

 WARNING: CPU: 3 PID: 2182 at kernel/time/tick-sched.c:192 can_stop_full_tick+0xb9/0x290()
 NO_HZ FULL will not work with unstable sched clock
 CPU: 3 PID: 2182 Comm: kworker/3:1 Not tainted 4.0.0-10545-gb9bb6fb #204
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 Workqueue: events flush_to_ldisc
  ffffffff8162a0c7 ffff88011f583e88 ffffffff814e6ba0 0000000000000002
  ffff88011f583ed8 ffff88011f583ec8 ffffffff8104d095 ffff88011f583eb8
  0000000000000000 0000000000000003 0000000000000001 0000000000000001
 Call Trace:
  <IRQ>  [<ffffffff814e6ba0>] dump_stack+0x4f/0x7b
  [<ffffffff8104d095>] warn_slowpath_common+0x85/0xc0
  [<ffffffff8104d146>] warn_slowpath_fmt+0x46/0x50
  [<ffffffff810bd2a9>] can_stop_full_tick+0xb9/0x290
  [<ffffffff810bd9ed>] tick_nohz_irq_exit+0x8d/0xb0
  [<ffffffff810511c5>] irq_exit+0xc5/0x130
  [<ffffffff814f180a>] smp_apic_timer_interrupt+0x4a/0x60
  [<ffffffff814eff5e>] apic_timer_interrupt+0x6e/0x80
  <EOI>  [<ffffffff814ee5d1>] ? _raw_spin_unlock_irqrestore+0x31/0x60
  [<ffffffff8108bbc8>] __wake_up+0x48/0x60
  [<ffffffff8134836c>] n_tty_receive_buf_common+0x49c/0xba0
  [<ffffffff8134a6bf>] ? tty_ldisc_ref+0x1f/0x70
  [<ffffffff81348a84>] n_tty_receive_buf2+0x14/0x20
  [<ffffffff8134b390>] flush_to_ldisc+0xe0/0x120
  [<ffffffff81064d05>] process_one_work+0x1d5/0x540
  [<ffffffff81064c81>] ? process_one_work+0x151/0x540
  [<ffffffff81065191>] worker_thread+0x121/0x470
  [<ffffffff81065070>] ? process_one_work+0x540/0x540
  [<ffffffff8106b4df>] kthread+0xef/0x110
  [<ffffffff8106b3f0>] ? __kthread_parkme+0xa0/0xa0
  [<ffffffff814ef4f2>] ret_from_fork+0x42/0x70
  [<ffffffff8106b3f0>] ? __kthread_parkme+0xa0/0xa0
 ---[ end trace 06e3507544a38866 ]---

However, it turns out that kvmclock does provide a stable
sched_clock callback. So, let the scheduler know this which
in turn makes NOHZ_FULL work in the guest.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-07 11:28:20 +02:00
Aravind Gopalakrishnan
868c00bb59 x86/mce/amd: Rename setup_APIC_mce
'setup_APIC_mce' doesn't give us an indication of why we are
going to program LVT. Make that explicit by renaming it to
setup_APIC_mce_threshold so we know.

No functional change is introduced.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1430913538-1415-7-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-05-07 10:33:40 +02:00
Aravind Gopalakrishnan
24fd78a81f x86/mce/amd: Introduce deferred error interrupt handler
Deferred errors indicate error conditions that were not corrected, but
require no action from S/W (or action is optional).These errors provide
info about a latent UC MCE that can occur when a poisoned data is
consumed by the processor.

Processors that report these errors can be configured to generate APIC
interrupts to notify OS about the error.

Provide an interrupt handler in this patch so that OS can catch these
errors as and when they happen. Currently, we simply log the errors and
exit the handler as S/W action is not mandated.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1430913538-1415-5-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-05-07 10:23:32 +02:00
Linus Torvalds
0e1dc42748 xen: bug fixes for 4.1-rc2
- Fix blkback regression if using persistent grants.
 - Fix various event channel related suspend/resume bugs.
 - Fix AMD x86 regression with X86_BUG_SYSRET_SS_ATTRS.
 - SWIOTLB on ARM now uses frames <4 GiB (if available) so device only
   capable of 32-bit DMA work.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJVSiC1AAoJEFxbo/MsZsTRojgH/1zWPD0r5WMAEPb6DFdb7Ga1
 SqBbyHFu43axNwZ7EvUzSqI8BKDPbTnScQ3+zC6Zy1SIEfS+40+vn7kY/uASmWtK
 LYaYu8nd49OZP8ykH0HEvsJ2LXKnAwqAwvVbEigG7KJA7h8wXo7aDwdwxtZmHlFP
 18xRTfHcrnINtAJpjVRmIGZsCMXhXQz4bm0HwsXTTX0qUcRWtxydKDlMPTVFyWR8
 wQ2m5+76fQ8KlFsoJEB0M9ygFdheZBF4FxBGHRrWXBUOhHrQITnH+cf1aMVxTkvy
 NDwiEebwXUDHacv21onszoOkNjReLsx+DWp9eHknlT/fgPo6tweMM2yazFGm+JQ=
 =W683
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.1b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen bug fixes from David Vrabel:

 - fix blkback regression if using persistent grants

 - fix various event channel related suspend/resume bugs

 - fix AMD x86 regression with X86_BUG_SYSRET_SS_ATTRS

 - SWIOTLB on ARM now uses frames <4 GiB (if available) so device only
   capable of 32-bit DMA work.

* tag 'for-linus-4.1b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: Add __GFP_DMA flag when xen_swiotlb_init gets free pages on ARM
  hypervisor/x86/xen: Unset X86_BUG_SYSRET_SS_ATTRS on Xen PV guests
  xen/events: Set irq_info->evtchn before binding the channel to CPU in __startup_pirq()
  xen/console: Update console event channel on resume
  xen/xenbus: Update xenbus event channel on resume
  xen/events: Clear cpu_evtchn_mask before resuming
  xen-pciback: Add name prefix to global 'permissive' variable
  xen: Suspend ticks on all CPUs during suspend
  xen/grant: introduce func gnttab_unmap_refs_sync()
  xen/blkback: safely unmap purge persistent grants
2015-05-06 15:58:06 -07:00
Aravind Gopalakrishnan
7559e13fb4 x86/mce: Add support for deferred errors on AMD
Deferred errors indicate error conditions that were not corrected, but
those errors have not been consumed yet. They require no action from
S/W (or action is optional). These errors provide info about a latent
uncorrectable MCE that can occur when a poisoned data is consumed by the
processor.

Newer AMD processors can generate deferred errors and can be configured
to generate APIC interrupts on such events.

SUCCOR stands for S/W UnCorrectable error COntainment and Recovery.
It indicates support for data poisoning in HW and deferred error
interrupts.

Add new bitfield to mce_vendor_flags for this. We use this to verify
presence of deferred error interrupts before we enable them in mce_amd.c

While at it, clarify comments in mce_vendor_flags to provide an
indication of usages of the bitfields.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1430913538-1415-4-git-send-email-Aravind.Gopalakrishnan@amd.com
[ beef up commit message, do CPUID(8000_0007) only once. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-05-06 20:34:31 +02:00
Linus Torvalds
3d54ac9e35 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "EFI fixes, and FPU fix, a ticket spinlock boundary condition fix and
  two build fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu: Always restore_xinit_state() when use_eager_cpu()
  x86: Make cpu_tss available to external modules
  efi: Fix error handling in add_sysfs_runtime_map_entry()
  x86/spinlocks: Fix regression in spinlock contention detection
  x86/mm: Clean up types in xlate_dev_mem_ptr()
  x86/efi: Store upper bits of command line buffer address in ext_cmd_line_ptr
  efivarfs: Ensure VariableName is NUL-terminated
2015-05-06 10:57:37 -07:00
Aravind Gopalakrishnan
6e6e746e33 x86/mce/amd: Collect valid address before logging an error
amd_decode_mce() needs value in m->addr so it can report the error
address correctly. This should be setup in __log_error() before we call
mce_log(). We do this because the error address is an important bit of
information which should be conveyed to userspace.

The correct output then reports proper address, like this:

  [Hardware Error]: Corrected error, no action required.
  [Hardware Error]: CPU:0 (15:60:0) MC0_STATUS [-|CE|-|-|AddrV|-|-|CECC]: 0x840041000028017b
  [Hardware Error]: MC0 Error Address: 0x00001f808f0ff040

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1430913538-1415-3-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-05-06 19:49:31 +02:00
Aravind Gopalakrishnan
afdf344e08 x86/mce/amd: Factor out logging mechanism
Refactor the code here to setup struct mce and call mce_log() to log
the error. We're going to reuse this in a later patch as part of the
deferred error interrupt enablement.

No functional change is introduced.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1430913538-1415-2-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-05-06 19:49:20 +02:00
Linus Torvalds
d8fce2db72 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Mostly tooling fixes, but also an uncore PMU driver fix and an uncore
  PMU driver hardware-enablement addition"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf probe: Fix segfault if passed with ''.
  perf report: Fix -T/--threads option to work again
  perf bench numa: Fix immediate meeting of convergence condition
  perf bench numa: Fixes of --quiet argument
  perf bench futex: Fix hung wakeup tasks after requeueing
  perf probe: Fix bug with global variables handling
  perf top: Fix a segfault when kernel map is restricted.
  tools lib traceevent: Fix build failure on 32-bit arch
  perf kmem: Fix compiles on RHEL6/OL6
  tools lib api: Undefine _FORTIFY_SOURCE before setting it
  perf kmem: Consistently use PRIu64 for printing u64 values
  perf trace: Disable events and drain events when forked workload ends
  perf trace: Enable events when doing system wide tracing and starting a workload
  perf/x86/intel/uncore: Move PCI IDs for IMC to uncore driver
  perf/x86/intel/uncore: Add support for Intel Haswell ULT (lower power Mobile Processor) IMC uncore PMUs
  perf/x86/intel: Add cpu_(prepare|starting|dying) for core_pmu
2015-05-06 10:47:25 -07:00
Borislav Petkov
760d765b2b x86/microcode: Parse built-in microcode early
Apparently, people do build microcode into the kernel image, i.e.
CONFIG_FIRMWARE_IN_KERNEL=y.

Make that work in the early loader which is where microcode should be
preferably loaded anyway.

Note that you need to specify the microcode filename with the path
relative to the toplevel firmware directory (the same like the late
loading method) in CONFIG_EXTRA_FIRMWARE=y so that early loader can
find it.

I.e., something like this (Intel variant):

  CONFIG_FIRMWARE_IN_KERNEL=y
  CONFIG_EXTRA_FIRMWARE="intel-ucode/06-3a-09"
  CONFIG_EXTRA_FIRMWARE_DIR="/lib/firmware/"

While at it, add me to the loader copyright boilerplate.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Daniel J Blueman <daniel@numascale.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-06 11:24:53 +02:00
Borislav Petkov
da9b50765e x86/microcode/intel: Remove unused @rev arg of get_matching_sig()
@rev wasn't used in get_matching_sig(), drop it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-06 11:24:52 +02:00
Borislav Petkov
a1a32d29f9 x86/microcode/intel: Get rid of revision_is_newer()
It is a one-liner for checking microcode header revisions. On top of
that, it can be used wrong as it was the case in _save_mc(). Get rid of
it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-06 11:24:44 +02:00
Bobby Powers
c88d47480d x86/fpu: Always restore_xinit_state() when use_eager_cpu()
The following commit:

  f893959b08 ("x86/fpu: Don't abuse drop_init_fpu() in flush_thread()")

removed drop_init_fpu() usage from flush_thread(). This seems to break
things for me - the Go 1.4 test suite fails all over the place with
floating point comparision errors (offending commit found through
bisection).

The functional change was that flush_thread() after this commit
only calls restore_init_xstate() when both use_eager_fpu() and
!used_math() are true. drop_init_fpu() (now fpu_reset_state()) calls
restore_init_xstate() regardless of whether current used_math() - apply
the same logic here.

Switch used_math() -> tsk_used_math(tsk) to consistently use the grabbed
tsk instead of current, like in the rest of flush_thread().

Tested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Bobby Powers <bobbypowers@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Riikonen <priikone@iki.fi>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: f893959b ("x86/fpu: Don't abuse drop_init_fpu() in flush_thread()")
Link: http://lkml.kernel.org/r/1430147441-9820-1-git-send-email-bobbypowers@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-06 11:22:03 +02:00
Aravind Gopalakrishnan
b9d16a2a21 x86/cpu/amd: Set X86_FEATURE_EXTD_APICID for future processors
Decision to use a 4-bit mask or 8-bit mask in default_get_apic_id()
is controlled by setting capability bit X86_FEATURE_EXTD_APICID.

Currently, we detect extended APIC ID support by accessing Link
Transaction Control register D18F0x68 in PCI config space.

But, not even that is needed as we can safely postulate that future
AMD processors will support 8-bit APIC IDs and we can simply set that
feature bit on them, without the PCI access.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jacob Shin <jacob.w.shin@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@linux.intel.com
Cc: hecmargi@upv.es
Cc: mgorman@suse.de
Link: http://lkml.kernel.org/r/1430148351-9013-1-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-06 11:16:53 +02:00
Aravind Gopalakrishnan
1b4574292e x86/gart: Check for GART support before accessing GART registers
GART registers are not present in newer AMD processors (Fam15h, Model
10h and later). So, avoid accessing those in PCI config space by
returning early in early_gart_iommu_check() and gart_iommu_hole_init()
if GART is not available.

Current code doesn't break on existing processors but there are some
side effects:

We get bogus AGP aperture messages which are simply noise on
GART-less processors:

  AGP: Node 0: aperture [bus addr 0x00000000-0x01ffffff] (32MB)
  AGP: Your BIOS doesn't leave aperture memory hole
  AGP: Please enable the IOMMU option in the BIOS setup
  AGP: This costs you 64MB of RAM
  AGP: Mapping aperture over RAM [mem 0xd4000000-0xd7ffffff]

We can avoid calling allocate_aperture() and would not have to
wastefully reserve 64MB of RAM with memblock_reserve(). Also, we can
avoid having to loop through all PCI buses and devices twice, searching
for a non-existent AGP bridge if we bail out early.

Refactor the family check used in amd_nb.c into an inline function so we
can use it here as well as in amd_nb.c

Fix some typos while at it.

Tested the patch on Fam10h and Fam15h Model 00h-fh and this code runs
fine. On Fam15h Model 60h-6fh and on Fam16h, we bail early as they don't
have GART.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Joerg Rodel <joro@8bytes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1428443197-3834-1-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-06 11:15:53 +02:00
Jan H. Schönherr
f5d6a52f51 x86/smpboot: Skip delays during SMP initialization similar to Xen
Remove the per-CPU delays during SMP initialization, which seems
to be possible on newer architectures with an x2APIC.

Xen does this since 2011. In fact, this commit is basically a
combination of the following Xen commits. The first removes the
delays, the second fixes an issue with the removal:

  commit 68fce206f6dba9981e8322269db49692c95ce250
  Author: Tim Deegan <Tim.Deegan@citrix.com>
  Date:   Tue Jul 19 14:13:01 2011 +0100

    x86: Remove timeouts from INIT-SIPI-SIPI sequence when using x2apic.

    Some of the timeouts are pointless since they're waiting for the ICR
    to ack the IPI delivery and that doesn't happen on x2apic.
    The others should be benign (and are suggested in the SDM) but
    removing them makes AP bringup much more reliable on some test boxes.

    Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>

  commit f12ee533150761df5a7099c83f2a5fa6c07d1187
  Author: Gang Wei <gang.wei@intel.com>
  Date:   Thu Dec 29 10:07:54 2011 +0000

    X86: Add a delay between INIT & SIPIs for tboot AP bring-up in X2APIC case

    Without this delay, Xen could not bring APs up while working with
    TXT/tboot, because tboot needs some time in APs to handle INIT before
    becoming ready for receiving SIPIs (this delay was removed as part of
    c/s 23724 by Tim Deegan).

    Signed-off-by: Gang Wei <gang.wei@intel.com>
    Acked-by: Keir Fraser <keir@xen.org>
    Acked-by: Tim Deegan <tim@xen.org>
    Committed-by: Tim Deegan <tim@xen.org>

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Cc: Anthony Liguori <aliguori@amazon.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Gang Wei <gang.wei@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Deegan <tim@xen.org>
Link: http://lkml.kernel.org/r/1430732554-7294-1-git-send-email-jschoenh@amazon.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-06 10:24:51 +02:00
Denys Vlasenko
f1dc154f82 x86: Deinline dma_free_attrs()
Reduces kernel size by 76720 bytes on allyesconfig build:

    text     data      bss       dec     hex filename
82594029 22255352 20627456 125476837 77a9fe5 vmlinux1
82517277 22255384 20627456 125400117 7797435 vmlinux2

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Don Dutile <ddutile@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/1428926075-28796-3-git-send-email-dvlasenk@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-05 20:48:02 +02:00
Denys Vlasenko
0c7965ff22 x86: Deinline dma_alloc_attrs()
Reduces kernel size by 68739 bytes on allyesconfig build:

    text     data      bss       dec     hex filename
82662736 22255384 20627456 125545576 77bac68 vmlinux0
82594029 22255352 20627456 125476837 77a9fe5 vmlinux1

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Don Dutile <ddutile@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/1428926075-28796-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-05 20:48:02 +02:00
Brian Gerst
c07e5a542e x86: Remove unused TI_cpu
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/1428844486-6638-2-git-send-email-brgerst@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-05 20:48:02 +02:00
Brian Gerst
fd91784beb x86: Merge common 32-bit values in asm-offsets.c
Merge common values for 32-bit native and compat.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/1428844486-6638-1-git-send-email-brgerst@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-05 20:48:02 +02:00
Marc Dionne
de71ad2c97 x86: Make cpu_tss available to external modules
Commit 75182b1632 ("x86/asm/entry: Switch all C consumers of
kernel_stack to this_cpu_sp0()") changed current_thread_info
to use this_cpu_sp0, and indirectly made it rely on init_tss
which was exported with EXPORT_PER_CPU_SYMBOL_GPL.
As a result some macros and inline functions such as set/get_fs,
test_thread_flag and variants have been made unusable for
external modules.

Make cpu_tss exported with EXPORT_PER_CPU_SYMBOL so that these
functions are accessible again, as they were previously.

Signed-off-by: Marc Dionne <marc.dionne@your-file-system.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/1430763404-21221-1-git-send-email-marc.dionne@your-file-system.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-05 20:40:31 +02:00
Boris Ostrovsky
a71dbdaa8c hypervisor/x86/xen: Unset X86_BUG_SYSRET_SS_ATTRS on Xen PV guests
Commit 61f01dd941 ("x86_64, asm: Work around AMD SYSRET SS descriptor
attribute issue") makes AMD processors set SS to __KERNEL_DS in
__switch_to() to deal with cases when SS is NULL.

This breaks Xen PV guests who do not want to load SS with__KERNEL_DS.

Since the problem that the commit is trying to address would have to be
fixed in the hypervisor (if it in fact exists under Xen) there is no
reason to set X86_BUG_SYSRET_SS_ATTRS flag for PV VPCUs here.

This can be easily achieved by adding x86_hyper_xen_hvm.set_cpu_features
op which will clear this flag. (And since this structure is no longer
HVM-specific we should do some renaming).

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-05-05 18:27:43 +01:00
Jan Kiszka
781674fc33 x86/x2apic: Acpi_gbl_FADT existence depends on CONFIG_ACPI
If ACPI is disabled, acpi_gbl_FADT is not available, and and the build
breaks.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Link: http://lkml.kernel.org/r/55479708.2000104@siemens.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-05 14:01:37 +02:00
Thomas Gleixner
eb18cf55c2 x86: Constify irqdomain ops
Nothing changes those ops. Make the initializers readable while at it.

Reported-by: Krzysztof Kozlowski <k.kozlowski.k@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-05 11:14:48 +02:00
Jiri Kosina
4545c89880 x86: introduce kaslr_offset()
Offset that has been chosen for kaslr during kernel decompression can be
easily computed as a difference between _text and __START_KERNEL. We are
already making use of this in dump_kernel_offset() notifier and in
arch_crash_save_vmcoreinfo().

Introduce kaslr_offset() that makes this computation instead of hard-coding
it, so that other kernel code (such as live patching) can make use of it.
Also convert existing users to make use of it.

This patch is equivalent transofrmation without any effects on the resulting
code:

	$ diff -u vmlinux.old.asm vmlinux.new.asm
	--- vmlinux.old.asm     2015-04-28 17:55:19.520983368 +0200
	+++ vmlinux.new.asm     2015-04-28 17:55:24.141206072 +0200
	@@ -1,5 +1,5 @@

	-vmlinux.old:     file format elf64-x86-64
	+vmlinux.new:     file format elf64-x86-64

	Disassembly of section .text:
	$

Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-04-29 16:51:33 +02:00
Paolo Bonzini
73459e2a1a x86: pvclock: Really remove the sched notifier for cross-cpu migrations
This reverts commits 0a4e6be9ca
and 80f7fdb1c7.

The task migration notifier was originally introduced in order to support
the pvclock vsyscall with non-synchronized TSC, but KVM only supports it
with synchronized TSC.  Hence, on KVM the race condition is only needed
due to a bad implementation on the host side, and even then it's so rare
that it's mostly theoretical.

As far as KVM is concerned it's possible to fix the host, avoiding the
additional complexity in the vDSO and the (re)introduction of the task
migration notifier.

Xen, on the other hand, hasn't yet implemented vsyscall support at
all, so we do not care about its plans for non-synchronized TSC.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-27 15:49:30 +02:00
Andy Lutomirski
61f01dd941 x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue
AMD CPUs don't reinitialize the SS descriptor on SYSRET, so SYSRET with
SS == 0 results in an invalid usermode state in which SS is apparently
equal to __USER_DS but causes #SS if used.

Work around the issue by setting SS to __KERNEL_DS __switch_to, thus
ensuring that SYSRET never happens with SS set to NULL.

This was exposed by a recent vDSO cleanup.

Fixes: e7d6eefaaa x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-26 17:57:38 -07:00
Linus Torvalds
836ee4874e Initial ACPI support for arm64:
This series introduces preliminary ACPI 5.1 support to the arm64 kernel
 using the "hardware reduced" profile. We don't support any peripherals
 yet, so it's fairly limited in scope:
 
 - Memory init (UEFI)
 - ACPI discovery (RSDP via UEFI)
 - CPU init (FADT)
 - GIC init (MADT)
 - SMP boot (MADT + PSCI)
 - ACPI Kconfig options (dependent on EXPERT)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABCgAGBQJVNOC2AAoJELescNyEwWM08dIH/1Pn5xa04wwNDn0MOpbuQMk2
 kHM7hx69fbXflTJpnZRVyFBjRxxr5qilA7rljAFLnFeF8Fcll/s5VNy7ElHKLISq
 CB0ywgUfOd/sFJH57rcc67pC1b/XuqTbE1u1NFwvp2R3j1kGAEJWNA6SyxIP4bbc
 NO5jScx0lQOJ3rrPAXBW8qlGkeUk7TPOQJtMrpftNXlFLFrR63rPaEmMZ9dWepBF
 aRE4GXPvyUhpyv5o9RvlN5l8bQttiRJ3f9QjyG7NYhX0PXH3DyvGUzYlk2IoZtID
 v3ssCQH3uRsAZHIBhaTyNqFnUIaDR825bvGqyG/tj2Dt3kQZiF+QrfnU5D9TuMw=
 =zLJn
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull initial ACPI support for arm64 from Will Deacon:
 "This series introduces preliminary ACPI 5.1 support to the arm64
  kernel using the "hardware reduced" profile.  We don't support any
  peripherals yet, so it's fairly limited in scope:

   - MEMORY init (UEFI)

   - ACPI discovery (RSDP via UEFI)

   - CPU init (FADT)

   - GIC init (MADT)

   - SMP boot (MADT + PSCI)

   - ACPI Kconfig options (dependent on EXPERT)

  ACPI for arm64 has been in development for a while now and hardware
  has been available that can boot with either FDT or ACPI tables.  This
  has been made possible by both changes to the ACPI spec to cater for
  ARM-based machines (known as "hardware-reduced" in ACPI parlance) but
  also a Linaro-driven effort to get this supported on top of the Linux
  kernel.  This pull request is the result of that work.

  These changes allow us to initialise the CPUs, interrupt controller,
  and timers via ACPI tables, with memory information and cmdline coming
  from EFI.  We don't support a hybrid ACPI/FDT scheme.  Of course,
  there is still plenty of work to do (a serial console would be nice!)
  but I expect that to happen on a per-driver basis after this core
  series has been merged.

  Anyway, the diff stat here is fairly horrible, but splitting this up
  and merging it via all the different subsystems would have been
  extremely painful.  Instead, we've got all the relevant Acks in place
  and I've not seen anything other than trivial (Kconfig) conflicts in
  -next (for completeness, I've included my resolution below).  Nearly
  half of the insertions fall under Documentation/.

  So, we'll see how this goes.  Right now, it all depends on EXPERT and
  I fully expect people to use FDT by default for the immediate future"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (31 commits)
  ARM64 / ACPI: make acpi_map_gic_cpu_interface() as void function
  ARM64 / ACPI: Ignore the return error value of acpi_map_gic_cpu_interface()
  ARM64 / ACPI: fix usage of acpi_map_gic_cpu_interface
  ARM64: kernel: acpi: honour acpi=force command line parameter
  ARM64: kernel: acpi: refactor ACPI tables init and checks
  ARM64: kernel: psci: let ACPI probe PSCI version
  ARM64: kernel: psci: factor out probe function
  ACPI: move arm64 GSI IRQ model to generic GSI IRQ layer
  ARM64 / ACPI: Don't unflatten device tree if acpi=force is passed
  ARM64 / ACPI: additions of ACPI documentation for arm64
  Documentation: ACPI for ARM64
  ARM64 / ACPI: Enable ARM64 in Kconfig
  XEN / ACPI: Make XEN ACPI depend on X86
  ARM64 / ACPI: Select ACPI_REDUCED_HARDWARE_ONLY if ACPI is enabled on ARM64
  clocksource / arch_timer: Parse GTDT to initialize arch timer
  irqchip: Add GICv2 specific ACPI boot support
  ARM64 / ACPI: Introduce ACPI_IRQ_MODEL_GIC and register device's gsi
  ACPI / processor: Make it possible to get CPU hardware ID via GICC
  ACPI / processor: Introduce phys_cpuid_t for CPU hardware ID
  ARM64 / ACPI: Parse MADT for SMP initialization
  ...
2015-04-24 08:23:45 -07:00
Jiang Liu
f7fa7aeeec x86/irq: Avoid memory allocation in __assign_irq_vector()
Function __assign_irq_vector() is protected by vector_lock, so use
a global temporary cpu_mask to avoid allocating/freeing cpu_mask.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Link: http://lkml.kernel.org/r/1428978610-28986-34-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:55 +02:00
Jiang Liu
d746d1ebd3 x86/irq: Move irqdomain specific code into asm/irqdomain.h
Now we have dedicated asm/irqdomain.h, so move irqdomain specific
code into it.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/1428978610-28986-33-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:55 +02:00
Thomas Gleixner
f7a0c78669 x86: Cleanup irq_domain ops
We have 3 identical copies of the ioapic domain ops for acpi, mpparse,
and sfi. Have a global one in the io_apic code and be done with it.

To avoid include hell in io_apic.h, create a private irqdomain header
and include the generic irqdomain header from there.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: sfi-devel@simplefirmware.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Rob Herring <robh@kernel.org>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/1428978610-28986-32-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:55 +02:00
Thomas Gleixner
ab76085ec0 x86,ioapic: Cleanup irq_trigger/polarity()
These functions are full of pointless indentations, useless comments
and even more useless printks.

Clean them up.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Grant Likely <grant.likely@linaro.org>
Link: http://lkml.kernel.org/r/1428978610-28986-31-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: x86@kernel.org
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
2015-04-24 15:36:55 +02:00
Thomas Gleixner
335efdf57d x86, ioapic: Use proper defines for the entry fields
While looking at the printout issue, I stumbled more than once over
the various 0/1 assignments which are either commented in strange ways
or force to lookup the meaning.

Use proper constants and fix the misleading comments. While at it
remove pointless 0 assignments in native_disable_io_apic() which have
no value for understanding the code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/1428978610-28986-30-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:55 +02:00
Jiang Liu
46176f39b1 x86/irq, ACPI: Remove private function mp_register_gsi()/ mp_unregister_gsi()
Function mp_register_gsi() is only called once, so fold it into caller
acpi_register_gsi_ioapic(). Do the same for mp_unregister_gsi().

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Link: http://lkml.kernel.org/r/1428978610-28986-29-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:55 +02:00
Jiang Liu
7f3262edcd x86/irq: Move private data in struct irq_cfg into dedicated data structure
Several fields in struct irq_cfg are private to vector.c, so move it
into dedicated data structure. This helps to hide implementation
details.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Link: http://lkml.kernel.org/r/1428978610-28986-27-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/1416901802-24211-35-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Joerg Roedel <jroedel@suse.de>
2015-04-24 15:36:55 +02:00
Jiang Liu
c6c2002b74 x86/irq: Move check of cfg->move_in_progress into send_cleanup_vector()
Move check of cfg->move_in_progress into send_cleanup_vector() to
prepare for simplifying struct irq_cfg.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: iommu@lists.linux-foundation.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Joerg Roedel <joro@8bytes.org>
Link: http://lkml.kernel.org/r/1428978610-28986-26-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:54 +02:00
Jiang Liu
68f9f4404d x86/irq: Remove function apic_set_affinity()
Now there's no user of apic_set_affinity(), so remove it.  Also rename
vector_set_affinity() to apic_set_affinity() for consistency.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Link: http://lkml.kernel.org/r/1428978610-28986-25-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:54 +02:00
Jiang Liu
f970510cc5 x86/irq: Make functions only used in vector.c static
Function {assign|clear}_irq_vector() and apic_retrigger_irq() are only
used in vector.c, so make them static.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Link: http://lkml.kernel.org/r/1428978610-28986-24-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:54 +02:00
Jiang Liu
a2cbbb47fd x86/irq: Remove unused alloc_irq_and_cfg_at()
There's no caller of alloc_irq_and_cfg_at() anymore, so remove it.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Link: http://lkml.kernel.org/r/1428978610-28986-23-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:54 +02:00
Thomas Gleixner
1f93464129 x86/irq: Remove sis apic bug workaround
The SiS apic bug workaround is now obsolete as we cache the register
values for performance reasons.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Grant Likely <grant.likely@linaro.org>
Link: http://lkml.kernel.org/r/1428978610-28986-22-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:54 +02:00
Jiang Liu
0be275e3a5 x86/irq: Use cached IOAPIC entry instead of reading from hardware
Use cached IOAPIC entry instead of reading data from IOAPIC hardware
registers to improve performance.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Grant Likely <grant.likely@linaro.org>
Link: http://lkml.kernel.org/r/1428978610-28986-21-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:54 +02:00
Jiang Liu
154d9e50e4 x86/irq: Clean up io_apic.h
Clean up io_apic.h by:
1) moving definition of struct mp_ioapic_gsi into io_apic.c
2) changing mp_pin_to_gsi() and mp_ioapic_gsi_routing() as static
3) removing unused MP_MAX_IOAPIC_PIN
4) removing useless forward declaration
5) removing useless comments

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Grant Likely <grant.likely@linaro.org>
Link: http://lkml.kernel.org/r/1428978610-28986-20-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:54 +02:00
Thomas Gleixner
ca1b88622e x86: Remove more unmodified io_apic_ops
io_apic_ops.init() is either NULL, if IO-APIC support is disabled at
compile time or native_io_apic_init_mappings(). No point to have that
as we can achieve the same thing with an empty inline.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:54 +02:00
Jiang Liu
9a93d4736e x86/irq: Remove x86_io_apic_ops.write and x86_io_apic_ops.modify
x86_io_apic_ops.write is always set to native_io_apic_write(),
and nobody overrides it. So get rid of the indirection by changing
native_io_apic_write() as io_apic_write() and removing
x86_io_apic_ops.write.

Do the same for x86_io_apic_ops.modify and native_io_apic_modify().

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Yijing Wang <wangyijing@huawei.com>
Cc: Grant Likely <grant.likely@linaro.org>
Link: http://lkml.kernel.org/r/1428978610-28986-19-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:53 +02:00
Jiang Liu
50a6ad84b2 x86/irq: Remove struct io_apic_irq_attr
Now there's no user of struct io_apic_irq_attr anymore, so remove it.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Grant Likely <grant.likely@linaro.org>
Link: http://lkml.kernel.org/r/1428978610-28986-18-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:53 +02:00
Jiang Liu
4467715a44 x86/irq: Move irq_cfg.irq_2_pin into io_apic.c
Now only io_apic.c accesses struct irq_cfg.irq_2_pin, so move irq_2_pin
into struct mp_chip_data in io_apic.c to clean up struct irq_cfg further.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Grant Likely <grant.likely@linaro.org>
Link: http://lkml.kernel.org/r/1428978610-28986-17-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:53 +02:00
Jiang Liu
9880534989 irq_remapping: Clean up unsued code to support IOAPIC
Now we have converted to hierarchical irqdomains, so clean up unused code.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: iommu@lists.linux-foundation.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Joerg Roedel <joro@8bytes.org>
Link: http://lkml.kernel.org/r/1428978610-28986-10-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:52 +02:00
Jiang Liu
baac169526 x86/irq: Remove GENERIC_IRQ_LEGACY_ALLOC_HWIRQ
There's no user of irq_alloc_hwirqs(), irq_alloc_hwirq(),
irq_free_hwirqs() and irq_free_hwirq() in x86 anymore, so remove
GENERIC_IRQ_LEGACY_ALLOC_HWIRQ and related code.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Link: http://lkml.kernel.org/r/1428978610-28986-8-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:52 +02:00
Jiang Liu
ad66e1efc9 x86/irq: Remove x86_io_apic_ops.eoi_ioapic_pin and related interfaces
Now there is no user of x86_io_apic_ops.eoi_ioapic_pin anymore, so remove
it.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: iommu@lists.linux-foundation.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Yijing Wang <wangyijing@huawei.com>
Cc: Grant Likely <grant.likely@linaro.org>
Link: http://lkml.kernel.org/r/1428978610-28986-7-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:52 +02:00
Jiang Liu
aa5cb97f14 x86/irq: Remove x86_io_apic_ops.set_affinity and related interfaces
Now there is no user of x86_io_apic_ops.set_affinity anymore, so remove
it.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: iommu@lists.linux-foundation.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Yijing Wang <wangyijing@huawei.com>
Cc: Grant Likely <grant.likely@linaro.org>
Link: http://lkml.kernel.org/r/1428978610-28986-6-git-send-email-jiang.liu@linux.intel.com
2015-04-24 15:36:52 +02:00
Jiang Liu
35d50d8fd5 x86/irq: Remove x86_io_apic_ops.setup_entry and related interfaces
Now there is no user of x86_io_apic_ops.setup_entry anymore, so remove it.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: iommu@lists.linux-foundation.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Yijing Wang <wangyijing@huawei.com>
Cc: Grant Likely <grant.likely@linaro.org>
Link: http://lkml.kernel.org/r/1428978610-28986-5-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:52 +02:00
Jiang Liu
84bea5cc77 x86/irq: Remove x86_io_apic_ops.print_entries and related interfaces
Now there is no user of x86_io_apic_ops.print_entries anymore, so remove
it.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: iommu@lists.linux-foundation.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Yijing Wang <wangyijing@huawei.com>
Cc: Grant Likely <grant.likely@linaro.org>
Link: http://lkml.kernel.org/r/1428978610-28986-4-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:52 +02:00
Jiang Liu
b75e818f7f x86/irq: Remove unused struct mp_pin_info
Now nobody makes use of struct mp_pin_info, so remove it.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Grant Likely <grant.likely@linaro.org>
Link: http://lkml.kernel.org/r/1428978610-28986-3-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:52 +02:00
Jiang Liu
5ad274d41c x86/irq: Remove unused old IOAPIC irqdomain interfaces
Now we have converted to hierarchical irqdomain, so remove unused old
IOAPIC interfaces and code.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Grant Likely <grant.likely@linaro.org>
Link: http://lkml.kernel.org/r/1428978610-28986-2-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:51 +02:00
Jiang Liu
d32932d02e x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces
Convert IOAPIC driver to support and use hierarchical irqdomain
interfaces.  It's a little big, but would break bisecting if we split
it into multiple patches.

Fold in a patch from Andy Shevchenko <andriy.shevchenko@linux.intel.com>
to make it bisectable.
http://lkml.org/lkml/2014/12/10/622

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: sfi-devel@simplefirmware.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Rob Herring <robh@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Link: http://lkml.kernel.org/r/1428905519-23704-38-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:51 +02:00
Jiang Liu
96ed44b2d5 x86/irq: Introduce helper functions to support hierarchical irqdomains for IOAPIC
Introduce several helper functions, which will be used to enable
hierarchical irqdomain for IOAPIC.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Grant Likely <grant.likely@linaro.org>
Link: http://lkml.kernel.org/r/1428905519-23704-37-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:51 +02:00
Jiang Liu
a44174ee7b x86/irq: Simplify the way to print IOAPIC entry
Simplify the way to print IOAPIC entry content, so we can remove
native_io_apic_print_entries(), intel_ir_io_apic_print_entries()
and x86_io_apic_ops.print_entries() later.

Folded a patch from Thomas to fix errors in printed pin attributes,
http://www.spinics.net/lists/linux-tip-commits/msg26108.html

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Grant Likely <grant.likely@linaro.org>
Link: http://lkml.kernel.org/r/1428905519-23704-36-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:51 +02:00
Jiang Liu
133153205b x86/irq: Refine the way to allocate irq_cfg for legacy IRQs
To support legacy ISA IRQs, we need to preallocate irq_cfg structures
for legacy ISA IRQs. Refine the way to allocate irq_cfg for legacy ISA
IRQs, so it's more friendly for the hierarchical irqdomain
implementation.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Grant Likely <grant.likely@linaro.org>
Link: http://lkml.kernel.org/r/1428905519-23704-35-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:51 +02:00
Jiang Liu
49c7e60022 x86/irq: Implement callbacks to enable hierarchical irqdomains on IOAPICs
Implement required callbacks to prepare for enabling hierarchical
irqdomains on IOAPICs. After the conversion we can remove quite some
code from the old implementation.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Grant Likely <grant.likely@linaro.org>
Link: http://lkml.kernel.org/r/1428905519-23704-34-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:51 +02:00
Jiang Liu
c4d05a2c35 x86/irq: Prepare IOAPIC interfaces to support hierarchical irqdomains
Introduce helper functions to manipulate struct irq_alloc_info for
IOAPIC.  Also add an extra parameter to IOAPIC interfaces to prepare
for hierarchical irqdomain. Function mp_set_gsi_attr() will be removed
once we have switched to hierarchical irqdomains.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Link: http://lkml.kernel.org/r/1428905519-23704-33-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:51 +02:00
Jiang Liu
4e69d7eab4 x86/irq: Remove unused pre_init_apic_IRQ0()
Now there's no user of pre_init_apic_IRQ0(), so remove it.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Grant Likely <grant.likely@linaro.org>
Link: http://lkml.kernel.org/r/1428905519-23704-32-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:51 +02:00
Thomas Gleixner
6648d1b42c x86/intel-mid: Delay initialization of APB timer
MID has no PIC, but depending on the platform it requires the
abt_timer, which is connected to irq0. The timer is set up at
late_time_init().

But, looking at the MID code it seems, that there is no reason to do
so. The only code which might need the timer working is the TSC
calibration code, but thats a non issue on MID as that is using its
own empty calibration function. And check_timer() is not invoked
either because MID has no PIC and therefor no legacy irqs.

So if you look at intel_mid_time_init() then you'll see that in the
ARAT case the timer setup is skipped already. So until the point where
x86_init.timers.setup_percpu_clockev() is called for the boot cpu
nothing really needs a timer on MID.

According to the MID code the apbt horror is only used for moorestown.
Medfield and later use the local apic timer without the apbt nonsense.

The best thing we can do is to drop moorestown support and get rid of
that apbt nonsense alltogether.

I don't think anyone deeply cares about it not being supported from
3.18 on. The number of devices which sport a moorestown should be
pretty limited and the only relevant use case of those is to act as a
pocket heater with short battery life time. Its pretty pointless to
update kernels on pocket heaters except for bragging reasons.

If someone at Intel really thinks that we need to keep moorestown
alive for other than documentary and sentimental reasons, then we can
move the apbt setup to x86_init.timers.setup_percpu_clockev(). At that
point the IOAPIC is setup already, so it should just work.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Link: http://lkml.kernel.org/r/1428905519-23704-30-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:51 +02:00
Jiang Liu
e390d895ae x86/irq: Simplify MSI/DMAR/HPET implementation by using common code
Use common MSI interfaces instead of private implementations of the
same functionality to simplify DMAR/HPET driver implementation.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Link: http://lkml.kernel.org/r/1428905519-23704-28-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:50 +02:00
Jiang Liu
62ac178083 x86/irq: Implement irq_chip.irq_write_msi_msg for MSI/DMAR/HPET irq_chips
Implement irq_chip.irq_write_msi_msg for MSI/DMAR/HPET irq_chips, they
will be used to replace duplicated code.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Link: http://lkml.kernel.org/r/1428905519-23704-27-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:50 +02:00
Jiang Liu
90d84fe95d x86/MSI: Replace msi_update_msg() with irq_chip_compose_msi_msg()
Function irq_chip_compose_msi_msg() can achieve the same goal as
msi_update_msg(), so remove msi_update_msg().

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Link: http://lkml.kernel.org/r/1428905519-23704-26-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:50 +02:00
Jiang Liu
68682a2687 x86/MSI: Simplify the way to deal with remapped MSI interrupts
Simplify the way to deal with remapped MSI interrupts, so we can remove
irq_chip.irq_print_chip later. We simply change the name when the
setup detects that the parent domain is remapping.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Link: http://lkml.kernel.org/r/1428905519-23704-25-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:50 +02:00
Jiang Liu
81dabe2e73 x86/irq: Normalize x86 irq_chip name
Some irq_chip names use underscore, others use hyphen. So normalize them
to use hyphen as separator.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Link: http://lkml.kernel.org/r/1428905519-23704-24-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:50 +02:00
Jiang Liu
49e07d8f28 x86/htirq: Use hierarchical irqdomain to manage Hypertransport interrupts
We have slightly changed the architecture interfaces to support htirq
PCI driver. It's safe because currently Hypertransport interrupt is
only enabled on x86 platforms.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Link: http://lkml.kernel.org/r/1428905519-23704-22-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:50 +02:00
Jiang Liu
0921f1da64 x86/irq: Use hierarchical irqdomain to manage DMAR interrupts
Enhance DMAR code to support hierarchical irqdomain, it helps to make
the architecture more clear.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Link: http://lkml.kernel.org/r/1428905519-23704-21-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:49 +02:00
Jiang Liu
34742db8ea iommu/vt-d: Refine the interfaces to create IRQ for DMAR unit
Refine the interfaces to create IRQ for DMAR unit. It's a preparation
for converting DMAR IRQ to hierarchical irqdomain on x86.

It also moves dmar_alloc_hwirq()/dmar_free_hwirq() from irq_remapping.h
to dmar.h. They are not irq_remapping specific.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: iommu@lists.linux-foundation.org
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Link: http://lkml.kernel.org/r/1428905519-23704-20-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:49 +02:00
Jiang Liu
b1855c752e x86/MSI: Clean up unused MSI related code and interfaces
Now MSI interrupt has been converted to new hierarchical irqdomain
interfaces, so remove legacy MSI related code and interfaces.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Yijing Wang <wangyijing@huawei.com>
Link: http://lkml.kernel.org/r/1428905519-23704-19-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:49 +02:00
Jiang Liu
7a53a12162 irq_remapping: Clean up unused MSI related code
Now MSI interrupt has been converted to new hierarchical irqdomain
interfaces, so remove legacy MSI related code and interfaces.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: iommu@lists.linux-foundation.org
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Yijing Wang <wangyijing@huawei.com>
Link: http://lkml.kernel.org/r/1428905519-23704-18-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:49 +02:00
Jiang Liu
80aa283364 x86/irq: Directly call native_compose_msi_msg() for DMAR IRQ
DMAR interrupt won't be remapped by interrupt remapping hardware,
so directly call native_compose_msi_msg() for DMAR IRQ to compose MSI
message data. This will help to simplify MSI code later.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Link: http://lkml.kernel.org/r/1428905519-23704-15-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:49 +02:00
Jiang Liu
52f518a3a7 x86/MSI: Use hierarchical irqdomains to manage MSI interrupts
Enhance MSI code to support hierarchical irqdomains, it helps to make
the architecture more clear.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: iommu@lists.linux-foundation.org
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Joerg Roedel <joro@8bytes.org>
Link: http://lkml.kernel.org/r/1428905519-23704-14-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:49 +02:00
Jiang Liu
3cb96f0c97 x86/hpet: Enhance HPET IRQ to support hierarchical irqdomains
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/1428905519-23704-13-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:49 +02:00
Jiang Liu
a62b32cdd0 x86/dmar: Use new irqdomain interfaces to allocate/free IRQ
Use new irqdomain interfaces to allocate/free IRQ for DMAR and interrupt
remapping, so we can remove GENERIC_IRQ_LEGACY_ALLOC_HWIRQ later.

The private definitions of irq_alloc_hwirqs()/irq_free_hwirqs() are a
temporary solution, they will be removed once we have converted the
interrupt remapping driver to use irqdomain framework.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: iommu@lists.linux-foundation.org
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Joerg Roedel <joro@8bytes.org>
Link: http://lkml.kernel.org/r/1428905519-23704-8-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:48 +02:00
Jiang Liu
af87baedf2 x86/htirq: Use new irqdomain interfaces to allocate/free IRQ
Use new irqdomain interfaces to allocate/free IRQ for HTIRQ, so we can
remove GENERIC_IRQ_LEGACY_ALLOC_HWIRQ later.

This patch changes the interfaces between arch independent PCI driver
and arch specific code. Currently HT_IRQ is only enabled on x86, so it
does not affect other architectures.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Link: http://lkml.kernel.org/r/1428905519-23704-7-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:48 +02:00
Jiang Liu
4c8f9960ee x86/MSI: Use new irqdomain interfaces to allocate/free IRQ
Use new irqdomain interfaces to allocate/free IRQ for PCI MSI, so we
can remove GENERIC_IRQ_LEGACY_ALLOC_HWIRQ later.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Link: http://lkml.kernel.org/r/1416894816-23245-5-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:48 +02:00
Jiang Liu
bd8eb63f8a x86/hpet: Use new irqdomain interfaces to allocate/free IRQ
Use new irqdomain interfaces to allocate/free IRQ for HPET, so we can
remove GENERIC_IRQ_LEGACY_ALLOC_HWIRQ later.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/1416894816-23245-4-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:48 +02:00
Jiang Liu
b5dc8e6c21 x86/irq: Use hierarchical irqdomain to manage CPU interrupt vectors
Abstract CPU local APIC as an interrupt controller and create an
irqdomain for it to manage CPU interrupt vectors. It's the base to
enable hierarchical irqdomains on x86 systems. 

The final irqdomain hierarchy will look like this:

IOAPIC domain    ----|
MSI/MSI-x domain ----> [Interrupt Remapping domain] -> CPU vector domain
HPET_IRQ domain  ----|                                         ^
                                                               |
DMAR domain      ----------------------------------------------|
HT_IRQ domain    ----------------------------------------------|

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Grant Likely <grant.likely@linaro.org>
Link: http://lkml.kernel.org/r/1428905519-23704-3-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:47 +02:00
Jiang Liu
5f0052f952 x86/irq: Save destination CPU ID in irq_cfg
Cache destination CPU APIC ID into struct irq_cfg when assigning vector
for interrupt. Upper layer just needs to read the cached APIC ID instead
of calling apic->cpu_mask_to_apicid_and(), it helps to hide APIC driver
details from IOAPIC/HPET/MSI drivers..

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Link: http://lkml.kernel.org/r/1428905519-23704-2-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 15:36:47 +02:00
Thomas Gleixner
576b0704c9 x86: perf: uncore: Use hrtimer_start()
hrtimer_start() does not longer defer already expired timers to the
softirq. Get rid of the __hrtimer_start_range_ns() invocation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20150414203502.360555157@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:50 +02:00
Thomas Gleixner
514c2304b4 x86: perf: Use hrtimer_start()
hrtimer_start() does not longer defer already expired timers to the
softirq. Get rid of the __hrtimer_start_range_ns() invocation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20150414203502.260487331@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:50 +02:00
Denys Vlasenko
ac7f5dfb03 x86/asm/entry/64: Merge 32-bit execve stubs with x32 ones, as they are identical
Run-tested.

Suggested-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1429632194-13445-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-22 08:41:44 +02:00
Denys Vlasenko
17be0aec74 x86/asm/entry/64: Implement better check for canonical addresses
This change makes the check exact (no more false positives
on "negative" addresses).

Andy explains:

 "Canonical addresses either start with 17 zeros or 17 ones.

  In the old code, we checked that the top (64-47) = 17 bits were all
  zero.  We did this by shifting right by 47 bits and making sure that
  nothing was left.

  In the new code, we're shifting left by (64 - 48) = 16 bits and then
  signed shifting right by the same amount, this propagating the 17th
  highest bit to all positions to its left.  If we get the same value we
  started with, then we're good to go."

While it isn't really important to be fully correct here -
almost all addresses we'll ever see will be userspace ones,
but OTOH it looks to be cheap enough: the new code uses
two more ALU ops but preserves %rcx, allowing to not
reload it from pt_regs->cx again.

On disassembly level, the changes are:

  cmp %rcx,0x80(%rsp) -> mov 0x80(%rsp),%r11; cmp %rcx,%r11
  shr $0x2f,%rcx      -> shl $0x10,%rcx; sar $0x10,%rcx; cmp %rcx,%r11
  mov 0x58(%rsp),%rcx -> (eliminated)

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1429633649-20169-1-git-send-email-dvlasenk@redhat.com
[ Changelog massage. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-22 08:37:56 +02:00
Sonny Rao
0140e6141e perf/x86/intel/uncore: Move PCI IDs for IMC to uncore driver
This keeps all the related PCI IDs together in the driver where
they are used.

Signed-off-by: Sonny Rao <sonnyrao@chromium.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1429644791-25724-1-git-send-email-sonnyrao@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-22 08:29:19 +02:00
Sonny Rao
80bcffb376 perf/x86/intel/uncore: Add support for Intel Haswell ULT (lower power Mobile Processor) IMC uncore PMUs
This uncore is the same as the Haswell desktop part but uses a
different PCI ID.

Signed-off-by: Sonny Rao <sonnyrao@chromium.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1429569247-16697-1-git-send-email-sonnyrao@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-22 08:27:43 +02:00
Jiri Olsa
3b6e042188 perf/x86/intel: Add cpu_(prepare|starting|dying) for core_pmu
The core_pmu does not define cpu_* callbacks, which handles
allocation of 'struct cpu_hw_events::shared_regs' data,
initialization of debug store and PMU_FL_EXCL_CNTRS counters.

While this probably won't happen on bare metal, virtual CPU can
define x86_pmu.extra_regs together with PMU version 1 and thus
be using core_pmu -> using shared_regs data without it being
allocated. That could could leave to following panic:

	BUG: unable to handle kernel NULL pointer dereference at (null)
	IP: [<ffffffff8152cd4f>] _spin_lock_irqsave+0x1f/0x40

	SNIP

	 [<ffffffff81024bd9>] __intel_shared_reg_get_constraints+0x69/0x1e0
	 [<ffffffff81024deb>] intel_get_event_constraints+0x9b/0x180
	 [<ffffffff8101e815>] x86_schedule_events+0x75/0x1d0
	 [<ffffffff810586dc>] ? check_preempt_curr+0x7c/0x90
	 [<ffffffff810649fe>] ? try_to_wake_up+0x24e/0x3e0
	 [<ffffffff81064ba2>] ? default_wake_function+0x12/0x20
	 [<ffffffff8109eb16>] ? autoremove_wake_function+0x16/0x40
	 [<ffffffff810577e9>] ? __wake_up_common+0x59/0x90
	 [<ffffffff811a9517>] ? __d_lookup+0xa7/0x150
	 [<ffffffff8119db5f>] ? do_lookup+0x9f/0x230
	 [<ffffffff811a993a>] ? dput+0x9a/0x150
	 [<ffffffff8119c8f5>] ? path_to_nameidata+0x25/0x60
	 [<ffffffff8119e90a>] ? __link_path_walk+0x7da/0x1000
	 [<ffffffff8101d8f9>] ? x86_pmu_add+0xb9/0x170
	 [<ffffffff8101d7a7>] x86_pmu_commit_txn+0x67/0xc0
	 [<ffffffff811b07b0>] ? mntput_no_expire+0x30/0x110
	 [<ffffffff8119c731>] ? path_put+0x31/0x40
	 [<ffffffff8107c297>] ? current_fs_time+0x27/0x30
	 [<ffffffff8117d170>] ? mem_cgroup_get_reclaim_stat_from_page+0x20/0x70
	 [<ffffffff8111b7aa>] group_sched_in+0x13a/0x170
	 [<ffffffff81014a29>] ? sched_clock+0x9/0x10
	 [<ffffffff8111bac8>] ctx_sched_in+0x2e8/0x330
	 [<ffffffff8111bb7b>] perf_event_sched_in+0x6b/0xb0
	 [<ffffffff8111bc36>] perf_event_context_sched_in+0x76/0xc0
	 [<ffffffff8111eb3b>] perf_event_comm+0x1bb/0x2e0
	 [<ffffffff81195ee9>] set_task_comm+0x69/0x80
	 [<ffffffff81195fe1>] setup_new_exec+0xe1/0x2e0
	 [<ffffffff811ea68e>] load_elf_binary+0x3ce/0x1ab0

Adding cpu_(prepare|starting|dying) for core_pmu to have
shared_regs data allocated for core_pmu. AFAICS there's no harm
to initialize debug store and PMU_FL_EXCL_CNTRS either for
core_pmu.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20150421152623.GC13169@krava.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-22 08:24:33 +02:00
Andy Lutomirski
aac82d3191 x86, paravirt, xen: Remove the 64-bit ->irq_enable_sysexit() pvop
We don't use irq_enable_sysexit on 64-bit kernels any more.
Remove all the paravirt and Xen machinery to support it on
64-bit kernels.

Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/8a03355698fe5b94194e9e7360f19f91c1b2cf1f.1428100853.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-22 08:07:45 +02:00
Linus Torvalds
41d5e08ea8 TTY/Serial patches for 4.1-rc1
Here's the big tty/serial driver update for 4.1-rc1.
 
 It was delayed for a bit due to some questions surrounding some of the
 console command line parsing changes that are in here.  There's still
 one tiny regression for people who were previously putting multiple
 console command lines and expecting them all to be ignored for some odd
 reason, but Peter is working on fixing that.  If not, I'll send a revert
 for the offending patch, but I have faith that Peter can address it.
 
 Other than the console work here, there's the usual serial driver
 updates and changes, and a buch of 8250 reworks to try to make that
 driver easier to maintain over time, and have it support more devices in
 the future.
 
 All of these have been in linux-next for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlU2IcUACgkQMUfUDdst+ylFqACcC8LPhFEZg9aHn0hNUoqGK3rE
 5dUAnR4b8r/NYqjVoE9FJZgZfB/TqVi1
 =lyN/
 -----END PGP SIGNATURE-----

Merge tag 'tty-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial updates from Greg KH:
 "Here's the big tty/serial driver update for 4.1-rc1.

  It was delayed for a bit due to some questions surrounding some of the
  console command line parsing changes that are in here.  There's still
  one tiny regression for people who were previously putting multiple
  console command lines and expecting them all to be ignored for some
  odd reason, but Peter is working on fixing that.  If not, I'll send a
  revert for the offending patch, but I have faith that Peter can
  address it.

  Other than the console work here, there's the usual serial driver
  updates and changes, and a buch of 8250 reworks to try to make that
  driver easier to maintain over time, and have it support more devices
  in the future.

  All of these have been in linux-next for a while"

* tag 'tty-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (119 commits)
  n_gsm: Drop unneeded cast on netdev_priv
  sc16is7xx: expose RTS inversion in RS-485 mode
  serial: 8250_pci: port failed after wakeup from S3
  earlycon: 8250: Document kernel command line options
  earlycon: 8250: Fix command line regression
  earlycon: Fix __earlycon_table stride
  tty: clean up the tty time logic a bit
  serial: 8250_dw: only get the clock rate in one place
  serial: 8250_dw: remove useless ACPI ID check
  dmaengine: hsu: move memory allocation to GFP_NOWAIT
  dmaengine: hsu: remove redundant pieces of code
  serial: 8250_pci: add Intel Tangier support
  dmaengine: hsu: add Intel Tangier PCI ID
  serial: 8250_pci: replace switch-case by formula for Intel MID
  serial: 8250_pci: replace switch-case by formula
  tty: cpm_uart: replace CONFIG_8xx by CONFIG_CPM1
  serial: jsm: some off by one bugs
  serial: xuartps: Fix check in console_setup().
  serial: xuartps: Get rid of register access macros.
  serial: xuartps: Fix iobase use.
  ...
2015-04-21 09:33:10 -07:00
Linus Torvalds
6496edfce9 This is the final removal (after several years!) of the obsolete cpus_*
functions, prompted by their mis-use in staging.
 
 With these function removed, all cpu functions should only iterate to
 nr_cpu_ids, so we finally only allocate that many bits when cpumasks
 are allocated offstack.
 
 Thanks,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVNPMuAAoJENkgDmzRrbjx7ZIP/j65e6xs1jEyXR3WOYSdTU1x
 bMo6JcII6O1oEZLgyKXgx9KiBg6uIIDta1NG/H/XIe354dwfHVsHvj5HHHQR5Xof
 iRrjLOaHj4XglI3hvsk0eEEl3/OBBLgyo9bUwDvMF1fmr/9tW4caMs3Op6n7Evzm
 YIvoAyeJ0A8BfEtOU5lXhcVIGmnHtSw0x6mdGXpXIBmWYQPCtdQP868s4lnl44w0
 bSNpAYdzEqg64Ph3SK0prgWPrn5+5EiaAhV7HZzENZ5+o0DAdIXWq/W7uHyCWPKH
 536cJDojec+nSUQkPYngngGprxrKO02aBcMw/3JGJ0tdCDj8yw3XAyVAFzz4hmMb
 Lkmyv4QHHIILLvJ4ZRH5KHWCjjVBg41LNCs2H3HnoxFACdm0lZYKHsUAh2ucBVtU
 Wb/eHmLxOG43UIkpX4yrhy3SfE1ZdnOVzEzOzPXtr51t8ojqk+bLFe/hJ6EkzrQX
 X+90qHfBq+PMJlAnc3zdXHjxoJrL6KPWVwVvFrNeibgEKtVvy/BiwZkS6QceC1Ea
 TatOYA5r6awFVHHQCooN1DGAxN5Juvu2SmdnTUA9ymsCNDghj1YUoAKRNP81u8Sa
 pe3hco/63iCuPna+vlwNDU6SgsaMk9m0p+1n1BiDIfVJIkWYCNeG+u2gQkzbDKlQ
 AJuKKQv1QuZiF0ylZ0wq
 =VAgA
 -----END PGP SIGNATURE-----

Merge tag 'cpumask-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull final removal of deprecated cpus_* cpumask functions from Rusty Russell:
 "This is the final removal (after several years!) of the obsolete
  cpus_* functions, prompted by their mis-use in staging.

  With these function removed, all cpu functions should only iterate to
  nr_cpu_ids, so we finally only allocate that many bits when cpumasks
  are allocated offstack"

* tag 'cpumask-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (25 commits)
  cpumask: remove __first_cpu / __next_cpu
  cpumask: resurrect CPU_MASK_CPU0
  linux/cpumask.h: add typechecking to cpumask_test_cpu
  cpumask: only allocate nr_cpumask_bits.
  Fix weird uses of num_online_cpus().
  cpumask: remove deprecated functions.
  mips: fix obsolete cpumask_of_cpu usage.
  x86: fix more deprecated cpu function usage.
  ia64: remove deprecated cpus_ usage.
  powerpc: fix deprecated CPU_MASK_CPU0 usage.
  CPU_MASK_ALL/CPU_MASK_NONE: remove from deprecated region.
  staging/lustre/o2iblnd: Don't use cpus_weight
  staging/lustre/libcfs: replace deprecated cpus_ calls with cpumask_
  staging/lustre/ptlrpc: Do not use deprecated cpus_* functions
  blackfin: fix up obsolete cpu function usage.
  parisc: fix up obsolete cpu function usage.
  tile: fix up obsolete cpu function usage.
  arm64: fix up obsolete cpu function usage.
  mips: fix up obsolete cpu function usage.
  x86: fix up obsolete cpu function usage.
  ...
2015-04-20 10:19:03 -07:00
Linus Torvalds
34a984f7b0 Merge branch 'x86-pmem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull PMEM driver from Ingo Molnar:
 "This is the initial support for the pmem block device driver:
  persistent non-volatile memory space mapped into the system's physical
  memory space as large physical memory regions.

  The driver is based on Intel code, written by Ross Zwisler, with fixes
  by Boaz Harrosh, integrated with x86 e820 memory resource management
  and tidied up by Christoph Hellwig.

  Note that there were two other separate pmem driver submissions to
  lkml: but apparently all parties (Ross Zwisler, Boaz Harrosh) are
  reasonably happy with this initial version.

  This version enables minimal support that enables persistent memory
  devices out in the wild to work as block devices, identified through a
  magic (non-standard) e820 flag and auto-discovered if
  CONFIG_X86_PMEM_LEGACY=y, or added explicitly through manipulating the
  memory maps via the "memmap=..." boot option with the new, special '!'
  modifier character.

  Limitations: this is a regular block device, and since the pmem areas
  are not struct page backed, they are invisible to the rest of the
  system (other than the block IO device), so direct IO to/from pmem
  areas, direct mmap() or XIP is not possible yet.  The page cache will
  also shadow and double buffer pmem contents, etc.

  Initial support is for x86"

* 'x86-pmem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  drivers/block/pmem: Fix 32-bit build warning in pmem_alloc()
  drivers/block/pmem: Add a driver for persistent memory
  x86/mm: Add support for the non-standard protected e820 type
2015-04-18 11:42:49 -04:00
Linus Torvalds
90d1c08786 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "This tree includes:

   - an FPU related crash fix

   - a ptrace fix (with matching testcase in tools/testing/selftests/)

   - an x86 Kconfig DMA-config defaults tweak to better avoid
     non-working drivers"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  config: Enable NEED_DMA_MAP_STATE by default when SWIOTLB is selected
  x86/fpu: Load xsave pointer *after* initialization
  x86/ptrace: Fix the TIF_FORCED_TF logic in handle_signal()
  x86, selftests: Add single_step_syscall test
2015-04-18 11:31:11 -04:00
Linus Torvalds
96b90f27bc Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "This update has mostly fixes, but also other bits:

   - perf tooling fixes

   - PMU driver fixes

   - Intel Broadwell PMU driver HW-enablement for LBR callstacks

   - a late coming 'perf kmem' tool update that enables it to also
     analyze page allocation data.  Note, this comes with MM tracepoint
     changes that we believe to not break anything: because it changes
     the formerly opaque 'struct page *' field that uniquely identifies
     pages to 'pfn' which identifies pages uniquely too, but isn't as
     opaque and can be used for other purposes as well"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/pt: Fix and clean up error handling in pt_event_add()
  perf/x86/intel: Add Broadwell support for the LBR callstack
  perf/x86/intel/rapl: Fix energy counter measurements but supporing per domain energy units
  perf/x86/intel: Fix Core2,Atom,NHM,WSM cycles:pp events
  perf/x86: Fix hw_perf_event::flags collision
  perf probe: Fix segfault when probe with lazy_line to file
  perf probe: Find compilation directory path for lazy matching
  perf probe: Set retprobe flag when probe in address-based alternative mode
  perf kmem: Analyze page allocator events also
  tracing, mm: Record pfn instead of pointer to struct page
2015-04-18 11:26:46 -04:00
Ingo Molnar
0c99241c93 perf/x86/intel/pt: Fix and clean up error handling in pt_event_add()
Dan Carpenter reported that pt_event_add() has buggy
error handling logic: it returns 0 instead of -EBUSY when
it fails to start a newly added event.

Furthermore, the control flow in this function is messy,
with cleanup labels mixed with direct returns.

Fix the bug and clean up the code by converting it to
a straight fast path for the regular non-failing case,
plus a clear sequence of cascading goto labels to do
all cleanup.

NOTE: I materially changed the existing clean up logic in the
pt_event_start() failure case to use the direct
perf_aux_output_end() path, not pt_event_del(), because
perf_aux_output_end() is enough here.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20150416103830.GB7847@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-18 13:31:26 +02:00
Borislav Petkov
18ecb3bfa5 x86/fpu: Load xsave pointer *after* initialization
So I was playing with gdb today and did this simple thing:

	gdb /bin/ls

	...

	(gdb) run

Box exploded with this splat:

	BUG: unable to handle kernel NULL pointer dereference at 00000000000001d0
	IP: [<ffffffff8100fe5a>] xstateregs_get+0x7a/0x120
	[...]

	Call Trace:
	 ptrace_regset
	 ptrace_request
	 ? wait_task_inactive
	 ? preempt_count_sub
	 arch_ptrace
	 ? ptrace_get_task_struct
	 SyS_ptrace
	 system_call_fastpath

... because we do cache &target->thread.fpu.state->xsave into the
local variable xsave but that pointer is NULL at that time and
it gets initialized later, in init_fpu(), see:

	e7f180dcd8 ("x86/fpu: Change xstateregs_get()/set() to use ->xsave.i387 rather than ->fxsave")

The fix is simple: load xsave *after* init_fpu() has run.

Also do the same in xstateregs_set(), as suggested by Oleg Nesterov.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tavis Ormandy <taviso@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1429209697-5902-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-17 10:15:47 +02:00
Kan Liang
78d504bcd7 perf/x86/intel: Add Broadwell support for the LBR callstack
Same as Haswell, Broadwell also support the LBR callstack.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1427962377-40955-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-17 09:59:07 +02:00
Jacob Pan
6455239601 perf/x86/intel/rapl: Fix energy counter measurements but supporing per domain energy units
RAPL energy hardware unit can vary within a single CPU package, e.g.
HSW server DRAM has a fixed energy unit of 15.3 uJ (2^-16) whereas
the unit on other domains can be enumerated from power unit MSR.

There might be other variations in the future, this patch adds
per cpu model quirk to allow special handling of certain cpus.

hw_unit is also removed from per cpu data since it is not per cpu
and the sampling rate for energy counter is typically not high.

Without this patch, DRAM domain on HSW servers will be counted
4x higher than the real energy counter.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1427405325-780-1-git-send-email-jacob.jun.pan@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-17 09:58:56 +02:00
Peter Zijlstra
517e6341fa perf/x86/intel: Fix Core2,Atom,NHM,WSM cycles:pp events
Ingo reported that cycles:pp didn't work for him on some machines.

It turns out that in this commit:

  af4bdcf675 perf/x86/intel: Disallow flags for most Core2/Atom/Nehalem/Westmere events

Andi forgot to explicitly allow that event when he
disabled event flags for PEBS on those uarchs.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: af4bdcf675 ("perf/x86/intel: Disallow flags for most Core2/Atom/Nehalem/Westmere events")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-17 09:58:47 +02:00
Peter Zijlstra
c857eb56e6 perf/x86: Fix hw_perf_event::flags collision
Somehow we ended up with overlapping flags when merging the
RDPMC control flag - this is bad, fix it.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-17 09:50:43 +02:00
Oleg Nesterov
fd0f86b664 x86/ptrace: Fix the TIF_FORCED_TF logic in handle_signal()
When the TIF_SINGLESTEP tracee dequeues a signal,
handle_signal() clears TIF_FORCED_TF and X86_EFLAGS_TF but
leaves TIF_SINGLESTEP set.

If the tracer does PTRACE_SINGLESTEP again, enable_single_step()
sets X86_EFLAGS_TF but not TIF_FORCED_TF.  This means that the
subsequent PTRACE_CONT doesn't not clear X86_EFLAGS_TF, and the
tracee gets the wrong SIGTRAP.

Test-case (needs -O2 to avoid prologue insns in signal handler):

	#include <unistd.h>
	#include <stdio.h>
	#include <sys/ptrace.h>
	#include <sys/wait.h>
	#include <sys/user.h>
	#include <assert.h>
	#include <stddef.h>

	void handler(int n)
	{
		asm("nop");
	}

	int child(void)
	{
		assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
		signal(SIGALRM, handler);
		kill(getpid(), SIGALRM);
		return 0x23;
	}

	void *getip(int pid)
	{
		return (void*)ptrace(PTRACE_PEEKUSER, pid,
					offsetof(struct user, regs.rip), 0);
	}

	int main(void)
	{
		int pid, status;

		pid = fork();
		if (!pid)
			return child();

		assert(wait(&status) == pid);
		assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGALRM);

		assert(ptrace(PTRACE_SINGLESTEP, pid, 0, SIGALRM) == 0);
		assert(wait(&status) == pid);
		assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP);
		assert((getip(pid) - (void*)handler) == 0);

		assert(ptrace(PTRACE_SINGLESTEP, pid, 0, SIGALRM) == 0);
		assert(wait(&status) == pid);
		assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP);
		assert((getip(pid) - (void*)handler) == 1);

		assert(ptrace(PTRACE_CONT, pid, 0,0) == 0);
		assert(wait(&status) == pid);
		assert(WIFEXITED(status) && WEXITSTATUS(status) == 0x23);

		return 0;
	}

The last assert() fails because PTRACE_CONT wrongly triggers
another single-step and X86_EFLAGS_TF can't be cleared by
debugger until the tracee does sys_rt_sigreturn().

Change handle_signal() to do user_disable_single_step() if
stepping, we do not need to preserve TIF_SINGLESTEP because we
are going to do ptrace_notify(), and it is simply wrong to leak
this bit.

While at it, change the comment to explain why we also need to
clear TF unconditionally after setup_rt_frame().

Note: in the longer term we should probably change
setup_sigcontext() to use get_flags() and then just remove this
user_disable_single_step().  And, the state of TIF_FORCED_TF can
be wrong after restore_sigcontext() which can set/clear TF, this
needs another fix.

This fix fixes the 'single_step_syscall_32' testcase in
the x86 testsuite:

Before:

	~/linux/tools/testing/selftests/x86> ./single_step_syscall_32
	[RUN]   Set TF and check nop
	[OK]    Survived with TF set and 9 traps
	[RUN]   Set TF and check int80
	[OK]    Survived with TF set and 9 traps
	[RUN]   Set TF and check a fast syscall
	[WARN]  Hit 10000 SIGTRAPs with si_addr 0xf7789cc0, ip 0xf7789cc0
	Trace/breakpoint trap (core dumped)

After:

	~/linux/linux/tools/testing/selftests/x86> ./single_step_syscall_32
	[RUN]   Set TF and check nop
	[OK]    Survived with TF set and 9 traps
	[RUN]   Set TF and check int80
	[OK]    Survived with TF set and 9 traps
	[RUN]   Set TF and check a fast syscall
	[OK]    Survived with TF set and 39 traps
	[RUN]   Fast syscall with TF cleared
	[OK]    Nothing unexpected happened

Reported-by: Evan Teran <eteran@alum.rit.edu>
Reported-by: Pedro Alves <palves@redhat.com>
Tested-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
[ Added x86 self-test info. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-16 12:47:45 +02:00
Linus Torvalds
eea3a00264 Merge branch 'akpm' (patches from Andrew)
Merge second patchbomb from Andrew Morton:

 - the rest of MM

 - various misc bits

 - add ability to run /sbin/reboot at reboot time

 - printk/vsprintf changes

 - fiddle with seq_printf() return value

* akpm: (114 commits)
  parisc: remove use of seq_printf return value
  lru_cache: remove use of seq_printf return value
  tracing: remove use of seq_printf return value
  cgroup: remove use of seq_printf return value
  proc: remove use of seq_printf return value
  s390: remove use of seq_printf return value
  cris fasttimer: remove use of seq_printf return value
  cris: remove use of seq_printf return value
  openrisc: remove use of seq_printf return value
  ARM: plat-pxa: remove use of seq_printf return value
  nios2: cpuinfo: remove use of seq_printf return value
  microblaze: mb: remove use of seq_printf return value
  ipc: remove use of seq_printf return value
  rtc: remove use of seq_printf return value
  power: wakeup: remove use of seq_printf return value
  x86: mtrr: if: remove use of seq_printf return value
  linux/bitmap.h: improve BITMAP_{LAST,FIRST}_WORD_MASK
  MAINTAINERS: CREDITS: remove Stefano Brivio from B43
  .mailmap: add Ricardo Ribalda
  CREDITS: add Ricardo Ribalda Delgado
  ...
2015-04-15 16:39:15 -07:00
Joe Perches
3ac62bc060 x86: mtrr: if: remove use of seq_printf return value
The seq_printf return value, because it's frequently misused,
will eventually be converted to void.

See: commit 1f33c41c03 ("seq_file: Rename seq_overflow() to
     seq_has_overflowed() and make public")

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-15 16:35:24 -07:00
Linus Torvalds
fa2e5c073a Merge branch 'exec_domain_rip_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/misc
Pull exec domain removal from Richard Weinberger:
 "This series removes execution domain support from Linux.

  The idea behind exec domains was to support different ABIs.  The
  feature was never complete nor stable.  Let's rip it out and make the
  kernel signal handling code less complicated"

* 'exec_domain_rip_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/misc: (27 commits)
  arm64: Removed unused variable
  sparc: Fix execution domain removal
  Remove rest of exec domains.
  arch: Remove exec_domain from remaining archs
  arc: Remove signal translation and exec_domain
  xtensa: Remove signal translation and exec_domain
  xtensa: Autogenerate offsets in struct thread_info
  x86: Remove signal translation and exec_domain
  unicore32: Remove signal translation and exec_domain
  um: Remove signal translation and exec_domain
  tile: Remove signal translation and exec_domain
  sparc: Remove signal translation and exec_domain
  sh: Remove signal translation and exec_domain
  s390: Remove signal translation and exec_domain
  mn10300: Remove signal translation and exec_domain
  microblaze: Remove signal translation and exec_domain
  m68k: Remove signal translation and exec_domain
  m32r: Remove signal translation and exec_domain
  m32r: Autogenerate offsets in struct thread_info
  frv: Remove signal translation and exec_domain
  ...
2015-04-15 13:53:55 -07:00
Borislav Petkov
c0f6feba78 x86/asm, x86/acpi/wakeup_64.S: Make global label a local one
Make it a local symbol so that it doesn't appear in objdump
output.

No functionality change - code remains the same, just the global
label disappears:

	 ffffffff81039dbe:       bf 03 00 00 00          mov    $0x3,%edi
	 ffffffff81039dc3:       31 c0                   xor    %eax,%eax
	 ffffffff81039dc5:       e8 b6 fd ff ff          callq  ffffffff81039b80 <x86_acpi_enter_sleep_state>
	-ffffffff81039dca:       eb 00                   jmp    ffffffff81039dcc <resume_point>
	-
	-ffffffff81039dcc <resume_point>:
	+ffffffff81039dca:       eb 00                   jmp    ffffffff81039dcc <do_suspend_lowlevel+0x9c>
	 ffffffff81039dcc:       48 c7 c0 80 1a ca 82    mov    $0xffffffff82ca1a80,%rax
	 ffffffff81039dd3:       48 8b 98 e2 00 00 00    mov    0xe2(%rax),%rbx
	 ffffffff81039dda:       0f 22 e3                mov    %rbx,%cr4

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <linux-pm@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1429080614-22610-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-15 11:38:01 +02:00
Brian Gerst
14434052ff x86/asm: Remove unused TI_cpu
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Link: http://lkml.kernel.org/r/1428844486-6638-2-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-15 11:28:50 +02:00
Brian Gerst
4d178f94eb x86/asm: Merge common 32-bit values in asm-offsets.c
Merge common values for 32-bit native and compat.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Link: http://lkml.kernel.org/r/1428844486-6638-1-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-15 11:28:49 +02:00
Linus Torvalds
1dcf58d6e6 Merge branch 'akpm' (patches from Andrew)
Merge first patchbomb from Andrew Morton:

 - arch/sh updates

 - ocfs2 updates

 - kernel/watchdog feature

 - about half of mm/

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (122 commits)
  Documentation: update arch list in the 'memtest' entry
  Kconfig: memtest: update number of test patterns up to 17
  arm: add support for memtest
  arm64: add support for memtest
  memtest: use phys_addr_t for physical addresses
  mm: move memtest under mm
  mm, hugetlb: abort __get_user_pages if current has been oom killed
  mm, mempool: do not allow atomic resizing
  memcg: print cgroup information when system panics due to panic_on_oom
  mm: numa: remove migrate_ratelimited
  mm: fold arch_randomize_brk into ARCH_HAS_ELF_RANDOMIZE
  mm: split ET_DYN ASLR from mmap ASLR
  s390: redefine randomize_et_dyn for ELF_ET_DYN_BASE
  mm: expose arch_mmap_rnd when available
  s390: standardize mmap_rnd() usage
  powerpc: standardize mmap_rnd() usage
  mips: extract logic for mmap_rnd()
  arm64: standardize mmap_rnd() usage
  x86: standardize mmap_rnd() usage
  arm: factor out mmap ASLR into mmap_rnd
  ...
2015-04-14 16:49:17 -07:00
Kirill A. Shutemov
9823336833 x86: expose number of page table levels on Kconfig level
We would want to use number of page table level to define mm_struct.
Let's expose it as CONFIG_PGTABLE_LEVELS.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:02 -07:00
Ulrich Obergfell
692297d8f9 watchdog: introduce the hardlockup_detector_disable() function
Have kvm_guest_init() use hardlockup_detector_disable() instead of
watchdog_enable_hardlockup_detector(false).

Remove the watchdog_hardlockup_detector_is_enabled() and the
watchdog_enable_hardlockup_detector() function which are no longer needed.

Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:48:59 -07:00
Linus Torvalds
6c8a53c9e6 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf changes from Ingo Molnar:
 "Core kernel changes:

   - One of the more interesting features in this cycle is the ability
     to attach eBPF programs (user-defined, sandboxed bytecode executed
     by the kernel) to kprobes.

     This allows user-defined instrumentation on a live kernel image
     that can never crash, hang or interfere with the kernel negatively.
     (Right now it's limited to root-only, but in the future we might
     allow unprivileged use as well.)

     (Alexei Starovoitov)

   - Another non-trivial feature is per event clockid support: this
     allows, amongst other things, the selection of different clock
     sources for event timestamps traced via perf.

     This feature is sought by people who'd like to merge perf generated
     events with external events that were measured with different
     clocks:

       - cluster wide profiling

       - for system wide tracing with user-space events,

       - JIT profiling events

     etc.  Matching perf tooling support is added as well, available via
     the -k, --clockid <clockid> parameter to perf record et al.

     (Peter Zijlstra)

  Hardware enablement kernel changes:

   - x86 Intel Processor Trace (PT) support: which is a hardware tracer
     on steroids, available on Broadwell CPUs.

     The hardware trace stream is directly output into the user-space
     ring-buffer, using the 'AUX' data format extension that was added
     to the perf core to support hardware constraints such as the
     necessity to have the tracing buffer physically contiguous.

     This patch-set was developed for two years and this is the result.
     A simple way to make use of this is to use BTS tracing, the PT
     driver emulates BTS output - available via the 'intel_bts' PMU.
     More explicit PT specific tooling support is in the works as well -
     will probably be ready by 4.2.

     (Alexander Shishkin, Peter Zijlstra)

   - x86 Intel Cache QoS Monitoring (CQM) support: this is a hardware
     feature of Intel Xeon CPUs that allows the measurement and
     allocation/partitioning of caches to individual workloads.

     These kernel changes expose the measurement side as a new PMU
     driver, which exposes various QoS related PMU events.  (The
     partitioning change is work in progress and is planned to be merged
     as a cgroup extension.)

     (Matt Fleming, Peter Zijlstra; CPU feature detection by Peter P
     Waskiewicz Jr)

   - x86 Intel Haswell LBR call stack support: this is a new Haswell
     feature that allows the hardware recording of call chains, plus
     tooling support.  To activate this feature you have to enable it
     via the new 'lbr' call-graph recording option:

        perf record --call-graph lbr
        perf report

     or:

        perf top --call-graph lbr

     This hardware feature is a lot faster than stack walk or dwarf
     based unwinding, but has some limitations:

       - It reuses the current LBR facility, so LBR call stack and
         branch record can not be enabled at the same time.

       - It is only available for user-space callchains.

     (Yan, Zheng)

   - x86 Intel Broadwell CPU support and various event constraints and
     event table fixes for earlier models.

     (Andi Kleen)

   - x86 Intel HT CPUs event scheduling workarounds.  This is a complex
     CPU bug affecting the SNB,IVB,HSW families that results in counter
     value corruption.  The mitigation code is automatically enabled and
     is transparent.

     (Maria Dimakopoulou, Stephane Eranian)

  The perf tooling side had a ton of changes in this cycle as well, so
  I'm only able to list the user visible changes here, in addition to
  the tooling changes outlined above:

  User visible changes affecting all tools:

      - Improve support of compressed kernel modules (Jiri Olsa)
      - Save DSO loading errno to better report errors (Arnaldo Carvalho de Melo)
      - Bash completion for subcommands (Yunlong Song)
      - Add 'I' event modifier for perf_event_attr.exclude_idle bit (Jiri Olsa)
      - Support missing -f to override perf.data file ownership. (Yunlong Song)
      - Show the first event with an invalid filter (David Ahern, Arnaldo Carvalho de Melo)

  User visible changes in individual tools:

    'perf data':

        New tool for converting perf.data to other formats, initially
        for the CTF (Common Trace Format) from LTTng (Jiri Olsa,
        Sebastian Siewior)

    'perf diff':

        Add --kallsyms option (David Ahern)

    'perf list':

        Allow listing events with 'tracepoint' prefix (Yunlong Song)

        Sort the output of the command (Yunlong Song)

    'perf kmem':

        Respect -i option (Jiri Olsa)

        Print big numbers using thousands' group (Namhyung Kim)

        Allow -v option (Namhyung Kim)

        Fix alignment of slab result table (Namhyung Kim)

    'perf probe':

        Support multiple probes on different binaries on the same command line (Masami Hiramatsu)

        Support unnamed union/structure members data collection. (Masami Hiramatsu)

        Check kprobes blacklist when adding new events. (Masami Hiramatsu)

    'perf record':

        Teach 'perf record' about perf_event_attr.clockid (Peter Zijlstra)

        Support recording running/enabled time (Andi Kleen)

    'perf sched':

        Improve the performance of 'perf sched replay' on high CPU core count machines (Yunlong Song)

    'perf report' and 'perf top':

        Allow annotating entries in callchains in the hists browser (Arnaldo Carvalho de Melo)

        Indicate which callchain entries are annotated in the
        TUI hists browser (Arnaldo Carvalho de Melo)

        Add pid/tid filtering to 'report' and 'script' commands (David Ahern)

        Consider PERF_RECORD_ events with cpumode == 0 in 'perf top', removing one
        cause of long term memory usage buildup, i.e. not processing PERF_RECORD_EXIT
        events (Arnaldo Carvalho de Melo)

    'perf stat':

        Report unsupported events properly (Suzuki K. Poulose)

        Output running time and run/enabled ratio in CSV mode (Andi Kleen)

    'perf trace':

        Handle legacy syscalls tracepoints (David Ahern, Arnaldo Carvalho de Melo)

        Only insert blank duration bracket when tracing syscalls (Arnaldo Carvalho de Melo)

        Filter out the trace pid when no threads are specified (Arnaldo Carvalho de Melo)

        Dump stack on segfaults (Arnaldo Carvalho de Melo)

        No need to explicitely enable evsels for workload started from perf, let it
        be enabled via perf_event_attr.enable_on_exec, removing some events that take
        place in the 'perf trace' before a workload is really started by it.
        (Arnaldo Carvalho de Melo)

        Allow mixing with tracepoints and suppressing plain syscalls. (Arnaldo Carvalho de Melo)

  There's also been a ton of infrastructure work done, such as the
  split-out of perf's build system into tools/build/ and other changes -
  see the shortlog and changelog for details"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (358 commits)
  perf/x86/intel/pt: Clean up the control flow in pt_pmu_hw_init()
  perf evlist: Fix type for references to data_head/tail
  perf probe: Check the orphaned -x option
  perf probe: Support multiple probes on different binaries
  perf buildid-list: Fix segfault when show DSOs with hits
  perf tools: Fix cross-endian analysis
  perf tools: Fix error path to do closedir() when synthesizing threads
  perf tools: Fix synthesizing fork_event.ppid for non-main thread
  perf tools: Add 'I' event modifier for exclude_idle bit
  perf report: Don't call map__kmap if map is NULL.
  perf tests: Fix attr tests
  perf probe: Fix ARM 32 building error
  perf tools: Merge all perf_event_attr print functions
  perf record: Add clockid parameter
  perf sched replay: Use replay_repeat to calculate the runavg of cpu usage instead of the default value 10
  perf sched replay: Support using -f to override perf.data file ownership
  perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files
  perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task
  perf sched replay: Fix the segmentation fault problem caused by pr_err in threads
  perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations
  ...
2015-04-14 14:37:47 -07:00
Linus Torvalds
e95e7f6270 Merge branch 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull NOHZ changes from Ingo Molnar:
 "This tree adds full dynticks support to KVM guests (support the
  disabling of the timer tick on the guest).  The main missing piece was
  the recognition of guest execution as RCU extended quiescent state and
  related changes"

* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kvm,rcu,nohz: use RCU extended quiescent state when running KVM guest
  context_tracking: Export context_tracking_user_enter/exit
  context_tracking: Run vtime_user_enter/exit only when state == CONTEXT_USER
  context_tracking: Add stub context_tracking_is_enabled
  context_tracking: Generalize context tracking APIs to support user and guest
  context_tracking: Rename context symbols to prepare for transition state
  ppc: Remove unused cpp symbols in kvm headers
2015-04-14 13:58:48 -07:00
Linus Torvalds
078838d565 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU changes from Ingo Molnar:
 "The main changes in this cycle were:

   - changes permitting use of call_rcu() and friends very early in
     boot, for example, before rcu_init() is invoked.

   - add in-kernel API to enable and disable expediting of normal RCU
     grace periods.

   - improve RCU's handling of (hotplug-) outgoing CPUs.

   - NO_HZ_FULL_SYSIDLE fixes.

   - tiny-RCU updates to make it more tiny.

   - documentation updates.

   - miscellaneous fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits)
  cpu: Provide smpboot_thread_init() on !CONFIG_SMP kernels as well
  cpu: Defer smpboot kthread unparking until CPU known to scheduler
  rcu: Associate quiescent-state reports with grace period
  rcu: Yet another fix for preemption and CPU hotplug
  rcu: Add diagnostics to grace-period cleanup
  rcutorture: Default to grace-period-initialization delays
  rcu: Handle outgoing CPUs on exit from idle loop
  cpu: Make CPU-offline idle-loop transition point more precise
  rcu: Eliminate ->onoff_mutex from rcu_node structure
  rcu: Process offlining and onlining only at grace-period start
  rcu: Move rcu_report_unblock_qs_rnp() to common code
  rcu: Rework preemptible expedited bitmask handling
  rcu: Remove event tracing from rcu_cpu_notify(), used by offline CPUs
  rcutorture: Enable slow grace-period initializations
  rcu: Provide diagnostic option to slow down grace-period initialization
  rcu: Detect stalls caused by failure to propagate up rcu_node tree
  rcu: Eliminate empty HOTPLUG_CPU ifdef
  rcu: Simplify sync_rcu_preempt_exp_init()
  rcu: Put all orphan-callback-related code under same comment
  rcu: Consolidate offline-CPU callback initialization
  ...
2015-04-14 13:36:04 -07:00
Linus Torvalds
d0bbe0dd35 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree from Jiri Kosina:
 "Usual trivial tree updates.  Nothing outstanding -- mostly printk()
  and comment fixes and unused identifier removals"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
  goldfish: goldfish_tty_probe() is not using 'i' any more
  powerpc: Fix comment in smu.h
  qla2xxx: Fix printks in ql_log message
  lib: correct link to the original source for div64_u64
  si2168, tda10071, m88ds3103: Fix firmware wording
  usb: storage: Fix printk in isd200_log_config()
  qla2xxx: Fix printk in qla25xx_setup_mode
  init/main: fix reset_device comment
  ipwireless: missing assignment
  goldfish: remove unreachable line of code
  coredump: Fix do_coredump() comment
  stacktrace.h: remove duplicate declaration task_struct
  smpboot.h: Remove unused function prototype
  treewide: Fix typo in printk messages
  treewide: Fix typo in printk messages
  mod_devicetable: fix comment for match_flags
2015-04-14 09:50:27 -07:00
Linus Torvalds
07f2d8c63f Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS changes from Ingo Molnar:
 "The main changes in this cycle were:

   - Simplify the CMCI storm logic on Intel CPUs after yet another
     report about a race in the code (Borislav Petkov)

   - Enable the MCE threshold irq on AMD CPUs by default (Aravind
     Gopalakrishnan)

   - Add AMD-specific MCE-severity grading function.  Further error
     recovery actions will be based on its output (Aravind Gopalakrishnan)

   - Documentation updates (Borislav Petkov)

   - ... assorted fixes and cleanups"

* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce/severity: Fix warning about indented braces
  x86/mce: Define mce_severity function pointer
  x86/mce: Add an AMD severities-grading function
  x86/mce: Reindent __mcheck_cpu_apply_quirks() properly
  x86/mce: Use safe MSR accesses for AMD quirk
  x86/MCE/AMD: Enable thresholding interrupts by default if supported
  x86/MCE: Make mce_panic() fatal machine check msg in the same pattern
  x86/MCE/intel: Cleanup CMCI storm logic
  Documentation/acpi/einj: Correct and streamline text
  x86/MCE/AMD: Drop bogus const modifier from AMD's bank4_names()
2015-04-13 13:33:20 -07:00
Linus Torvalds
6cf78d4b37 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm changes from Ingo Molnar:
 "The main changes in this cycle were:

   - reduce the x86/32 PAE per task PGD allocation overhead from 4K to
     0.032k (Fenghua Yu)

   - early_ioremap/memunmap() usage cleanups (Juergen Gross)

   - gbpages support cleanups (Luis R Rodriguez)

   - improve AMD Bulldozer (family 0x15) ASLR I$ aliasing workaround to
     increase randomization by 3 bits (per bootup) (Hector
     Marco-Gisbert)

   - misc fixlets"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Improve AMD Bulldozer ASLR workaround
  x86/mm/pat: Initialize __cachemode2pte_tbl[] and __pte2cachemode_tbl[] in a bit more readable fashion
  init.h: Clean up the __setup()/early_param() macros
  x86/mm: Simplify probe_page_size_mask()
  x86/mm: Further simplify 1 GB kernel linear mappings handling
  x86/mm: Use early_param_on_off() for direct_gbpages
  init.h: Add early_param_on_off()
  x86/mm: Simplify enabling direct_gbpages
  x86/mm: Use IS_ENABLED() for direct_gbpages
  x86/mm: Unexport set_memory_ro() and set_memory_rw()
  x86/mm, efi: Use early_ioremap() in arch/x86/platform/efi/efi-bgrt.c
  x86/mm: Use early_memunmap() instead of early_iounmap()
  x86/mm/pat: Ensure different messages in STRICT_DEVMEM and PAT cases
  x86/mm: Reduce PAE-mode per task pgd allocation overhead from 4K to 32 bytes
2015-04-13 13:31:32 -07:00
Linus Torvalds
0ad5c6b3c2 Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode changes from Ingo Molnar:
 "Microcode driver updates: mostly cleanups but also some fixes
  (Borislav Petkov)"

* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/amd: Drop the pci_ids.h dependency
  x86/microcode/intel: Fix printing of microcode blobs in show_saved_mc()
  x86/microcode/intel: Check scan_microcode()'s retval
  x86/microcode/intel: Sanitize microcode_pointer()
  x86/microcode/intel: Move mc arg last in get_matching_{microcode|sig}
  x86/microcode/intel: Simplify generic_load_microcode_early()
  x86/microcode: Consolidate family,model, ... code
  x86/microcode/intel: Rename update_match_revision()
  x86/microcode/intel: Sanitize _save_mc()
  x86/microcode/intel: Make _save_mc() return the updated saved count
  x86/microcode/intel: Simplify load_ucode_intel_bsp()
  x86/microcode/intel: Get rid of last arg to load_ucode_intel_bsp()
  x86/microcode/intel: Do the mc_saved_src NULL check first
  x86/microcode/intel: Check if microcode was found before applying
  x86/microcode/intel: Fix out of bounds memory access to the extended header
2015-04-13 13:25:33 -07:00
Linus Torvalds
421ec9017f Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu changes from Ingo Molnar:
 "Various x86 FPU handling cleanups, refactorings and fixes (Borislav
  Petkov, Oleg Nesterov, Rik van Riel)"

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  x86/fpu: Kill eager_fpu_init_bp()
  x86/fpu: Don't allocate fpu->state for swapper/0
  x86/fpu: Rename drop_init_fpu() to fpu_reset_state()
  x86/fpu: Fold __drop_fpu() into its sole user
  x86/fpu: Don't abuse drop_init_fpu() in flush_thread()
  x86/fpu: Use restore_init_xstate() instead of math_state_restore() on kthread exec
  x86/fpu: Introduce restore_init_xstate()
  x86/fpu: Document user_fpu_begin()
  x86/fpu: Factor out memset(xstate, 0) in fpu_finit() paths
  x86/fpu: Change xstateregs_get()/set() to use ->xsave.i387 rather than ->fxsave
  x86/fpu: Don't abuse FPU in kernel threads if use_eager_fpu()
  x86/fpu: Always allow FPU in interrupt if use_eager_fpu()
  x86/fpu: __kernel_fpu_begin() should clear fpu_owner_task even if use_eager_fpu()
  x86/fpu: Also check fpu_lazy_restore() when use_eager_fpu()
  x86/fpu: Use task_disable_lazy_fpu_restore() helper
  x86/fpu: Use an explicit if/else in switch_fpu_prepare()
  x86/fpu: Introduce task_disable_lazy_fpu_restore() helper
  x86/fpu: Move lazy restore functions up a few lines
  x86/fpu: Change math_error() to use unlazy_fpu(), kill (now) unused save_init_fpu()
  x86/fpu: Don't do __thread_fpu_end() if use_eager_fpu()
  ...
2015-04-13 13:24:23 -07:00
Linus Torvalds
64f004a2ab Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 debug changes from Ingo Molnar:
 "Stack printing fixlets"

* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/kernel: Use kstack_end() in dumpstack_64.c
  x86/kernel: Fix output of show_stack_log_lvl()
2015-04-13 13:23:34 -07:00
Linus Torvalds
b48488d109 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cacheinfo sysfs changes from Ingo Molnar:
 "This tree converts the x86 cacheinfo sysfs code to use the generic
  code in drivers/base/cacheinfo.c.

  It's not intended to change the sysfs ABI:

      'This patch neither alters any existing sysfs entries nor their
       formating, however since the generic cacheinfo has switched to
       use the device attributes instead of the traditional raw
       kobjects, a directory named 'power' along with its standard
       attributes are added similar to any other device'"

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu/cacheinfo: Fix cache_get_priv_group() for Intel processors
  x86/cacheinfo: Move cacheinfo sysfs code to generic infrastructure
2015-04-13 13:21:51 -07:00
Linus Torvalds
9f3252f1ad Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
 "Various cleanups"

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/iommu: Fix header comments regarding standard and _FINISH macros
  x86/earlyprintk: Put CONFIG_PCI-only functions under the #ifdef
  x86: Fix up obsolete __cpu_set() function usage
2015-04-13 13:20:54 -07:00
Linus Torvalds
5945fba8c5 Merge branch 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 build changes from Ingo Molnar:
 "Small cleanups and fixes"

* 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/kexec: Cleanup KEXEC_VERIFY_SIG Kconfig help text
  x86/build/defconfig: Enable USB_EHCI_TT_NEWSCHED=y
  x86/build: Fix mkcapflags.sh bash-ism
  x86/Kconfig: Simplify X86_UP_APIC handling
  x86/Kconfig: Simplify X86_IO_APIC dependencies
  x86/Kconfig: Avoid issuing pointless turned off entries to .config
2015-04-13 13:19:59 -07:00
Linus Torvalds
8f74bc5ff0 Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot changes from Ingo Molnar:
 "A number of cleanups"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Standardize strcmp()
  x86/boot/64: Remove pointless early_printk() message
  x86/boot/video: Move the 'video_segment' variable to video.c
2015-04-13 13:19:10 -07:00
Linus Torvalds
60f898eeaa Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm changes from Ingo Molnar:
 "There were lots of changes in this development cycle:

   - over 100 separate cleanups, restructuring changes, speedups and
     fixes in the x86 system call, irq, trap and other entry code, part
     of a heroic effort to deobfuscate a decade old spaghetti asm code
     and its C code dependencies (Denys Vlasenko, Andy Lutomirski)

   - alternatives code fixes and enhancements (Borislav Petkov)

   - simplifications and cleanups to the compat code (Brian Gerst)

   - signal handling fixes and new x86 testcases (Andy Lutomirski)

   - various other fixes and cleanups

  By their nature many of these changes are risky - we tried to test
  them well on many different x86 systems (there are no known
  regressions), and they are split up finely to help bisection - but
  there's still a fair bit of residual risk left so caveat emptor"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (148 commits)
  perf/x86/64: Report regs_user->ax too in get_regs_user()
  perf/x86/64: Simplify regs_user->abi setting code in get_regs_user()
  perf/x86/64: Do report user_regs->cx while we are in syscall, in get_regs_user()
  perf/x86/64: Do not guess user_regs->cs, ss, sp in get_regs_user()
  x86/asm/entry/32: Tidy up JNZ instructions after TESTs
  x86/asm/entry/64: Reduce padding in execve stubs
  x86/asm/entry/64: Remove GET_THREAD_INFO() in ret_from_fork
  x86/asm/entry/64: Simplify jumps in ret_from_fork
  x86/asm/entry/64: Remove a redundant jump
  x86/asm/entry/64: Optimize [v]fork/clone stubs
  x86/asm/entry: Zero EXTRA_REGS for stub32_execve() too
  x86/asm/entry/64: Move stub_x32_execvecloser() to stub_execveat()
  x86/asm/entry/64: Use common code for rt_sigreturn() epilogue
  x86/asm/entry/64: Add forgotten CFI annotation
  x86/asm/entry/irq: Simplify interrupt dispatch table (IDT) layout
  x86/asm/entry/64: Move opportunistic sysret code to syscall code path
  x86, selftests: Add sigreturn selftest
  x86/alternatives: Guard NOPs optimization
  x86/asm/entry: Clear EXTRA_REGS for all executable formats
  x86/signal: Remove pax argument from restore_sigcontext
  ...
2015-04-13 13:16:36 -07:00
Linus Torvalds
977e1ba508 Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 apic changes from Ingo Molnar:
 "Changes:

   - SGI UV APIC driver updates

   - dead code removal"

* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic/uv: Update the UV APIC HUB check
  x86/apic/uv: Update the UV APIC driver check
  x86/apic/uv: Update the APIC UV OEM check
  x86/apic: Remove verify_local_APIC()
2015-04-13 13:15:09 -07:00
Linus Torvalds
7fd56474db Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Ingo Molnar:
 "The main changes in this cycle were:

   - clockevents state machine cleanups and enhancements (Viresh Kumar)

   - clockevents broadcast notifier horror to state machine conversion
     and related cleanups (Thomas Gleixner, Rafael J Wysocki)

   - clocksource and timekeeping core updates (John Stultz)

   - clocksource driver updates and fixes (Ben Dooks, Dmitry Osipenko,
     Hans de Goede, Laurent Pinchart, Maxime Ripard, Xunlei Pang)

   - y2038 fixes (Xunlei Pang, John Stultz)

   - NMI-safe ktime_get_raw_fast() and general refactoring of the clock
     code, in preparation to perf's per event clock ID support (Peter
     Zijlstra)

   - generic sched/clock fixes, optimizations and cleanups (Daniel
     Thompson)

   - clockevents cpu_down() race fix (Preeti U Murthy)"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (94 commits)
  timers/PM: Drop unnecessary braces from tick_freeze()
  timers/PM: Fix up tick_unfreeze()
  timekeeping: Get rid of stale comment
  clockevents: Cleanup dead cpu explicitely
  clockevents: Make tick handover explicit
  clockevents: Remove broadcast oneshot control leftovers
  sched/idle: Use explicit broadcast oneshot control function
  ARM: Tegra: Use explicit broadcast oneshot control function
  ARM: OMAP: Use explicit broadcast oneshot control function
  intel_idle: Use explicit broadcast oneshot control function
  ACPI/idle: Use explicit broadcast control function
  ACPI/PAD: Use explicit broadcast oneshot control function
  x86/amd/idle, clockevents: Use explicit broadcast oneshot control functions
  clockevents: Provide explicit broadcast oneshot control functions
  clockevents: Remove the broadcast control leftovers
  ARM: OMAP: Use explicit broadcast control function
  intel_idle: Use explicit broadcast control function
  cpuidle: Use explicit broadcast control function
  ACPI/processor: Use explicit broadcast control function
  ACPI/PAD: Use explicit broadcast control function
  ...
2015-04-13 11:08:28 -07:00
Linus Torvalds
49d2953c72 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler changes from Ingo Molnar:
 "Major changes:

   - Reworked CPU capacity code, for better SMP load balancing on
     systems with assymetric CPUs. (Vincent Guittot, Morten Rasmussen)

   - Reworked RT task SMP balancing to be push based instead of pull
     based, to reduce latencies on large CPU count systems. (Steven
     Rostedt)

   - SCHED_DEADLINE support updates and fixes. (Juri Lelli)

   - SCHED_DEADLINE task migration support during CPU hotplug. (Wanpeng Li)

   - x86 mwait-idle optimizations and fixes. (Mike Galbraith, Len Brown)

   - sched/numa improvements. (Rik van Riel)

   - various cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits)
  sched/core: Drop debugging leftover trace_printk call
  sched/deadline: Support DL task migration during CPU hotplug
  sched/core: Check for available DL bandwidth in cpuset_cpu_inactive()
  sched/deadline: Always enqueue on previous rq when dl_task_timer() fires
  sched/core: Remove unused argument from init_[rt|dl]_rq()
  sched/deadline: Fix rt runtime corruption when dl fails its global constraints
  sched/deadline: Avoid a superfluous check
  sched: Improve load balancing in the presence of idle CPUs
  sched: Optimize freq invariant accounting
  sched: Move CFS tasks to CPUs with higher capacity
  sched: Add SD_PREFER_SIBLING for SMT level
  sched: Remove unused struct sched_group_capacity::capacity_orig
  sched: Replace capacity_factor by usage
  sched: Calculate CPU's usage statistic and put it into struct sg_lb_stats::group_usage
  sched: Add struct rq::cpu_capacity_orig
  sched: Make scale_rt invariant with frequency
  sched: Make sched entity usage tracking scale-invariant
  sched: Remove frequency scaling from cpu_capacity
  sched: Track group sched_entity usage contributions
  sched: Add sched_avg::utilization_avg_contrib
  ...
2015-04-13 10:47:34 -07:00
Linus Torvalds
9003601310 The most interesting bit here is irqfd/ioeventfd support for ARM and ARM64.
ARM/ARM64: fixes for live migration, irqfd and ioeventfd support (enabling
 vhost, too), page aging
 
 s390: interrupt handling rework, allowing to inject all local interrupts
 via new ioctl and to get/set the full local irq state for migration
 and introspection.  New ioctls to access memory by virtual address,
 and to get/set the guest storage keys.  SIMD support.
 
 MIPS: FPU and MIPS SIMD Architecture (MSA) support.  Includes some patches
 from Ralf Baechle's MIPS tree.
 
 x86: bugfixes (notably for pvclock, the others are small) and cleanups.
 Another small latency improvement for the TSC deadline timer.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJVJ9vmAAoJEL/70l94x66DoMEH/R3rh8IMf4jTiWRkcqohOMPX
 k1+NaSY/lCKayaSgggJ2hcQenMbQoXEOdslvaA/H0oC+VfJGK+lmU6E63eMyyhjQ
 Y+Px6L85NENIzDzaVu/TIWWuhil5PvIRr3VO8cvntExRoCjuekTUmNdOgCvN2ObW
 wswN2qRdPIeEj2kkulbnye+9IV4G0Ne9bvsmUdOdfSSdi6ZcV43JcvrpOZT++mKj
 RrKB+3gTMZYGJXMMLBwMkdl8mK1ozriD+q0mbomT04LUyGlPwYLl4pVRDBqyksD7
 KsSSybaK2E4i5R80WEljgDMkNqrCgNfg6VZe4n9Y+CfAAOToNnkMJaFEi+yuqbs=
 =yu2b
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "First batch of KVM changes for 4.1

  The most interesting bit here is irqfd/ioeventfd support for ARM and
  ARM64.

  Summary:

  ARM/ARM64:
     fixes for live migration, irqfd and ioeventfd support (enabling
     vhost, too), page aging

  s390:
     interrupt handling rework, allowing to inject all local interrupts
     via new ioctl and to get/set the full local irq state for migration
     and introspection.  New ioctls to access memory by virtual address,
     and to get/set the guest storage keys.  SIMD support.

  MIPS:
     FPU and MIPS SIMD Architecture (MSA) support.  Includes some
     patches from Ralf Baechle's MIPS tree.

  x86:
     bugfixes (notably for pvclock, the others are small) and cleanups.
     Another small latency improvement for the TSC deadline timer"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (146 commits)
  KVM: use slowpath for cross page cached accesses
  kvm: mmu: lazy collapse small sptes into large sptes
  KVM: x86: Clear CR2 on VCPU reset
  KVM: x86: DR0-DR3 are not clear on reset
  KVM: x86: BSP in MSR_IA32_APICBASE is writable
  KVM: x86: simplify kvm_apic_map
  KVM: x86: avoid logical_map when it is invalid
  KVM: x86: fix mixed APIC mode broadcast
  KVM: x86: use MDA for interrupt matching
  kvm/ppc/mpic: drop unused IRQ_testbit
  KVM: nVMX: remove unnecessary double caching of MAXPHYADDR
  KVM: nVMX: checks for address bits beyond MAXPHYADDR on VM-entry
  KVM: x86: cache maxphyaddr CPUID leaf in struct kvm_vcpu
  KVM: vmx: pass error code with internal error #2
  x86: vdso: fix pvclock races with task migration
  KVM: remove kvm_read_hva and kvm_read_hva_atomic
  KVM: x86: optimize delivery of TSC deadline timer interrupt
  KVM: x86: extract blocking logic from __vcpu_run
  kvm: x86: fix x86 eflags fixed bit
  KVM: s390: migrate vcpu interrupt state
  ...
2015-04-13 09:47:01 -07:00
Richard Weinberger
3050a35fba x86: Remove signal translation and exec_domain
As execution domain support is gone we can remove
signal translation from the signal code and remove
exec_domain from thread_info.

Signed-off-by: Richard Weinberger <richard@nod.at>
2015-04-12 21:03:28 +02:00
Ingo Molnar
066450be41 perf/x86/intel/pt: Clean up the control flow in pt_pmu_hw_init()
Dan Carpenter pointed out that the control flow in pt_pmu_hw_init()
is a bit messy: for example the kfree(de_attrs) is entirely
superfluous.

Another problem is the inconsistent mixing of label based and
direct return error handling.

Add modern, label based error handling instead and clean up the code
a bit as well.

Note that we'll still do a kfree(NULL) in the normal case - this does
not matter as this is an init path and kfree() returns early if it
sees a NULL.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20150409090805.GG17605@mwanda
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-12 11:21:15 +02:00
Denys Vlasenko
3b75232d55 perf/x86/64: Report regs_user->ax too in get_regs_user()
I don't see why we report e.g. orix_ax, which is not always
meaningful, but don't report ax, which is meaningful.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1428671219-29341-4-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-11 13:08:53 +02:00
Denys Vlasenko
32caa06091 perf/x86/64: Simplify regs_user->abi setting code in get_regs_user()
user_64bit_mode(regs) basically checks regs->cs to point to a
64-bit segment. This check used to be unreliable here because
regs->cs was not always correct in syscalls.

Now regs->cs is always correct: in syscalls, in interrupts, in
exceptions. No need to emply heuristics here.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1428671219-29341-3-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-11 13:08:53 +02:00
Denys Vlasenko
5df71b396b perf/x86/64: Do report user_regs->cx while we are in syscall, in get_regs_user()
Yes, it is true that cx contains return address.
It's not clear why we trash it.
Stop doing that.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1428671219-29341-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-11 13:08:53 +02:00
Denys Vlasenko
aa21df0424 perf/x86/64: Do not guess user_regs->cs, ss, sp in get_regs_user()
After recent changes to syscall entry points,
user_regs->{cs,ss,sp} are always correct. (They used to be
undefined while in syscalls).

We can report them reliably, without guessing.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1428671219-29341-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-11 13:08:52 +02:00
Denys Vlasenko
54cfa458b4 x86/asm/entry/32: Tidy up JNZ instructions after TESTs
After TESTs, use logically correct JNZ mnemonic instead of JNE.

This doesn't change code:

  md5:
   c3005b39a11fe582b7df7908561ad4ee  entry_32.o.before.asm
   c3005b39a11fe582b7df7908561ad4ee  entry_32.o.after.asm

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1428689620-21881-1-git-send-email-dvlasenk@redhat.com
[ Added object file comparison. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-11 10:40:02 +02:00
Mike Travis
1912c7afa3 x86/apic/uv: Update the UV APIC HUB check
Update the check for UV2000/3000.  Note when the HUB is not recognized.

Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Hedi Berriche <hedi@sgi.com>
Acked-by: Dimitri Sivanich <sivanich@sgi.com>
Link: http://lkml.kernel.org/r/20150409182629.267239403@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-10 10:16:08 +02:00
Mike Travis
379b97e280 x86/apic/uv: Update the UV APIC driver check
Fix a bug in the OEM check function that determines if the
system is a UV system and the BIOS is compatible with the
kernel's UV apic driver.  This prevents some possibly obscure
panics and guards the system against being started on SGI
hardware that does not have the required kernel support.

Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Hedi Berriche <hedi@sgi.com>
Acked-by: Dimitri Sivanich <sivanich@sgi.com>
Link: http://lkml.kernel.org/r/20150409182629.112998930@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-10 10:16:07 +02:00
Mike Travis
7a4e017041 x86/apic/uv: Update the APIC UV OEM check
Optimize the first "SGI" OEM check to return faster if the
system is not an SGI or UV system.

Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Hedi Berriche <hedi@sgi.com>
Acked-by: Dimitri Sivanich <sivanich@sgi.com>
Link: http://lkml.kernel.org/r/20150409182628.952357922@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-10 10:16:07 +02:00
Damien Lespiau
31d4dcf705 drm/i915/bxt: Broxton uses the same GMS values as Skylake
v2: Rebase on top of the early-quirks rework from Ville.

Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> (v1)
Reviewed-by: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-09 15:57:47 +02:00
Denys Vlasenko
a37f34a325 x86/asm/entry/64: Reduce padding in execve stubs
execve stubs are 7 bytes only. Padding them to 16 bytes is a
waste.

   text	   data	    bss	    dec	    hex	filename
  12594	      0	      0	  12594	   3132	entry_64.o.before
  12530	      0	      0	  12530	   30f2	entry_64.o

Run-tested.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1428439424-7258-8-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-09 10:31:26 +02:00
Denys Vlasenko
54a81e914b x86/asm/entry/64: Remove GET_THREAD_INFO() in ret_from_fork
It used to be used to check for _TIF_IA32, but the check has
been removed.

Remove GET_THREAD_INFO() too.

Run-tested.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1428439424-7258-7-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-09 10:31:26 +02:00
Denys Vlasenko
66ad4efa51 x86/asm/entry/64: Simplify jumps in ret_from_fork
Replace
        test
        jz  1f
        jmp label
    1:

with
        test
        jnz label

Run-tested.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1428439424-7258-6-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-09 10:31:25 +02:00
Denys Vlasenko
a30b0085f5 x86/asm/entry/64: Remove a redundant jump
Jumping to the very next instruction is not very useful:

        jmp label
    label:

Removing the jump.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1428439424-7258-5-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-09 10:31:25 +02:00
Denys Vlasenko
772951c4e4 x86/asm/entry/64: Optimize [v]fork/clone stubs
Replace "call func; ret" with "jmp func".

Run-tested.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1428439424-7258-4-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-09 10:31:25 +02:00
Denys Vlasenko
0f90fb979d x86/asm/entry: Zero EXTRA_REGS for stub32_execve() too
The change which affected how execve clears EXTRA_REGS missed
32-bit execve syscalls.

Fix this by using 64-bit execve stub epilogue for them too.

Run-tested.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1428439424-7258-3-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-09 10:31:24 +02:00
Denys Vlasenko
05f1752d19 x86/asm/entry/64: Move stub_x32_execvecloser() to stub_execveat()
This is a preparatory patch for moving stub32_execve[at]() to this
file. It makes sense to have all execve stubs in one place, so
that they can reuse code.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1428439424-7258-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-09 10:31:24 +02:00
Denys Vlasenko
31f0119b81 x86/asm/entry/64: Use common code for rt_sigreturn() epilogue
Similarly to stub_execve, we can reuse the epilogue in
stub_rt_sigreturn() and stub_x32_rt_sigreturn().

Add a comment explaining why we can't eliminage SAVE_EXTRA_REGS
here.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1428439424-7258-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-09 10:31:24 +02:00
Denys Vlasenko
8b3607b5b8 x86/asm/entry/64: Add forgotten CFI annotation
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1428424967-14460-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-08 09:18:36 +02:00
Denys Vlasenko
3304c9c37b x86/asm/entry/irq: Simplify interrupt dispatch table (IDT) layout
Interrupt entry points are handled with the following code,
each 32-byte code block contains seven entry points:

		...
		[push][jump 22] // 4 bytes
		[push][jump 18] // 4 bytes
		[push][jump 14] // 4 bytes
		[push][jump 10] // 4 bytes
		[push][jump  6] // 4 bytes
		[push][jump  2] // 4 bytes
		[push][jump common_interrupt][padding] // 8 bytes

		[push][jump]
		[push][jump]
		[push][jump]
		[push][jump]
		[push][jump]
		[push][jump]
		[push][jump common_interrupt][padding]

		[padding_2]
	common_interrupt:

And there is a table which holds pointers to every entry point,
IOW: to every push.

In cold cache, two jumps are still costlier than one, even
though we get the benefit of them residing in the same
cacheline.

This change replaces short jumps with near ones to
'common_interrupt', and pads every push+jump pair to 8 bytes. This
way, each interrupt takes only one jump.

This change replaces ".p2align CONFIG_X86_L1_CACHE_SHIFT" before
dispatch table with ".align 8" - we do not need anything
stronger than that.

The table of entry addresses (the interrupt[] array) is no
longer necessary, the address of entries can be easily
calculated as (irq_entries_start + i*8).

   text	   data	    bss	    dec	    hex	filename
  12546	      0	      0	  12546	   3102	entry_64.o.before
  11626	      0	      0	  11626	   2d6a	entry_64.o

The size decrease is because 1656 bytes of .init.rodata are
gone. That's initdata, though. The resident size does go up a
bit.

Run-tested (32 and 64 bits).

Acked-and-Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1428090553-7283-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-08 09:02:13 +02:00
Denys Vlasenko
fffbb5dcfd x86/asm/entry/64: Move opportunistic sysret code to syscall code path
This change does two things:

Copy-pastes "retint_swapgs:" code into syscall handling code,
the copy is under "syscall_return:" label. The code is unchanged
apart from some label renames.

Removes "opportunistic sysret" code from "retint_swapgs:" code
block, since now it won't be reached by syscall return. This in
fact removes most of the code in question.

   text	   data	    bss	    dec	    hex	filename
  12530	      0	      0	  12530	   30f2	entry_64.o.before
  12562	      0	      0	  12562	   3112	entry_64.o

Run-tested.

Acked-and-Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427993219-7291-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-08 09:02:12 +02:00
Ingo Molnar
4bcc7827b0 Linux 4.0-rc7
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJVIws/AAoJEHm+PkMAQRiGwEcH/1GCBqrBzXaKwDdCPMRcYVUb
 MYkXmGkCGRYWe5MXI8QNAaa/CdG6mAFMHWN6CaMMpLTxnM1m87uBg01fQMsh73BO
 mRVLKE/soiJDnR1gYzBBDBYV/AUvytN5PhgeNaA95YIJvU3T1f3iTnV8vs30Dp0L
 YpxSqwr3C0k7C9IE0VcgfzvWJPCnQ9IWHuX3jn5s1XjGKVNbBYHMt6FusHdyXMfT
 dp8ksuGHwm30mTFI5xJpKOrRzfi+P5EsEUrsnFRPRM/iFTVrM5R7eaUhsRZb2+Wo
 YApnbYhUYz7om1AuQ+UZ/+S6y7ZLlGWegI1lWI754GIsczG5vPHEYhhgkzMhTsc=
 =kR1V
 -----END PGP SIGNATURE-----

Merge tag 'v4.0-rc7' into x86/asm, to resolve conflicts

Conflicts:
	arch/x86/kernel/entry_64.S

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-08 09:01:54 +02:00
Greg Kroah-Hartman
b3e3bf2ef2 Merge 4.0-rc7 into tty-next
We want the fixes in here as well, also to help out with merge issues.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-07 11:07:20 +02:00
Borislav Petkov
69df353ff3 x86/alternatives: Guard NOPs optimization
Take a look at the first instruction byte before optimizing the NOP -
there might be something else there already, like the ALTERNATIVE_2()
in rdtsc_barrier() which NOPs out on AMD even though we just
patched in an MFENCE.

This happens because the alternatives sees X86_FEATURE_MFENCE_RDTSC,
AMD CPUs set it, we patch in the MFENCE and right afterwards it sees
X86_FEATURE_LFENCE_RDTSC which AMD CPUs don't set and we blindly
optimize the NOP.

Checking whether at least the first byte is 0x90 prevents that.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1428181662-18020-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-06 09:24:09 +02:00
Denys Vlasenko
fc3e958a2b x86/asm/entry: Clear EXTRA_REGS for all executable formats
On failure, sys_execve() does not clobber EXTRA_REGS, so we can
just return to userpsace without saving/restoring them.

On success, ELF_PLAT_INIT() in sys_execve() clears all these
registers.

On other executable formats:

  - binfmt_flat.c has similar FLAT_PLAT_INIT, but x86 (and everyone
    else except sh) doesn't define it.

  - binfmt_elf_fdpic.c has ELF_FDPIC_PLAT_INIT, but x86 (and most
    others) doesn't define it.

  - There are no such hooks in binfmt_aout.c et al. We inherit
    EXTRA_REGS from the prior executable.

This inconsistency was not intended.

This change removes SAVE/RESTORE_EXTRA_REGS in stub_execve,
removes register clearing in ELF_PLAT_INIT(),
and instead simply clears them on success in stub_execve.

Run-tested.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1428173719-7637-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-06 09:24:08 +02:00
Brian Gerst
6a3713f001 x86/signal: Remove pax argument from restore_sigcontext
The 'pax' argument is unnecesary.  Instead, store the RAX value
directly in regs.

This pattern goes all the way back to 2.1.106pre1, when restore_sigcontext()
was changed to return an error code instead of EAX directly:

  https://git.kernel.org/cgit/linux/kernel/git/history/history.git/diff/arch/i386/kernel/signal.c?id=9a8f8b7ca3f319bd668298d447bdf32730e51174

In 2007 sigaltstack syscall support was added, where the return
value of restore_sigcontext() was changed to carry the memory-copying
failure code.

But instead of putting 'ax' into regs->ax directly, it was carried
in via a pointer and then returned, where the generic syscall return
code copied it to regs->ax.

So there was never any deeper reason for this suboptimal pattern, it
was simply never noticed after being introduced.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1428152303-17154-1-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-06 09:06:39 +02:00
Borislav Petkov
dbe4058a6a x86/alternatives: Fix ALTERNATIVE_2 padding generation properly
Quentin caught a corner case with the generation of instruction
padding in the ALTERNATIVE_2 macro: if len(orig_insn) <
len(alt1) < len(alt2), then not enough padding gets added and
that is not good(tm) as we could overwrite the beginning of the
next instruction.

Luckily, at the time of this writing, we don't have
ALTERNATIVE_2() invocations which have that problem and even if
we did, a simple fix would be to prepend the instructions with
enough prefixes so that that corner case doesn't happen.

However, best it would be if we fixed it properly. See below for
a simple, abstracted example of what we're doing.

So what we ended up doing is, we compute the

	max(len(alt1), len(alt2)) - len(orig_insn)

and feed that value to the .skip gas directive. The max() cannot
have conditionals due to gas limitations, thus the fancy integer
math.

With this patch, all ALTERNATIVE_2 sites get padded correctly;
generating obscure test cases pass too:

  #define alt_max_short(a, b)    ((a) ^ (((a) ^ (b)) & -(-((a) < (b)))))

  #define gen_skip(orig, alt1, alt2, marker)	\
  	.skip -((alt_max_short(alt1, alt2) - (orig)) > 0) * \
  		(alt_max_short(alt1, alt2) - (orig)),marker

  	.pushsection .text, "ax"
  .globl main
  main:
  	gen_skip(1, 2, 4, 0x09)
  	gen_skip(4, 1, 2, 0x10)
  	...
  	.popsection

Thanks to Quentin for catching it and double-checking the fix!

Reported-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150404133443.GE21152@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-04 15:58:23 +02:00
Linus Torvalds
567cfea99a Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes: a SYSRET single-stepping fix, a dmi-scan robustization
  fix, a reboot quirk and a kgdb fixlet"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kgdb/x86: Fix reporting of 'si' in kgdb on x86_64
  x86/asm/entry/64: Disable opportunistic SYSRET if regs->flags has TF set
  x86/reboot: Add ASRock Q1900DC-ITX mainboard reboot quirk
  MAINTAINERS: Change the x86 microcode loader maintainer
  firmware: dmi_scan: Prevent dmi_num integer overflow
2015-04-03 10:42:32 -07:00
Borislav Petkov
6b51311c97 x86/asm/entry/64: Use a define for an invalid segment selector
... instead of a naked number, for better readability.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1428054130-25847-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-03 15:29:13 +02:00
Borislav Petkov
7c74d5b7b7 x86/asm/entry/64: Fix MSR_IA32_SYSENTER_CS MSR value
Commit:

  d56fe4bf5f ("x86/asm/entry/64: Always set up SYSENTER MSRs")

missed to add "ULL" to the 0 and wrmsrl_safe() complains:

  arch/x86/kernel/cpu/common.c: In function ‘syscall_init’:
  arch/x86/kernel/cpu/common.c:1226:2: warning: right shift count >= width of type wrmsrl_safe(MSR_IA32_SYSENTER_CS, 0);

Fix it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1428054130-25847-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-03 15:29:12 +02:00
Borislav Petkov
78cac48c04 x86/mm/KASLR: Propagate KASLR status to kernel proper
Commit:

  e2b32e6785 ("x86, kaslr: randomize module base load address")

made module base address randomization unconditional and didn't regard
disabled KKASLR due to CONFIG_HIBERNATION and command line option
"nokaslr". For more info see (now reverted) commit:

  f47233c2d3 ("x86/mm/ASLR: Propagate base load address calculation")

In order to propagate KASLR status to kernel proper, we need a single bit
in boot_params.hdr.loadflags and we've chosen bit 1 thus leaving the
top-down allocated bits for bits supposed to be used by the bootloader.

Originally-From: Jiri Kosina <jkosina@suse.cz>
Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-03 15:26:15 +02:00
Mark Einon
7f99f8b94c x86/earlyprintk: Put CONFIG_PCI-only functions under the #ifdef
Two static functions are only used if CONFIG_PCI is defined, so only
build them if this is the case. Fixes the build warnings:

  arch/x86/kernel/early_printk.c:98:13: warning: ‘mem32_serial_out’ defined but not used [-Wunused-function]
   static void mem32_serial_out(unsigned long addr, int offset, int value)
             ^
  arch/x86/kernel/early_printk.c:105:21: warning: ‘mem32_serial_in’ defined but not used [-Wunused-function]
   static unsigned int mem32_serial_in(unsigned long addr, int offset)
                     ^

Also convert a few related instances of uintXX_t to kernel specific uXX
defines.

Signed-off-by: Mark Einon <mark.einon@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stuart.r.anderson@intel.com
Link: http://lkml.kernel.org/r/1427923924-22653-1-git-send-email-mark.einon@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-03 15:21:56 +02:00
Aravind Gopalakrishnan
cee8f5a6c8 x86/mce/severity: Fix warning about indented braces
Dan reported compiler warnings about missing curly braces in
mce_severity_amd(). Reindent the catch-all "return MCE_AR_SEVERITY"
correctly to single tab.

While at it, chain ctx == IN_KERNEL check with mcgstatus check to make
it cleaner, as suggested by Boris.

No functional changes are introduced by this patch.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1427814281-18192-1-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-03 15:20:38 +02:00
Thomas Gleixner
435c350e81 x86/amd/idle, clockevents: Use explicit broadcast oneshot control functions
Replace the clockevents_notify() call with an explicit function call.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/8569669.lgxIty9PKW@vostro.rjw.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-03 08:44:34 +02:00
Thomas Gleixner
162a688e84 x86/amd/idle, clockevents: Use explicit broadcast control function
Replace the clockevents_notify() call with an explicit function call.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1528188.S1pjqkSL1P@vostro.rjw.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-03 08:44:31 +02:00
Andy Lutomirski
cf9328cc99 x86/asm/entry/32: Stop caching MSR_IA32_SYSENTER_ESP in tss.sp1
We write a stack pointer to MSR_IA32_SYSENTER_ESP exactly once,
and we unnecessarily cache the value in tss.sp1.  We never
read the cached value.

Remove all of the caching.  It serves no purpose.

Suggested-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/05a0163eb33ef5208363f0015496855da7cebadd.1428002830.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-03 08:30:44 +02:00
Andy Lutomirski
ff8287f363 x86/asm/entry/32: Improve a TOP_OF_KERNEL_STACK_PADDING comment
At Denys' request, clean up the comment describing stack padding
in the 32-bit sysenter path.

No code changes.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/41fee7bb8490ae840fe7ef2699f9c2feb932e729.1428002830.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-03 08:30:44 +02:00
Ingo Molnar
2e54a5bdba perf/x86/intel/pt: Fix the 32-bit build
On a 32-bit build I got:

  arch/x86/kernel/cpu/perf_event_intel_pt.c:413:5: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
  arch/x86/kernel/cpu/perf_event_intel_bts.c:162:24: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]

Fix it. The code should probably be (re-)tested on 32-bit systems to make
sure all is fine.

Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kaixu Xia <kaixu.xia@linaro.org>
Cc: linux-kernel@vger.kernel.org
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: kan.liang@intel.com
Cc: markus.t.metzger@intel.com
Cc: mathieu.poirier@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 17:58:45 +02:00
Andi Kleen
cd1f11de69 perf/x86/intel: Avoid rewriting DEBUGCTL with the same value for LBRs
perf with LBRs on has a tendency to rewrite the DEBUGCTL MSR with
the same value. Add a little optimization to skip the unnecessary
write.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1426871484-21285-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 17:33:20 +02:00
Andi Kleen
1a78d93750 perf/x86/intel: Streamline LBR MSR handling in PMI
The perf PMI currently does unnecessary MSR accesses when
LBRs are enabled. We use LBR freezing, or when in callstack
mode force the LBRs to only filter on ring 3.

So there is no need to disable the LBRs explicitely in the
PMI handler.

Also we always unnecessarily rewrite LBR_SELECT in the LBR
handler, even though it can never change.

 5)               |  /* write_msr: MSR_LBR_SELECT(1c8), value 0 */
 5)               |  /* read_msr: MSR_IA32_DEBUGCTLMSR(1d9), value 1801 */
 5)               |  /* write_msr: MSR_IA32_DEBUGCTLMSR(1d9), value 1801 */
 5)               |  /* write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 70000000f */
 5)               |  /* write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0 */
 5)               |  /* write_msr: MSR_LBR_SELECT(1c8), value 0 */
 5)               |  /* read_msr: MSR_IA32_DEBUGCTLMSR(1d9), value 1801 */
 5)               |  /* write_msr: MSR_IA32_DEBUGCTLMSR(1d9), value 1801 */

This patch:

  - Avoids disabling already frozen LBRs unnecessarily in the PMI
  - Avoids changing LBR_SELECT in the PMI

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1426871484-21285-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 17:33:19 +02:00
Andi Kleen
15fde1101a perf/x86: Only dump PEBS register when PEBS has been detected
Technically PEBS_ENABLED is only guaranteed to exist when we
detected PEBS. So add a check for this to the PMU dump function.
I don't think it can happen on a real CPU, but could in a VM.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1425059312-18217-4-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 17:33:17 +02:00
Andi Kleen
da3e606d88 perf/x86: Dump DEBUGCTL in PMU dump
LBRs and LBR freezing are controlled through the DEBUGCTL MSR. So
dump the state of DEBUGCTL too when dumping the PMU state.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1425059312-18217-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 17:33:17 +02:00
Andi Kleen
8882edf735 perf/x86/intel: Reset more state in PMU reset
The PMU reset code didn't quite keep up with newer PMU features.
Improve it a bit to really reset a modern PMU:

  - Clear all overflow status
  - Clear LBRs and freezing state
  - Disable fixed counters too

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1425059312-18217-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 17:33:16 +02:00
Stephane Eranian
b37609c30e perf/x86/intel: Make the HT bug workaround conditional on HT enabled
This patch disables the PMU HT bug when Hyperthreading (HT)
is disabled. We cannot do this test immediately when perf_events
is initialized. We need to wait until the topology information
is setup properly. As such, we register a later initcall, check
the topology and potentially disable the workaround. To do this,
we need to ensure there is no user of the PMU. At this point of
the boot, the only user is the NMI watchdog, thus we disable
it during the switch and re-enable it right after.

Having the workaround disabled when it is not needed provides
some benefits by limiting the overhead is time and space.
The workaround still ensures correct scheduling of the corrupting
memory events (0xd0, 0xd1, 0xd2) when HT is off. Those events
can only be measured on counters 0-3. Something else the current
kernel did not handle correctly.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: bp@alien8.de
Cc: jolsa@redhat.com
Cc: kan.liang@intel.com
Cc: maria.n.dimakopoulou@gmail.com
Link: http://lkml.kernel.org/r/1416251225-17721-13-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 17:33:15 +02:00
Stephane Eranian
c02cdbf60b perf/x86/intel: Limit to half counters when the HT workaround is enabled, to avoid exclusive mode starvation
This patch limits the number of counters available to each CPU when
the HT bug workaround is enabled.

This is necessary to avoid situation of counter starvation. Such can
arise from configuration where one HT thread, HT0, is using all 4 counters
with corrupting events which require exclusion the the sibling HT, HT1.

In such case, HT1 would not be able to schedule any event until HT0
is done. To mitigate this problem, this patch artificially limits
the number of counters to 2.

That way, we can gurantee that at least 2 counters are not in exclusive
mode and therefore allow the sibling thread to schedule events of the
same type (system vs. per-thread). The 2 counters are not determined
in advance. We simply set the limit to two events per HT.

This helps mitigate starvation in case of events with specific counter
constraints such a PREC_DIST.

Note that this does not elimintate the starvation is all cases. But
it is better than not having it.

(Solution suggested by Peter Zjilstra.)

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: bp@alien8.de
Cc: jolsa@redhat.com
Cc: kan.liang@intel.com
Cc: maria.n.dimakopoulou@gmail.com
Link: http://lkml.kernel.org/r/1416251225-17721-11-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 17:33:14 +02:00
Stephane Eranian
a90738c2cb perf/x86/intel: Fix intel_get_event_constraints() for dynamic constraints
With dynamic constraint, we need to restart from the static
constraints each time the intel_get_event_constraints() is called.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
Cc: bp@alien8.de
Cc: jolsa@redhat.com
Cc: kan.liang@intel.com
Link: http://lkml.kernel.org/r/1416251225-17721-10-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 17:33:14 +02:00
Maria Dimakopoulou
b63b4b459a perf/x86/intel: Enforce HT bug workaround with PEBS for SNB/IVB/HSW
This patch modifies the PEBS constraint tables for SNB/IVB/HSW
such that corrupting events supporting PEBS activate the HT
workaround.

Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Stephane Eranian <eranian@google.com>
Cc: bp@alien8.de
Cc: jolsa@redhat.com
Cc: kan.liang@intel.com
Link: http://lkml.kernel.org/r/1416251225-17721-9-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 17:33:13 +02:00
Maria Dimakopoulou
93fcf72cc0 perf/x86/intel: Enforce HT bug workaround for SNB/IVB/HSW
This patches activates the HT bug workaround for the
SNB/IVB/HSW processors. This covers non-PEBS mode.
Activation is done thru the constraint tables.

Both client and server processors needs this workaround.

Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Stephane Eranian <eranian@google.com>
Cc: bp@alien8.de
Cc: jolsa@redhat.com
Cc: kan.liang@intel.com
Link: http://lkml.kernel.org/r/1416251225-17721-8-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 17:33:12 +02:00
Maria Dimakopoulou
e979121b1b perf/x86/intel: Implement cross-HT corruption bug workaround
This patch implements a software workaround for a HW erratum
on Intel SandyBridge, IvyBridge and Haswell processors
with Hyperthreading enabled. The errata are documented for
each processor in their respective specification update
documents:

  - SandyBridge: BJ122
  - IvyBridge: BV98
  - Haswell: HSD29

The bug causes silent counter corruption across hyperthreads only
when measuring certain memory events (0xd0, 0xd1, 0xd2, 0xd3).
Counters measuring those events may leak counts to the sibling
counter. For instance, counter 0, thread 0 measuring event 0xd0,
may leak to counter 0, thread 1, regardless of the event measured
there. The size of the leak is not predictible. It all depends on
the workload and the state of each sibling hyper-thread. The
corrupting events do undercount as a consequence of the leak. The
leak is compensated automatically only when the sibling counter measures
the exact same corrupting event AND the workload is on the two threads
is the same. Given, there is no way to guarantee this, a work-around
is necessary. Furthermore, there is a serious problem if the leaked count
is added to a low-occurrence event. In that case the corruption on
the low occurrence event can be very large, e.g., orders of magnitude.

There is no HW or FW workaround for this problem.

The bug is very easy to reproduce on a loaded system.
Here is an example on a Haswell client, where CPU0, CPU4
are siblings. We load the CPUs with a simple triad app
streaming large floating-point vector. We use 0x81d0
corrupting event (MEM_UOPS_RETIRED:ALL_LOADS) and
0x20cc (ROB_MISC_EVENTS:LBR_INSERTS). Given we are not
using the LBR, the 0x20cc event should be zero.

  $ taskset -c 0 triad &
  $ taskset -c 4 triad &
  $ perf stat -a -C 0 -e r81d0 sleep 100 &
  $ perf stat -a -C 4 -r20cc sleep 10
  Performance counter stats for 'system wide':
        139 277 291      r20cc
       10,000969126 seconds time elapsed

In this example, 0x81d0 and r20cc ar eusing sinling counters
on CPU0 and CPU4. 0x81d0 leaks into 0x20cc and corrupts it
from 0 to 139 millions occurrences.

This patch provides a software workaround to this problem by modifying the
way events are scheduled onto counters by the kernel. The patch forces
cross-thread mutual exclusion between counters in case a corrupting event
is measured by one of the hyper-threads. If thread 0, counter 0 is measuring
event 0xd0, then nothing can be measured on counter 0, thread 1. If no corrupting
event is measured on any hyper-thread, event scheduling proceeds as before.

The same example run with the workaround enabled, yield the correct answer:

  $ taskset -c 0 triad &
  $ taskset -c 4 triad &
  $ perf stat -a -C 0 -e r81d0 sleep 100 &
  $ perf stat -a -C 4 -r20cc sleep 10
  Performance counter stats for 'system wide':
        0 r20cc
       10,000969126 seconds time elapsed

The patch does provide correctness for all non-corrupting events. It does not
"repatriate" the leaked counts back to the leaking counter. This is planned
for a second patch series. This patch series makes this repatriation more
easy by guaranteeing the sibling counter is not measuring any useful event.

The patch introduces dynamic constraints for events. That means that events which
did not have constraints, i.e., could be measured on any counters, may now be
constrained to a subset of the counters depending on what is going on the sibling
thread. The algorithm is similar to a cache coherency protocol. We call it XSU
in reference to Exclusive, Shared, Unused, the 3 possible states of a PMU
counter.

As a consequence of the workaround, users may see an increased amount of event
multiplexing, even in situtations where there are fewer events than counters
measured on a CPU.

Patch has been tested on all three impacted processors. Note that when
HT is off, there is no corruption. However, the workaround is still enabled,
yet not costing too much. Adding a dynamic detection of HT on turned out to
be complex are requiring too much to code to be justified.

This patch addresses the issue when PEBS is not used. A subsequent patch
fixes the problem when PEBS is used.

Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
[spinlock_t -> raw_spinlock_t]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Stephane Eranian <eranian@google.com>
Cc: bp@alien8.de
Cc: jolsa@redhat.com
Cc: kan.liang@intel.com
Link: http://lkml.kernel.org/r/1416251225-17721-7-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 17:33:12 +02:00
Maria Dimakopoulou
6f6539cad9 perf/x86/intel: Add cross-HT counter exclusion infrastructure
This patch adds a new shared_regs style structure to the
per-cpu x86 state (cpuc). It is used to coordinate access
between counters which must be used with exclusion across
HyperThreads on Intel processors. This new struct is not
needed on each PMU, thus is is allocated on demand.

Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
[peterz: spinlock_t -> raw_spinlock_t]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Stephane Eranian <eranian@google.com>
Cc: bp@alien8.de
Cc: jolsa@redhat.com
Cc: kan.liang@intel.com
Link: http://lkml.kernel.org/r/1416251225-17721-6-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 17:33:11 +02:00
Stephane Eranian
79cba82244 perf/x86: Add 'index' param to get_event_constraint() callback
This patch adds an index parameter to the get_event_constraint()
x86_pmu callback. It is expected to represent the index of the
event in the cpuc->event_list[] array. When the callback is used
for fake_cpuc (evnet validation), then the index must be -1. The
motivation for passing the index is to use it to index into another
cpuc array.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: bp@alien8.de
Cc: jolsa@redhat.com
Cc: kan.liang@intel.com
Cc: maria.n.dimakopoulou@gmail.com
Link: http://lkml.kernel.org/r/1416251225-17721-5-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 17:33:10 +02:00
Maria Dimakopoulou
c5362c0c37 perf/x86: Add 3 new scheduling callbacks
This patch adds 3 new PMU model specific callbacks
during the event scheduling done by x86_schedule_events().

  ->start_scheduling():  invoked when entering the schedule routine.
  ->stop_scheduling():   invoked at the end of the schedule routine
  ->commit_scheduling(): invoked for each committed event

To be used optionally by model-specific code.

Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Stephane Eranian <eranian@google.com>
Cc: bp@alien8.de
Cc: jolsa@redhat.com
Cc: kan.liang@intel.com
Link: http://lkml.kernel.org/r/1416251225-17721-4-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 17:33:09 +02:00
Stephane Eranian
9041346431 perf/x86: Vectorize cpuc->kfree_on_online
Make the cpuc->kfree_on_online a vector to accommodate
more than one entry and add the second entry to be
used by a later patch.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
Cc: bp@alien8.de
Cc: jolsa@redhat.com
Cc: kan.liang@intel.com
Link: http://lkml.kernel.org/r/1416251225-17721-3-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 17:33:08 +02:00
Stephane Eranian
9a5e3fb52a perf/x86: Rename x86_pmu::er_flags to 'flags'
Because it will be used for more than just tracking the
presence of extra registers.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: bp@alien8.de
Cc: jolsa@redhat.com
Cc: kan.liang@intel.com
Cc: maria.n.dimakopoulou@gmail.com
Link: http://lkml.kernel.org/r/1416251225-17721-2-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 17:33:08 +02:00
Ingo Molnar
c2b078e78a Merge branch 'perf/urgent' into perf/core, before applying dependent patches
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 17:17:46 +02:00
Alexander Shishkin
8062382c8d perf/x86/intel/bts: Add BTS PMU driver
Add support for Branch Trace Store (BTS) via kernel perf event infrastructure.
The difference with the existing implementation of BTS support is that this
one is a separate PMU that exports events' trace buffers to userspace by means
of AUX area of the perf buffer, which is zero-copy mapped into userspace.

The immediate benefit is that the buffer size can be much bigger, resulting in
fewer interrupts and no kernel side copying is involved and little to no trace
data loss. Also, kernel code can be traced with this driver.

The old way of collecting BTS traces still works.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kaixu Xia <kaixu.xia@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: kan.liang@intel.com
Cc: markus.t.metzger@intel.com
Cc: mathieu.poirier@linaro.org
Link: http://lkml.kernel.org/r/1422614435-114702-1-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 17:14:21 +02:00
Alexander Shishkin
52ca9ced3f perf/x86/intel/pt: Add Intel PT PMU driver
Add support for Intel Processor Trace (PT) to kernel's perf events.
PT is an extension of Intel Architecture that collects information about
software execuction such as control flow, execution modes and timings and
formats it into highly compressed binary packets. Even being compressed,
these packets are generated at hundreds of megabytes per second per core,
which makes it impractical to decode them on the fly in the kernel.

This driver exports trace data by through AUX space in the perf ring
buffer, which is zero-copy mapped into userspace for faster data retrieval.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kaixu Xia <kaixu.xia@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: kan.liang@intel.com
Cc: markus.t.metzger@intel.com
Cc: mathieu.poirier@linaro.org
Link: http://lkml.kernel.org/r/1422614392-114498-1-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 17:14:20 +02:00
Alexander Shishkin
4807034248 perf/x86: Mark Intel PT and LBR/BTS as mutually exclusive
Intel PT cannot be used at the same time as LBR or BTS and will cause a
general protection fault if they are used together. In order to avoid
fixing up GPs in the fast path, instead we disallow creating LBR/BTS
events when PT events are present and vice versa.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kaixu Xia <kaixu.xia@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: kan.liang@intel.com
Cc: markus.t.metzger@intel.com
Cc: mathieu.poirier@linaro.org
Link: http://lkml.kernel.org/r/1421237903-181015-12-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 17:14:19 +02:00
Alexander Shishkin
ed69628b3b x86: Add Intel Processor Trace (INTEL_PT) cpu feature detection
Intel Processor Trace is an architecture extension that allows for program
flow tracing.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kaixu Xia <kaixu.xia@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: kan.liang@intel.com
Cc: markus.t.metzger@intel.com
Cc: mathieu.poirier@linaro.org
Link: http://lkml.kernel.org/r/1421237903-181015-11-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 17:14:18 +02:00
Andi Kleen
c420f19b9c perf/x86/intel: Fix Haswell CYCLE_ACTIVITY.* counter constraints
Some of the CYCLE_ACTIVITY.* events can only be scheduled on
counter 2.  Due to a typo Haswell matched those with
INTEL_EVENT_CONSTRAINT, which lead to the events never
matching as the comparison does not expect anything
in the umask too. Fix the typo.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1425925222-32361-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 17:07:43 +02:00
Kan Liang
687805e4a6 perf/x86/intel: Filter branches for PEBS event
For supporting Intel LBR branches filtering, Intel LBR sharing logic
mechanism is introduced from commit b36817e886 ("perf/x86: Add Intel
LBR sharing logic"). It modifies __intel_shared_reg_get_constraints() to
config lbr_sel, which is finally used to set LBR_SELECT.

However, the intel_shared_regs_constraints() function is called after
intel_pebs_constraints(). The PEBS event will return immediately after
intel_pebs_constraints(). So it's impossible to filter branches for PEBS
events.

This patch moves intel_shared_regs_constraints() ahead of
intel_pebs_constraints().

We can safely do that because the intel_shared_regs_constraints() function
only returns empty constraint if its rejecting the event, otherwise it
returns NULL such that we continue calling intel_pebs_constraints() and
x86_get_event_constraint().

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1427467105-9260-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 17:07:42 +02:00
Boris Ostrovsky
3f85483bd8 x86/cpu: Factor out common CPU initialization code, fix 32-bit Xen PV guests
Some of x86 bare-metal and Xen CPU initialization code is common
between the two and therefore can be factored out to avoid code
duplication.

As a side effect, doing so will also extend the fix provided by
commit a7fcf28d43 ("x86/asm/entry: Replace this_cpu_sp0() with
current_top_of_stack() to x86_32") to 32-bit Xen PV guests.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: konrad.wilk@oracle.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1427897534-5086-1-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 12:06:41 +02:00
Denys Vlasenko
0784b36448 x86/asm/entry/64: Fold the 'test_in_nmi' macro into its only user
No code changes.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427899858-7165-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 12:00:10 +02:00
Steffen Liebergeld
f59df35fc2 kgdb/x86: Fix reporting of 'si' in kgdb on x86_64
This patch fixes an error in kgdb for x86_64 which would report
the value of dx when asked to give the value of si.

Signed-off-by: Steffen Liebergeld <steffen.liebergeld@kernkonzept.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 11:32:16 +02:00
Andy Lutomirski
7ea2416909 x86/asm/entry/64: Disable opportunistic SYSRET if regs->flags has TF set
When I wrote the opportunistic SYSRET code, I missed an important difference
between SYSRET and IRET.

Both instructions are capable of setting EFLAGS.TF, but they behave differently
when doing so:

 - IRET will not issue a #DB trap after execution when it sets TF.
   This is critical -- otherwise you'd never be able to make forward progress when
   returning to userspace.

 - SYSRET, on the other hand, will trap with #DB immediately after
   returning to CPL3, and the next instruction will never execute.

This breaks anything that opportunistically SYSRETs to a user
context with TF set.  For example, running this code with TF set
and a SIGTRAP handler loaded never gets past 'post_nop':

	extern unsigned char post_nop[];
	asm volatile ("pushfq\n\t"
		      "popq %%r11\n\t"
		      "nop\n\t"
		      "post_nop:"
		      : : "c" (post_nop) : "r11");

In my defense, I can't find this documented in the AMD or Intel manual.

Fix it by using IRET to restore TF.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 2a23c6b8a9 ("x86_64, entry: Use sysret to return to userspace when possible")
Link: http://lkml.kernel.org/r/9472f1ca4c19a38ecda45bba9c91b7168135fcfa.1427923514.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 11:09:54 +02:00
Christoph Hellwig
ec776ef6bb x86/mm: Add support for the non-standard protected e820 type
Various recent BIOSes support NVDIMMs or ADR using a
non-standard e820 memory type, and Intel supplied reference
Linux code using this type to various vendors.

Wire this e820 table type up to export platform devices for the
pmem driver so that we can use it in Linux.

Based on earlier work from:

   Dave Jiang <dave.jiang@intel.com>
   Dan Williams <dan.j.williams@intel.com>

Includes fixes for NUMA regions from Boaz Harrosh.

Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-nvdimm@ml01.01.org
Link: http://lkml.kernel.org/r/1427872339-6688-2-git-send-email-hch@lst.de
[ Minor cleanups. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-01 17:02:43 +02:00
Stefan Lippers-Hollmann
80313b3078 x86/reboot: Add ASRock Q1900DC-ITX mainboard reboot quirk
The ASRock Q1900DC-ITX mainboard (Baytrail-D) hangs randomly in
both BIOS and UEFI mode while rebooting unless reboot=pci is
used. Add a quirk to reboot via the pci method.

The problem is very intermittent and hard to debug, it might succeed
rebooting just fine 40 times in a row - but fails half a dozen times
the next day. It seems to be slightly less common in BIOS CSM mode
than native UEFI (with the CSM disabled), but it does happen in either
mode. Since I've started testing this patch in late january, rebooting
has been 100% reliable.

Most of the time it already hangs during POST, but occasionally it
might even make it through the bootloader and the kernel might even
start booting, but then hangs before the mode switch. The same symptoms
occur with grub-efi, gummiboot and grub-pc, just as well as (at least)
kernel 3.16-3.19 and 4.0-rc6 (I haven't tried older kernels than 3.16).
Upgrading to the most current mainboard firmware of the ASRock
Q1900DC-ITX, version 1.20, does not improve the situation.

( Searching the web seems to suggest that other Bay Trail-D mainboards
  might be affected as well. )
--
Signed-off-by: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Cc: <stable@vger.kernel.org>
Cc: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/20150330224427.0fb58e42@mir
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-01 14:08:09 +02:00
Denys Vlasenko
a6de5a21fb x86/asm/entry/64: Use local label to skip around sycall dispatch
Logically, we just want to jump around the following instruction
and its prologue/epilogue:

  call *sys_call_table(,%rax,8)

if the syscall number is too big - we do not specifically target
the "int_ret_from_sys_call" label.

Use a local, numerical label for this jump, for more clarity.

This also makes the code smaller:

 -ffffffff8187756b:      0f 87 0f 00 00 00       ja     ffffffff81877580 <int_ret_from_sys_call>
 +ffffffff8187756b:      77 0f                   ja     ffffffff8187757c <int_ret_from_sys_call>

because jumps to global labels are never translated to short jump
instructions by GAS.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427821211-25099-9-git-send-email-dvlasenk@redhat.com
[ Improved the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-01 13:17:39 +02:00
Denys Vlasenko
a734b4a23e x86/asm: Replace "MOVQ $imm, %reg" with MOVL
There is no reason to use MOVQ to load a non-negative immediate
constant value into a 64-bit register. MOVL does the same, since
the upper 32 bits are zero-extended by the CPU.

This makes the code a bit smaller, while leaving functionality
unchanged.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427821211-25099-8-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-01 13:17:39 +02:00
Denys Vlasenko
36acef2510 x86/asm/entry/64: Simplify looping around preempt_schedule_irq()
At the 'exit_intr' label we test whether interrupt/exception was in
kernel. If it did, we jump to the preemption check. If preemption
does happen (IOW if we call preempt_schedule_irq()), we go back to
'exit_intr'.

But it's pointless, we already know that the test succeeded last
time, preemption doesn't change the fact that interrupt/exception
was in the kernel.

We can go back directly to checking PER_CPU_VAR(__preempt_count) instead.

This makes the 'exit_intr' label unused, drop it.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427821211-25099-5-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-01 13:17:39 +02:00
Denys Vlasenko
32a04077fe x86/asm/entry/64: Remove redundant DISABLE_INTERRUPTS()
At this location, we already have interrupts off, always.
To be more specific, we already disabled them here:

    ret_from_intr:
	    DISABLE_INTERRUPTS(CLBR_NONE)

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427821211-25099-4-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-01 13:17:38 +02:00
Denys Vlasenko
6ba71b7617 x86/asm/entry/64: Simplify retint_kernel label usage, make retint_restore_args label local
Get rid of #define obfuscation of retint_kernel in
CONFIG_PREEMPT case by defining retint_kernel label always, not
only for CONFIG_PREEMPT.

Strip retint_kernel of .global-ness (ENTRY macro) - it has no
users outside of this file.

This looks like cosmetics, but it is not:
"je LABEL" can be optimized to short jump by assember
only if LABEL is not global, for global labels jump is always
a near one with relocation.

Convert retint_restore_args to a local numeric label, making it
clearer that it is not used elsewhere in the file.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427821211-25099-3-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-01 13:17:38 +02:00
Denys Vlasenko
4416c5a6da x86/asm/entry/64: Do not TRACE_IRQS fast SYSRET64 path
SYSRET code path has a small irq-off block.
On this code path, TRACE_IRQS_ON can't be called right before
interrupts are enabled for real, we can't clobber registers
there. So current code does it earlier, in a safe place.

But with this, TRACE_IRQS_OFF/ON frames just two fast
instructions, which is ridiculous: now most of irq-off block is
_outside_ of the framing.

Do the same thing that we do on SYSCALL entry: do not track this
irq-off block, it is very small to ever cause noticeable irq
latency.

Be careful: make sure that "jnz int_ret_from_sys_call_irqs_off"
now does invoke TRACE_IRQS_OFF - move
int_ret_from_sys_call_irqs_off label before TRACE_IRQS_OFF.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427821211-25099-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-01 13:17:38 +02:00
Bandan Das
4399c03c67 x86/apic: Remove verify_local_APIC()
__verify_local_APIC() is detritus from the early APIC days.
Its return value isn't used anywhere and the information it
prints when debug is enabled is already part of APIC
initialization messages printed to syslog. Off with it!

Signed-off-by: Bandan Das <bsd@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/jpgy4mcsxsq.fsf@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-01 10:47:57 +02:00
Ingo Molnar
55474c48b4 x86/asm/entry: Remove user_mode_ignore_vm86()
user_mode_ignore_vm86() can be used instead of user_mode(), in
places where we have already done a v8086_mode() security
check of ptregs.

But doing this check in the wrong place would be a bug that
could result in security problems, and also the naming still
isn't very clear.

Furthermore, it only affects 32-bit kernels, while most
development happens on 64-bit kernels.

If we replace them with user_mode() checks then the cost is only
a very minor increase in various slowpaths:

   text             data   bss     dec              hex    filename
   10573391         703562 1753042 13029995         c6d26b vmlinux.o.before
   10573423         703562 1753042 13030027         c6d28b vmlinux.o.after

So lets get rid of this distinction once and for all.

Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150329090233.GA1963@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-31 11:45:19 +02:00
Hector Marco-Gisbert
4e26d11f52 x86/mm: Improve AMD Bulldozer ASLR workaround
The ASLR implementation needs to special-case AMD F15h processors by
clearing out bits [14:12] of the virtual address in order to avoid I$
cross invalidations and thus performance penalty for certain workloads.
For details, see:

  dfb09f9b7a ("x86, amd: Avoid cache aliasing penalties on AMD family 15h")

This special case reduces the mmapped file's entropy by 3 bits.

The following output is the run on an AMD Opteron 62xx class CPU
processor under x86_64 Linux 4.0.0:

  $ for i in `seq 1 10`; do cat /proc/self/maps | grep "r-xp.*libc" ; done
  b7588000-b7736000 r-xp 00000000 00:01 4924       /lib/i386-linux-gnu/libc.so.6
  b7570000-b771e000 r-xp 00000000 00:01 4924       /lib/i386-linux-gnu/libc.so.6
  b75d0000-b777e000 r-xp 00000000 00:01 4924       /lib/i386-linux-gnu/libc.so.6
  b75b0000-b775e000 r-xp 00000000 00:01 4924       /lib/i386-linux-gnu/libc.so.6
  b7578000-b7726000 r-xp 00000000 00:01 4924       /lib/i386-linux-gnu/libc.so.6
  ...

Bits [12:14] are always 0, i.e. the address always ends in 0x8000 or
0x0000.

32-bit systems, as in the example above, are especially sensitive
to this issue because 32-bit randomness for VA space is 8 bits (see
mmap_rnd()). With the Bulldozer special case, this diminishes to only 32
different slots of mmap virtual addresses.

This patch randomizes per boot the three affected bits rather than
setting them to zero. Since all the shared pages have the same value
at bits [12..14], there is no cache aliasing problems. This value gets
generated during system boot and it is thus not known to a potential
remote attacker. Therefore, the impact from the Bulldozer workaround
gets diminished and ASLR randomness increased.

More details at:

  http://hmarco.org/bugs/AMD-Bulldozer-linux-ASLR-weakness-reducing-mmaped-files-by-eight.html

Original white paper by AMD dealing with the issue:

  http://developer.amd.com/wordpress/media/2012/10/SharedL1InstructionCacheonAMD15hCPU.pdf

Mentored-by: Ismael Ripoll <iripoll@disca.upv.es>
Signed-off-by: Hector Marco-Gisbert <hecmargi@upv.es>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan-Simon <dl9pf@gmx.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-fsdevel@vger.kernel.org
Link: http://lkml.kernel.org/r/1427456301-3764-1-git-send-email-hecmargi@upv.es
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-31 10:01:17 +02:00
Michael S. Tsirkin
46423ffaf4 x86/microcode/amd: Drop the pci_ids.h dependency
This file doesn't use any macros from pci_ids.h anymore, drop the include.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andreas Herrmann <herrmann.der.user@googlemail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1427635734-24786-80-git-send-email-mst@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-31 09:54:32 +02:00
Denys Vlasenko
a3675b32aa x86/asm/entry/64: Do not GET_THREAD_INFO() too early
At exit_intr, we GET_THREAD_INFO(%rcx) and then jump to
retint_kernel if saved CS was from kernel. But the code at
retint_kernel doesn't need %rcx.

Move GET_THREAD_INFO(%rcx) down, after CS check and branch.

While at it, remove "has a correct top of stack" comment.
After recent changes which eliminated FIXUP_TOP_OF_STACK,
we always have a correct pt_regs layout.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427738975-7391-5-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-31 09:31:11 +02:00
Denys Vlasenko
627276cb55 x86/asm/entry/64: Move retint_kernel code block closer to its user
The "retint_kernel" code block is misplaced. Since its logical
continuation is "retint_restore_args", it is more natural to
place it above that label. This also makes two jumps "short".

This change only moves code block around, without changing
logic.

This enables the next simplification: making
"retint_restore_args" label a local numeric one.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427738975-7391-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-31 09:31:11 +02:00
Ingo Molnar
c5e77f5216 Linux 4.0-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJVGHwjAAoJEHm+PkMAQRiG8rcIAJ6cEJ6mbqLpyz5XrGf4yNp0
 +wG/QlEpT8rgrxe9wSjB3lfW3kR2Pe69b9fVVCdiklygdkmva5vfmDrVGGzYfe3M
 QrFSSlMVBplvh6IiM/L1mVMtr3DSmCO23YZZ9R5b7FoEYatNHRpNWBCBpuXpd4aD
 sLuIvO3L/S7LqeOAFkkYWv6AuL9umicmjR8u+nsmCSRJom7At/aJ6R66WIp9vxho
 Rn7r6wcUk6B2Q/gYNjdSE8SIwdyKhuBGyvqQ9U9s6Btg9DQfM/b0vG5kw9hqeAq/
 9445jqVDP1whA2vz6GjnvltidxrqRvuDPBwzOnFmY5U+KZz4lS3x2mnWAAJ3xWs=
 =TqVJ
 -----END PGP SIGNATURE-----

Merge tag 'v4.0-rc6' into timers/core, before applying new patches

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-31 09:08:13 +02:00
Denys Vlasenko
27be87c5d5 x86/asm/entry/64: Add missing CFI annotation
This is a missing bit of the recent MOV-to-PUSH conversion.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427452582-21624-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27 12:27:57 +01:00
Denys Vlasenko
487d1edb9a x86/asm/entry/64: Fix comment about SYSENTER MSRs
The comment is ancient, it dates to the time when only AMD's
x86_64 implementation existed. AMD wasn't (and still isn't)
supporting SYSENTER, so these writes were "just in case" back
then.

This has changed: Intel's x86_64 appeared, and Intel does
support SYSENTER in long mode. "Some future 64-bit CPU" is here
already.

The code may appear "buggy" for AMD as it stands, since
MSR_IA32_SYSENTER_EIP is only 32-bit for AMD CPUs. Writing a
kernel function's address to it would drop high bits. Subsequent
use of this MSR for branch via SYSENTER seem to allow user to
transition to CPL0 while executing his code. Scary, eh?

Explain why that is not a bug: because SYSENTER insn would not
work on AMD CPU.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427453956-21931-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27 12:23:16 +01:00
Peter Zijlstra
34f439278c perf: Add per event clockid support
While thinking on the whole clock discussion it occurred to me we have
two distinct uses of time:

 1) the tracking of event/ctx/cgroup enabled/running/stopped times
    which includes the self-monitoring support in struct
    perf_event_mmap_page.

 2) the actual timestamps visible in the data records.

And we've been conflating them.

The first is all about tracking time deltas, nobody should really care
in what time base that happens, its all relative information, as long
as its internally consistent it works.

The second however is what people are worried about when having to
merge their data with external sources. And here we have the
discussion on MONOTONIC vs MONOTONIC_RAW etc..

Where MONOTONIC is good for correlating between machines (static
offset), MONOTNIC_RAW is required for correlating against a fixed rate
hardware clock.

This means configurability; now 1) makes that hard because it needs to
be internally consistent across groups of unrelated events; which is
why we had to have a global perf_clock().

However, for 2) it doesn't really matter, perf itself doesn't care
what it writes into the buffer.

The below patch makes the distinction between these two cases by
adding perf_event_clock() which is used for the second case. It
further makes this configurable on a per-event basis, but adds a few
sanity checks such that we cannot combine events with different clocks
in confusing ways.

And since we then have per-event configurability we might as well
retain the 'legacy' behaviour as a default.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27 10:13:22 +01:00
Ingo Molnar
b381e63b48 Merge branch 'perf/core' into perf/timer, before applying new changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27 10:10:47 +01:00
Ingo Molnar
4e6d7c2aa9 Merge branch 'timers/core' into perf/timer, to apply dependent patch
An upcoming patch will depend on tai_ns() and NMI-safe ktime_get_raw_fast(),
so merge timers/core here in a separate topic branch until it's all cooked
and timers/core is merged upstream.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27 10:09:21 +01:00
Ingo Molnar
4bfe186dbe Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney:

  - Documentation updates.

  - Changes permitting use of call_rcu() and friends very early in
    boot, for example, before rcu_init() is invoked.

  - Miscellaneous fixes.

  - Add in-kernel API to enable and disable expediting of normal RCU
    grace periods.

  - Improve RCU's handling of (hotplug-) outgoing CPUs.

    Note: ARM support is lagging a bit here, and these improved
    diagnostics might generate (harmless) splats.

  - NO_HZ_FULL_SYSIDLE fixes.

  - Tiny RCU updates to make it more tiny.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27 10:04:06 +01:00
Denys Vlasenko
47eb582e70 x86/asm/entry/64: Use smaller instructions
The $AUDIT_ARCH_X86_64 parameter to syscall_trace_enter_phase1/2
is a 32-bit constant, loading it with 32-bit MOV produces 5-byte
insn instead of 10-byte MOVABS one.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427303896-24023-3-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27 09:57:06 +01:00
Denys Vlasenko
146b2b097d x86/asm/entry/64: Use better label name, fix comments
A named label "ret_from_sys_call" implies that there are jumps
to this location from elsewhere, as happens with many other
labels in this file.

But this label is used only by the JMP a few insns above.
To make that obvious, use local numeric label instead.

Improve comments:

"and return regs->ax" isn't too informative. We always return
regs->ax.

The comment suggesting that it'd be cool to use rip relative
addressing for CALL is deleted. It's unclear why that would be
an improvement - we aren't striving to use position-independent
code here. PIC code here would require something like LEA
sys_call_table(%rip),reg + CALL *(reg,%rax*8)...

"iret frame is also incomplete" is no longer true, fix that too.

Also fix typo in comment.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427303896-24023-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27 09:57:05 +01:00
David Ahern
9332d250b4 perf/x86: Remove redundant calls to perf_pmu_{dis|en}able()
perf_pmu_disable() is called before pmu->add() and perf_pmu_enable() is called
afterwards. No need to call these inside of x86_pmu_add() as well.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1424281543-67335-1-git-send-email-dsahern@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27 09:49:44 +01:00
Ingo Molnar
936c663aed Merge branch 'perf/x86' into perf/core, because it's ready
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27 09:46:19 +01:00
Ingo Molnar
072e5a1cfa Merge branch 'perf/urgent' into perf/core, to pick up fixes and to refresh the tree
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27 09:46:03 +01:00
Peter Zijlstra
876e78818d time: Rename timekeeper::tkr to timekeeper::tkr_mono
In preparation of adding another tkr field, rename this one to
tkr_mono. Also rename tk_read_base::base_mono to tk_read_base::base,
since the structure is not specific to CLOCK_MONOTONIC and the mono
name got added to the tk_read_base instance.

Lots of trivial churn.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150319093400.344679419@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27 09:45:06 +01:00
Andi Kleen
294fe0f52a perf/x86/intel: Add INST_RETIRED.ALL workarounds
On Broadwell INST_RETIRED.ALL cannot be used with any period
that doesn't have the lowest 6 bits cleared. And the period
should not be smaller than 128.

This is erratum BDM11 and BDM55:

  http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/5th-gen-core-family-spec-update.pdf

BDM11: When using a period < 100; we may get incorrect PEBS/PMI
interrupts and/or an invalid counter state.
BDM55: When bit0-5 of the period are !0 we may get redundant PEBS
records on overflow.

Add a new callback to enforce this, and set it for Broadwell.

How does this handle the case when an app requests a specific
period with some of the bottom bits set?

Short answer:

Any useful instruction sampling period needs to be 4-6 orders
of magnitude larger than 128, as an PMI every 128 instructions
would instantly overwhelm the system and be throttled.
So the +-64 error from this is really small compared to the
period, much smaller than normal system jitter.

Long answer (by Peterz):

IFF we guarantee perf_event_attr::sample_period >= 128.

Suppose we start out with sample_period=192; then we'll set period_left
to 192, we'll end up with left = 128 (we truncate the lower bits). We
get an interrupt, find that period_left = 64 (>0 so we return 0 and
don't get an overflow handler), up that to 128. Then we trigger again,
at n=256. Then we find period_left = -64 (<=0 so we return 1 and do get
an overflow). We increment with sample_period so we get left = 128. We
fire again, at n=384, period_left = 0 (<=0 so we return 1 and get an
overflow). And on and on.

So while the individual interrupts are 'wrong' we get then with
interval=256,128 in exactly the right ratio to average out at 192. And
this works for everything >=128.

So the num_samples*fixed_period thing is still entirely correct +- 127,
which is good enough I'd say, as you already have that error anyhow.

So no need to 'fix' the tools, al we need to do is refuse to create
INST_RETIRED:ALL events with sample_period < 128.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
[ Updated comments and changelog a bit. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1424225886-18652-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27 09:14:03 +01:00
Andi Kleen
91f1b70582 perf/x86/intel: Add Broadwell core support
Add Broadwell support for Broadwell to perf.

The basic support is very similar to Haswell. We use the new cache
event list added for Haswell earlier. The only differences
are a few bits related to remote nodes. To avoid an extra,
mostly identical, table these are patched up in the initialization code.

The constraint list has one new event that needs to be handled over Haswell.

Includes code and testing from Kan Liang.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1424225886-18652-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27 09:14:02 +01:00
Andi Kleen
0f1b5ca240 perf/x86/intel: Add new cache events table for Haswell
Haswell offcore events are quite different from Sandy Bridge.
Add a new table to handle Haswell properly.

Note that the offcore bits listed in the SDM are not quite correct
(this is currently being fixed). An uptodate list of bits is
in the patch.

The basic setup is similar to Sandy Bridge. The prefetch columns
have been removed, as prefetch counting is not very reliable
on Haswell. One L1 event that is not in the event list anymore
has been also removed.

- data reads do not include code reads (comparable to earlier Sandy Bridge tables)
- data counts include speculative execution (except L1 write, dtlb, bpu)
- remote node access includes both remote memory, remote cache, remote mmio.
- prefetches are not included in the counts for consistency
  (different from Sandy Bridge, which includes prefetches in the remote node)

Signed-off-by: Andi Kleen <ak@linux.intel.com>
[ Removed the HSM30 comments; we don't have them for SNB/IVB either. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1424225886-18652-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27 09:14:01 +01:00
Catalin Marinas
828aef376d ACPI / processor: Introduce phys_cpuid_t for CPU hardware ID
CPU hardware ID (phys_id) is defined as u32 in structure acpi_processor,
but phys_id is used as int in acpi processor driver, so it will lead to
some inconsistence for the drivers.

Furthermore, to cater for ACPI arch ports that implement 64 bits CPU
ids a generic CPU physical id type is required.

So introduce typedef u32 phys_cpuid_t in a common file, and introduce
a macro PHYS_CPUID_INVALID as (phys_cpuid_t)(-1) if it's not defined
by other archs, this will solve the inconsistence in acpi processor driver,
and will prepare for the ACPI on ARM64 for the 64 bit CPU hardware ID
in the following patch.

CC: Rafael J Wysocki <rjw@rjwysocki.net>
Suggested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
[hj: reworked cpu physid map return codes]
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-03-26 15:12:51 +00:00
Ingo Molnar
06ab9c1ba6 Merge branch 'x86/urgent' into x86/asm, to resolve conflict
Conflicts:
	arch/x86/kernel/entry_64.S

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-25 13:19:43 +01:00
Andy Lutomirski
b3494a4ab2 x86/asm/entry: Check for syscall exit work with IRQs disabled
We currently have a race: if we're preempted during syscall
exit, we can fail to process syscall return work that is queued
up while we're preempted in ret_from_sys_call after checking
ti.flags.

Fix it by disabling interrupts before checking ti.flags.

Reported-by: Stefan Seyfried <stefan.seyfried@googlemail.com>
Reported-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Fixes: 96b6352c12 ("x86_64, entry: Remove the syscall exit audit")
Link: http://lkml.kernel.org/r/189320d42b4d671df78c10555976bb10af1ffc75.1427137498.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-24 21:08:28 +01:00
Ingo Molnar
dca5b52ad7 x86/asm/entry/64: Rename THREAD_INFO() to ASM_THREAD_INFO()
The THREAD_INFO() macro has a somewhat confusingly generic name,
defined in a generic .h C header file. It also does not make it
clear that it constructs a memory operand for use in assembly
code.

Rename it to ASM_THREAD_INFO() to make it all glaringly
obvious on first glance.

Acked-by: Borislav Petkov <bp@suse.de>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/20150324184442.GC14760@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-24 20:57:31 +01:00
Ingo Molnar
f9d71854b4 x86/asm/entry/64: Merge the field offset into the THREAD_INFO() macro
Before:

   TI_sysenter_return+THREAD_INFO(%rsp,3*8),%r10d

After:

   movl    THREAD_INFO(TI_sysenter_return, %rsp, 3*8), %r10d

to turn it into a clear thread_info accessor.

No code changed:

 md5:
   fb4cb2b3ce05d89940ca304efc8ff183  ia32entry.o.before.asm
   fb4cb2b3ce05d89940ca304efc8ff183  ia32entry.o.after.asm

   e39f2958a5d1300158e276e4f7663263  entry_64.o.before.asm
   e39f2958a5d1300158e276e4f7663263  entry_64.o.after.asm

Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/20150324184411.GB14760@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-24 20:57:31 +01:00
Ingo Molnar
d56fe4bf5f x86/asm/entry/64: Always set up SYSENTER MSRs
On CONFIG_IA32_EMULATION=y kernels we set up
MSR_IA32_SYSENTER_CS/ESP/EIP, but on !CONFIG_IA32_EMULATION
kernels we leave them unchanged.

Clear them to make sure the instruction is disabled properly.

SYSCALL is set up properly in both cases.

Acked-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-24 20:57:25 +01:00
Denys Vlasenko
65c2377486 x86/asm/entry/64: Get rid of int_ret_from_sys_call_fixup
With the FIXUP_TOP_OF_STACK macro removed, this intermediate jump
is unnecessary.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1426785469-15125-5-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-24 19:42:38 +01:00
Denys Vlasenko
a71ffdd780 x86/asm/entry/64: Get rid of the FIXUP_TOP_OF_STACK/RESTORE_TOP_OF_STACK macros
The FIXUP_TOP_OF_STACK macro is only necessary because we don't save %r11
to pt_regs->r11 on SYSCALL64 fast path, but we want ptrace to see it populated.

Bite the bullet, add a single additional PUSH instruction, and remove
the FIXUP_TOP_OF_STACK macro.

The RESTORE_TOP_OF_STACK macro is already a nop. Remove it too.

On SandyBridge CPU, it does not get slower:
measured 54.22 ns per getpid syscall before and after last two
changes on defconfig kernel.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1426785469-15125-4-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-24 19:42:38 +01:00
Denys Vlasenko
9ed8e7d860 x86/asm/entry/64: Use PUSH instructions to build pt_regs on stack
With this change, on SYSCALL64 code path we are now populating
pt_regs->cs, pt_regs->ss and pt_regs->rcx unconditionally and
therefore don't need to do that in FIXUP_TOP_OF_STACK.

We lose a number of large instructions there:

    text    data     bss     dec     hex filename
   13298       0       0   13298    33f2 entry_64_before.o
   12978       0       0   12978    32b2 entry_64.o

What's more important, we convert two "MOVQ $imm,off(%rsp)" to
"PUSH $imm" (the ones which fill pt_regs->cs,ss).

Before this patch, placing them on fast path was slowing it down
by two cycles: this form of MOV is very large, 12 bytes, and
this probably reduces decode bandwidth to one instruction per cycle
when CPU sees them.

Therefore they were living in FIXUP_TOP_OF_STACK instead (away
from fast path).

"PUSH $imm" is a small 2-byte instruction. Moving it to fast path does
not slow it down in my measurements.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1426785469-15125-3-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-24 19:42:38 +01:00
Denys Vlasenko
ef593260f0 x86/asm/entry: Get rid of KERNEL_STACK_OFFSET
PER_CPU_VAR(kernel_stack) was set up in a way where it points
five stack slots below the top of stack.

Presumably, it was done to avoid one "sub $5*8,%rsp"
in syscall/sysenter code paths, where iret frame needs to be
created by hand.

Ironically, none of them benefits from this optimization,
since all of them need to allocate additional data on stack
(struct pt_regs), so they still have to perform subtraction.

This patch eliminates KERNEL_STACK_OFFSET.

PER_CPU_VAR(kernel_stack) now points directly to top of stack.
pt_regs allocations are adjusted to allocate iret frame as well.
Hopefully we can merge it later with 32-bit specific
PER_CPU_VAR(cpu_current_top_of_stack) variable...

Net result in generated code is that constants in several insns
are changed.

This change is necessary for changing struct pt_regs creation
in SYSCALL64 code path from MOV to PUSH instructions.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1426785469-15125-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-24 19:42:38 +01:00
Denys Vlasenko
b3fe8ba320 x86/asm/entry/64: Change the THREAD_INFO() definition to not depend on KERNEL_STACK_OFFSET
This changes the THREAD_INFO() definition and all its callsites
so that they do not count stack position from
(top of stack - KERNEL_STACK_OFFSET), but from top of stack.

Semi-mysterious expressions THREAD_INFO(%rsp,RIP) - "why RIP??"
are now replaced by more logical THREAD_INFO(%rsp,SIZEOF_PTREGS)
- "calculate thread_info's address using information that
rsp is SIZEOF_PTREGS bytes below top of stack".

While at it, replace "(off)-THREAD_SIZE(reg)" with equivalent
"((off)-THREAD_SIZE)(reg)". The form without parentheses
falsely looks like we invoke THREAD_SIZE() macro.

Improve comment atop THREAD_INFO macro definition.

This patch does not change generated code (verified by objdump).

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1426785469-15125-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-24 19:42:37 +01:00
Aravind Gopalakrishnan
43eaa2a1ad x86/mce: Define mce_severity function pointer
Rename mce_severity() to mce_severity_intel() and assign the
mce_severity function pointer to mce_severity_amd() during init on AMD.
This way, we can avoid a test to call mce_severity_amd every time we get
into mce_severity(). And it's cleaner to do it this way.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Suggested-by: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1427125373-2918-3-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-24 12:14:15 +01:00
Aravind Gopalakrishnan
bf80bbd7dc x86/mce: Add an AMD severities-grading function
Add a severities function that caters to AMD processors. This allows us
to do some vendor-specific work within the function if necessary.

Also, introduce a vendor flag bitfield for vendor-specific settings. The
severities code uses this to define error scope based on the prescence
of the flags field.

This is based off of work by Boris Petkov.

Testing details:
Fam10h, Model 9h (Greyhound)
Fam15h: Models 0h-0fh (Orochi), 30h-3fh (Kaveri) and 60h-6fh (Carrizo),
Fam16h Model 00h-0fh (Kabini)

Boris:
Intel SNB
AMD K8 (JH-E0)

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: linux-edac@vger.kernel.org
Link: http://lkml.kernel.org/r/1427125373-2918-2-git-send-email-Aravind.Gopalakrishnan@amd.com
[ Fixup build, clean up comments. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-24 12:13:34 +01:00
Denys Vlasenko
a76c7f4604 x86/asm/entry/64: Fold syscall32_cpu_init() into its sole user
Having syscall32/sysenter32 initialization in a separate tiny
function, called from within a function that is already syscall
init specific, serves no real purpose.

Its existense also caused an unintended effect of having
wrmsrl(MSR_CSTAR) performed twice: once we set it to a dummy
function returning -ENOSYS, and immediately after
(if CONFIG_IA32_EMULATION), we set it to point to the proper
syscall32 entry point, ia32_cstar_target.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-24 08:20:51 +01:00
Marcelo Tosatti
0a4e6be9ca x86: kvm: Revert "remove sched notifier for cross-cpu migrations"
The following point:

    2. per-CPU pvclock time info is updated if the
       underlying CPU changes.

Is not true anymore since "KVM: x86: update pvclock area conditionally,
on cpu migration".

Add task migration notification back.

Problem noticed by Andy Lutomirski.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
CC: stable@kernel.org # 3.11+
2015-03-23 20:22:48 -03:00
Greg Kroah-Hartman
caa445d808 Merge 4.0-rc5 into tty-next
We want the tty/serial fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-23 21:45:24 +01:00
Denys Vlasenko
34061f134f x86/asm/entry/64: Fix incorrect comment
The recent old_rsp -> rsp_scratch rename also changed this
comment, but in this case "old_rsp" was not referring to
PER_CPU(old_rsp).

Fix this.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427115839-6397-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 14:28:54 +01:00
Andy Lutomirski
d74ef1118a x86/asm/entry: Replace some open-coded VM86 checks with v8086_mode() checks
This allows us to remove some unnecessary ifdefs.  There should
be no change to the generated code.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f7e00f0d668e253abf0bd8bf36491ac47bd761ff.1426728647.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 11:14:40 +01:00
Andy Lutomirski
f39b6f0ef8 x86/asm/entry: Change all 'user_mode_vm()' calls to 'user_mode()'
user_mode_vm() and user_mode() are now the same.  Change all callers
of user_mode_vm() to user_mode().

The next patch will remove the definition of user_mode_vm.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/43b1f57f3df70df5a08b0925897c660725015554.1426728647.git.luto@kernel.org
[ Merged to a more recent kernel. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 11:14:17 +01:00
Andy Lutomirski
ae60f0710a x86/asm/entry: Use user_mode_ignore_vm86() where appropriate
A few of the user_mode() checks in traps.c are immediately after
explicit checks for vm86 mode.  Change them to user_mode_ignore_vm86().

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/0b324d5b75c3402be07f8d3c6245ed7f4995029e.1426728647.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 11:13:46 +01:00
Andy Lutomirski
383f3af3f8 x86/asm/entry, perf: Explicitly optimize vm86 handling in code_segment_base()
There's no point in checking the VM bit on 64-bit, and, since
we're explicitly checking it, we can use user_mode_ignore_vm86()
after the check.

While we're at it, rearrange the #ifdef slightly to make the code
flow a bit clearer.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/dc1457a734feccd03a19bb3538a7648582f57cdd.1426728647.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 11:13:41 +01:00
Ingo Molnar
e4518ab90f Linux 4.0-rc5
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJVD1VGAAoJEHm+PkMAQRiG7yoH/juKOQ1zbxi5M+mleDEEJtA0
 RxQSojqEMWIKrWi8PNZxjENn1OZB6XOLIXOhlyAZBrmgsjO34p1DyXlZMznr/R8W
 kQ2Xxs061hRtB3OuruMIqOApUrjuqsaCwgbgUS1qWmqZcoyZN4oELyZMP6OOlqv5
 UUBZm8MfyXGyxrCcg39mjct3VEOhiuEcvL6SUxOC380CdSVAnyqHFPcz0JVqMUn9
 9RUBs0T9cMdhb0mZ2bfXzt6AKArj63G2nXOum+VzFcvspSm2U+MPIDCuoE+ZbTPS
 jqIAgG0rj1ezRyb5oeJrvlU0Yy3u/cXoMPs9+kORvpladooYNLti8ovh6qllm0I=
 =d/ye
 -----END PGP SIGNATURE-----

Merge tag 'v4.0-rc5' into x86/asm, to resolve conflicts

Conflicts:
	arch/x86/kernel/entry_64.S

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 11:13:15 +01:00
Peter Zijlstra
50f16a8bf9 perf: Remove type specific target pointers
The only reason CQM had to use a hard-coded pmu type was so it could use
cqm_target in hw_perf_event.

Do away with the {tp,bp,cqm}_target pointers and provide a non type
specific one.

This allows us to do away with that silly pmu type as well.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Vince Weaver <vince@deater.net>
Cc: acme@kernel.org
Cc: acme@redhat.com
Cc: hpa@zytor.com
Cc: jolsa@redhat.com
Cc: kanaka.d.juvva@intel.com
Cc: matt.fleming@intel.com
Cc: tglx@linutronix.de
Cc: torvalds@linux-foundation.org
Cc: vikas.shivappa@linux.intel.com
Link: http://lkml.kernel.org/r/20150305211019.GU21418@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 10:58:04 +01:00
Matt Fleming
4e16ed9941 perf/x86/intel: Fix Makefile to actually build the cqm driver
Someone fat fingered a merge conflict and lost the Makefile hunk.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <acme@redhat.com>
Cc: <hpa@zytor.com>
Cc: <jolsa@redhat.com>
Cc: <kanaka.d.juvva@intel.com>
Cc: <tglx@linutronix.de>
Cc: <torvalds@linux-foundation.org>
Cc: <vikas.shivappa@linux.intel.com>
Link: http://lkml.kernel.org/r/1424976420.15321.35.camel@mfleming-mobl1.ger.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 10:58:03 +01:00
Ingo Molnar
e1b63dec2d Merge branch 'sched/urgent' into sched/core, to pick up fixes before applying new patches
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 10:50:29 +01:00
Sudeep Holla
37dea8c52c x86/cpu/cacheinfo: Fix cache_get_priv_group() for Intel processors
The private pointer provided by the cacheinfo code is used to implement
the AMD L3 cache-specific attributes using a pointer to the northbridge
descriptor. It is needed for performing L3-specific operations and for
that we need a couple of PCI devices and other service information, all
contained in the northbridge descriptor.

This results in failure of cacheinfo setup as shown below as
cache_get_priv_group() returns the uninitialised private attributes which
are not valid for Intel processors.

  ------------[ cut here ]------------
  WARNING: CPU: 3 PID: 1 at fs/sysfs/group.c:102
  internal_create_group+0x151/0x280()
  sysfs: (bin_)attrs not set by subsystem for group: index3/
  Modules linked in:
  CPU: 3 PID: 1 Comm: swapper/0 Not tainted 4.0.0-rc3+ #1
  Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A13 05/11/2014
  ...
  Call Trace:
    dump_stack
    warn_slowpath_common
    warn_slowpath_fmt
    internal_create_group
    sysfs_create_groups
    device_add
    cpu_device_create
    ? __kmalloc
    cache_add_dev
    cacheinfo_sysfs_init
    ? container_dev_init
    do_one_initcall
    kernel_init_freeable
    ? rest_init
    kernel_init
    ret_from_fork
    ? rest_init

This patch fixes the issue by checking if the L3 cache indices are
populated correctly (AMD-specific) before initializing the private
attributes.

Reported-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 10:22:38 +01:00
Borislav Petkov
c9ce871283 x86/mce: Reindent __mcheck_cpu_apply_quirks() properly
Had some strange 3 tabs + 2 chars indentation, probably from me. Fix it.

No code changed:

  # arch/x86/kernel/cpu/mcheck/mce.o:

   text    data     bss     dec     hex filename
  21371    5923     264   27558    6ba6 mce.o.before
  21371    5923     264   27558    6ba6 mce.o.after

md5:
   eb3996c84d15e08ed836f043df2cbb01  mce.o.before.asm
   eb3996c84d15e08ed836f043df2cbb01  mce.o.after.asm

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 10:16:44 +01:00
Jesse Larrew
f77ac507f8 x86/mce: Use safe MSR accesses for AMD quirk
Certain MSRs are only relevant to a kernel in host mode, and kvm had
chosen not to implement these MSRs at all for guests. If a guest kernel
ever tried to access these MSRs, the result was a general protection
fault.

KVM will be separately patched to return 0 when these MSRs are read,
and this patch ensures that MSR accesses are tolerant of exceptions.

Signed-off-by: Jesse Larrew <jesse.larrew@amd.com>
[ Drop {} braces around loop ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Joel Schopp <joel.schopp@amd.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac@vger.kernel.org
Link: http://lkml.kernel.org/r/1426262619-5016-1-git-send-email-jesse.larrew@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 10:16:43 +01:00
Oleg Nesterov
7fc253e277 x86/fpu: Kill eager_fpu_init_bp()
Now that eager_fpu_init_bp() does setup_init_fpu_buf() only and
nothing else, we can remove it and move this code into its "caller",
eager_fpu_init().

This avoids the confusing games with "static __refdata void (*boot_func)":

init_xstate_buf can be NULL only during boot, so it is safe to call the
__init-annotated setup_init_fpu_buf() function in eager_fpu_init(), we
just need to mark it as __init_refok.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Riikonen <priikone@iki.fi>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Link: http://lkml.kernel.org/r/20150314151334.GC13029@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 10:14:00 +01:00
Oleg Nesterov
4bd5bf8c85 x86/fpu: Don't allocate fpu->state for swapper/0
Now that kthreads do not use FPU until they get executed, swapper/0
doesn't need to allocate fpu->state.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Riikonen <priikone@iki.fi>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Link: http://lkml.kernel.org/r/20150313182716.GB8249@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 10:13:59 +01:00
Borislav Petkov
b85e67d148 x86/fpu: Rename drop_init_fpu() to fpu_reset_state()
Call it what it does and in accordance with the context where it is
used: we reset the FPU state either because we were unable to restore it
from the one saved in the task or because we simply want to reset it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 10:13:59 +01:00
Oleg Nesterov
f893959b08 x86/fpu: Don't abuse drop_init_fpu() in flush_thread()
flush_thread() -> drop_init_fpu() is suboptimal and confusing. It does
drop_fpu() or restore_init_xstate() depending on !use_eager_fpu(). But
flush_thread() too checks eagerfpu right after that, and if it is true
then restore_init_xstate() just burns CPU for no reason. We are going to
load init_xstate_buf again after we set used_math()/user_has_fpu(), until
then the FPU state can't survive after switch_to().

Remove it, and change the "if (!use_eager_fpu())" to call drop_fpu().
While at it, clean up the tsk/current usage.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Riikonen <priikone@iki.fi>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Link: http://lkml.kernel.org/r/20150313173030.GA31217@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 10:13:58 +01:00
Oleg Nesterov
9cb6ce823b x86/fpu: Use restore_init_xstate() instead of math_state_restore() on kthread exec
Change flush_thread() to do user_fpu_begin() and restore_init_xstate()
instead of math_state_restore().

Note: "TODO: cleanup this horror" is still valid. We do not need
init_fpu() at all, we only need fpu_alloc() and memset(0). But this
needs other changes, in particular user_fpu_begin() should set
used_math().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Riikonen <priikone@iki.fi>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Link: http://lkml.kernel.org/r/20150311173449.GE5032@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 10:13:58 +01:00
Ingo Molnar
eda2360ad1 Linux 4.0-rc5
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJVD1VGAAoJEHm+PkMAQRiG7yoH/juKOQ1zbxi5M+mleDEEJtA0
 RxQSojqEMWIKrWi8PNZxjENn1OZB6XOLIXOhlyAZBrmgsjO34p1DyXlZMznr/R8W
 kQ2Xxs061hRtB3OuruMIqOApUrjuqsaCwgbgUS1qWmqZcoyZN4oELyZMP6OOlqv5
 UUBZm8MfyXGyxrCcg39mjct3VEOhiuEcvL6SUxOC380CdSVAnyqHFPcz0JVqMUn9
 9RUBs0T9cMdhb0mZ2bfXzt6AKArj63G2nXOum+VzFcvspSm2U+MPIDCuoE+ZbTPS
 jqIAgG0rj1ezRyb5oeJrvlU0Yy3u/cXoMPs9+kORvpladooYNLti8ovh6qllm0I=
 =d/ye
 -----END PGP SIGNATURE-----

Merge tag 'v4.0-rc5' into x86/fpu, to prevent conflicts

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 10:13:36 +01:00
Andy Lutomirski
c56716af8d x86/asm/entry, perf: Fix incorrect TIF_IA32 check in code_segment_base()
We want to check whether user code is in 32-bit mode, not
whether the task is nominally 32-bit.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/33e5107085ce347a8303560302b15c2cadd62c4c.1426728647.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 10:08:21 +01:00
Brian Gerst
1daeaa3151 x86/asm/entry: Fix execve() and sigreturn() syscalls to always return via IRET
Both the execve() and sigreturn() family of syscalls have the
ability to change registers in ways that may not be compatabile
with the syscall path they were called from.

In particular, SYSRET and SYSEXIT can't handle non-default %cs and %ss,
and some bits in eflags.

These syscalls have stubs that are hardcoded to jump to the IRET path,
and not return to the original syscall path.

The following commit:

   76f5df43ca ("Always allocate a complete "struct pt_regs" on the kernel stack")

recently changed this for some 32-bit compat syscalls, but introduced a bug where
execve from a 32-bit program to a 64-bit program would fail because it still returned
via SYSRETL. This caused Wine to fail when built for both 32-bit and 64-bit.

This patch sets TIF_NOTIFY_RESUME for execve() and sigreturn() so
that the IRET path is always taken on exit to userspace.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1426978461-32089-1-git-send-email-brgerst@gmail.com
[ Improved the changelog and comments. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 08:52:46 +01:00
Ingo Molnar
c38e503804 x86/asm/entry/64: Rename 'old_rsp' to 'rsp_scratch'
Make clear that the usage of PER_CPU(old_rsp) is purely temporary,
by renaming it to 'rsp_scratch'.

Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-17 16:01:42 +01:00
Ingo Molnar
7fcb3bc361 x86/asm/entry/64: Update comments about stack frames
Tweak a few outdated comments that were obsoleted by recent changes
to syscall entry code:

 - we no longer have a "partial stack frame" on
   entry, ever.

 - explain the syscall entry usage of old_rsp.

Partially based on a (split out of) patch from Denys Vlasenko.

Originally-from: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-17 16:01:41 +01:00
Ingo Molnar
ac9af4983e x86/asm/entry/64: Remove thread_struct::usersp
Nothing uses thread_struct::usersp anymore, so remove it.

Originally-from: Denys Vlasenko <dvlasenk@redhat.com>
Tested-by: Borislav Petkov <bp@alien8.de>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-17 16:01:41 +01:00
Ingo Molnar
9854dd74c3 x86/asm/entry/64: Simplify 'old_rsp' usage
Remove all manipulations of PER_CPU(old_rsp) in C code:

 - it is not used on SYSRET return anymore, and system entries
   are atomic, so updating it from the fork and context switch
   paths is pointless.

 - Tweak a few related comments as well: we no longer have a
   "partial stack frame" on entry, ever.

Based on (split out of) patch from Denys Vlasenko.

Originally-from: Denys Vlasenko <dvlasenk@redhat.com>
Tested-by: Borislav Petkov <bp@alien8.de>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1426599779-8010-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-17 16:01:41 +01:00
Denys Vlasenko
33db1fd48a x86/asm/entry/64: Enable interrupts *after* we fetch PER_CPU_VAR(old_rsp)
We want to use PER_CPU_VAR(old_rsp) as a simple temporary register,
to shuffle user-space RSP into (and from) when we set up the system
call stack frame. At that point we cannot shuffle values into general
purpose registers, because we have not saved them yet.

To be able to do this shuffling into a memory location, we must be
atomic and must not be preempted while we do the shuffling, otherwise
the 'temporary' register gets overwritten by some other task's
temporary register contents ...

Tested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1426600344-8254-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-17 16:01:40 +01:00
Alexander Kuleshov
91d8f0416f x86/boot/64: Remove pointless early_printk() message
earlyprintk is not initialised yet by the setup_early_printk() function
so we can remove it.

Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1426597205-5142-1-git-send-email-kuleshovmail@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-17 14:03:04 +01:00
Eugene Shatokhin
c80e5c0c23 kprobes/x86: Return correct length in __copy_instruction()
On x86-64, __copy_instruction() always returns 0 (error) if the
instruction uses %rip-relative addressing. This is because
kernel_insn_init() is called the second time for 'insn' instance
in such cases and sets all its fields to 0.

Because of this, trying to place a kprobe on such instruction
will fail, register_kprobe() will return -EINVAL.

This patch fixes the problem.

Signed-off-by: Eugene Shatokhin <eugene.shatokhin@rosalab.ru>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Link: http://lkml.kernel.org/r/20150317100918.28349.94654.stgit@localhost.localdomain
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-17 14:00:38 +01:00
Ingo Molnar
8b6c0ab1a1 x86/asm/entry: Document and clean up the enable_sep_cpu() and syscall32_cpu_init() functions
Clean up the flow and document the functions a bit better.

Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-17 09:25:29 +01:00
Denys Vlasenko
d828c71fba x86/asm/entry/32: Document the 32-bit SYSENTER "emergency stack" better
Before the patch, the 'tss_struct::stack' field was not referenced anywhere.

It was used only to set SYSENTER's stack to point after the last byte
of tss_struct, thus the trailing field, stack[64], was used.

But grep would not know it. You can comment it out, compile,
and kernel will even run until an unlucky NMI corrupts
io_bitmap[] (which is also not easily detectable).

This patch changes code so that the purpose and usage of this
field is not mysterious anymore, and can be easily grepped for.

This does change generated code, for a subtle reason:
since tss_struct is ____cacheline_aligned, there happens to be
5 longs of padding at the end. Old code was using the padding
too; new code will strictly use it only for SYSENTER_stack[].

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1425912738-559-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-17 09:25:29 +01:00
Andy Lutomirski
d9e05cc5a5 x86/asm/entry: Unify and fix initial thread_struct::sp0 values
x86_32 and x86_64 need slightly different thread_struct::sp0 values, and
x86_32's was incorrect for init.

This never mattered -- the init thread never runs user code, so we never
used thread_struct::sp0 for anything.

Fix it and mostly unify them.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1b810c1d2e797e27bb4a7708c426101161edd1f6.1426009661.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-17 09:25:27 +01:00
Andy Lutomirski
3ee4298f44 x86/asm/entry: Create and use a 'TOP_OF_KERNEL_STACK_PADDING' macro
x86_32, unlike x86_64, pads the top of the kernel stack, because the
hardware stack frame formats are variable in size.

Document this padding and give it a name.

This should make no change whatsoever to the compiled kernel
image. It also doesn't fix any of the current bugs in this area.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/02bf2f54b8dcb76a62a142b6dfe07d4ef7fc582e.1426009661.git.luto@amacapital.net
[ Fixed small details, such as a missed magic constant in entry_32.S pointed out by Denys Vlasenko. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-17 09:25:26 +01:00
Andy Lutomirski
9a036b93a3 x86/signal/64: Remove 'fs' and 'gs' from sigcontext
As far as I can tell, these fields have been set to zero on save
and ignored on restore since Linux was imported into git.
Rename them '__pad1' and '__pad2' to avoid confusion.  This may
also allow us to recycle them some day.

This also adds a comment clarifying the history of those fields.

I'm intentionally avoiding calling either of them '__pad0': the
field formerly known as '__pad0' is now 'ss'.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/844f8490e938780c03355be4c9b69eb4c494bf4e.1426193719.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-17 09:25:26 +01:00
Andy Lutomirski
c6f2062935 x86/signal/64: Fix SS handling for signals delivered to 64-bit programs
The comment in the signal code says that apps can save/restore
other segments on their own.  It's true that apps can *save* SS
on their own, but there's no way for apps to restore it: SYSCALL
effectively resets SS to __USER_DS, so any value that user code
tries to load into SS gets lost on entry to sigreturn.

This recycles two padding bytes in the segment selector area for SS.

While we're at it, we need a second change to make this useful.

If the signal we're delivering is caused by a bad SS value,
saving that value isn't enough.  We need to remove that bad
value from the regs before we try to deliver the signal.  Oddly,
the i386 code already got this right.

I suspect that 64-bit programs that try to run 16-bit code and
use signals will have a lot of trouble without this.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/405594361340a2ec32f8e2b115c142df0e180d8e.1426193719.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-17 09:25:25 +01:00
Ingo Molnar
1524b74540 Merge branch 'nohz/guest' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/nohz
Pull full dynticks support for virt guests from Frederic Weisbecker:

 "Some measurements showed that disabling the tick on the host while the
  guest is running can be interesting on some workloads. Indeed the
  host tick is irrelevant while a vcpu runs, it consumes CPU time and cache
  footprint for no good reasons.

  Full dynticks already works in every context, but RCU prevents it to
  be effective outside userspace, because the CPU needs to take part of
  RCU grace period completion as long as RCU may be used on it, which is
  the case in kernel context.

  However guest is similar to userspace and idle in that we know RCU is
  unused on such context. Therefore a CPU in guest/userspace/idle context
  can let other CPUs report its own RCU quiescent state on its behalf
  and shut down the tick safely, provided it isn't needed for other
  reasons than RCU. This is called RCU extended quiescent state.

  This was already implemented for idle and userspace. This patchset now
  brings it for guest contexts through the following steps:

  - Generalize the context tracking APIs to also track guest state
  - Rename/sanitize a few CPP symbols accordingly
  - Report guest entry/exit to RCU and define this context area as an RCU
    extended quiescent state."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-16 15:49:30 +01:00
Borislav Petkov
69797dafe3 Revert "x86/mm/ASLR: Propagate base load address calculation"
This reverts commit:

  f47233c2d3 ("x86/mm/ASLR: Propagate base load address calculation")

The main reason for the revert is that the new boot flag does not work
at all currently, and in order to make this work, we need non-trivial
changes to the x86 boot code which we didn't manage to get done in
time for merging.

And even if we did, they would've been too risky so instead of
rushing things and break booting 4.1 on boxes left and right, we
will be very strict and conservative and will take our time with
this to fix and test it properly.

Reported-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: H. Peter Anvin <hpa@linux.intel.com
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Junjie Mao <eternal.n08@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/20150316100628.GD22995@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-16 11:18:21 +01:00
Mike Galbraith
f8e617f458 sched/idle/x86: Optimize unnecessary mwait_idle() resched IPIs
To fully take advantage of MWAIT, apparently the CLFLUSH instruction needs
another quirk on certain CPUs: proper barriers around it on certain machines.

On a Q6600 SMP system, pipe-test scheduling performance, cross core,
improves significantly:

  3.8.13                   487.2 KHz    1.000
  3.13.0-master            415.5 KHz     .852
  3.13.0-master+           415.2 KHz     .852     + restore mwait_idle
  3.13.0-master++          488.5 KHz    1.002     + restore mwait_idle + IPI fix

Since X86_BUG_CLFLUSH_MONITOR is already a quirk, don't create a separate
quirk for the extra smp_mb()s.

Signed-off-by: Mike Galbraith <bitbucket@online.de>
Cc: <stable@vger.kernel.org> # 3.10+
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ian Malone <ibmalone@gmail.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1390061684.5566.4.camel@marge.simpson.net
[ Ported to recent kernel, added comments about the quirk. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-16 11:14:22 +01:00
Len Brown
b253149b84 sched/idle/x86: Restore mwait_idle() to fix boot hangs, to improve power savings and to improve performance
In Linux-3.9 we removed the mwait_idle() loop:

  69fb3676df ("x86 idle: remove mwait_idle() and "idle=mwait" cmdline param")

The reasoning was that modern machines should be sufficiently
happy during the boot process using the default_idle() HALT
loop, until cpuidle loads and either acpi_idle or intel_idle
invoke the newer MWAIT-with-hints idle loop.

But two machines reported problems:

 1. Certain Core2-era machines support MWAIT-C1 and HALT only.
    MWAIT-C1 is preferred for optimal power and performance.
    But if they support just C1, cpuidle never loads and
    so they use the boot-time default idle loop forever.

 2. Some laptops will boot-hang if HALT is used,
    but will boot successfully if MWAIT is used.
    This appears to be a hidden assumption in BIOS SMI,
    that is presumably valid on the proprietary OS
    where the BIOS was validated.

       https://bugzilla.kernel.org/show_bug.cgi?id=60770

So here we effectively revert the patch above, restoring
the mwait_idle() loop.  However, we don't bother restoring
the idle=mwait cmdline parameter, since it appears to add
no value.

Maintainer notes:

  For 3.9, simply revert 69fb3676df
  for 3.10, patch -F3 applies, fuzz needed due to __cpuinit use in
  context For 3.11, 3.12, 3.13, this patch applies cleanly

Tested-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Acked-by: Mike Galbraith <bitbucket@online.de>
Cc: <stable@vger.kernel.org> # 3.9+
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ian Malone <ibmalone@gmail.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/345254a551eb5a6a866e048d7ab570fd2193aca4.1389763084.git.len.brown@intel.com
[ Ported to recent kernels. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-16 11:14:21 +01:00
Ingo Molnar
56544d29c3 Linux 4.0-rc3
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJU/NacAAoJEHm+PkMAQRiGdUcIAJU5dHclwd9HRc7LX5iOwYN6
 mN0aCsYjMD8Pjx2VcPCgJvkIoESQO5pkwYpFFWCwILup1bVEidqXfr8EPOdThzdh
 kcaT0FwUvd19K+0jcKVNCX1RjKBtlUfUKONk6sS2x4RrYZpv0Ur8Gh+yXV8iMWtf
 fAusNEYlxQJvEz5+NSKw86EZTr4VVcykKLNvj+/t/JrXEuue7IG8EyoAO/nLmNd2
 V/TUKKttqpE6aUVBiBDmcMQl2SUVAfp5e+KJAHmizdDpSE80nU59UC1uyV8VCYdM
 qwHXgttLhhKr8jBPOkvUxl4aSXW7S0QWO8TrMpNdEOeB3ZB8AKsiIuhe1JrK0ro=
 =Xkue
 -----END PGP SIGNATURE-----

Merge tag 'v4.0-rc3' into x86/build, to refresh an older tree before applying new changes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-13 14:21:04 +01:00
Oleg Nesterov
a7c80ebcac x86/fpu: Avoid math_state_restore() without used_math() in __restore_xstate_sig()
math_state_restore() assumes it is called with irqs disabled,
but this is not true if the caller is __restore_xstate_sig().

This means that if ia32_fxstate == T and __copy_from_user()
fails, __restore_xstate_sig() returns with irqs disabled too.

This triggers:

  BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:41
   dump_stack
   ___might_sleep
   ? _raw_spin_unlock_irqrestore
   __might_sleep
   down_read
   ? _raw_spin_unlock_irqrestore
   print_vma_addr
   signal_fault
   sys32_rt_sigreturn

Change __restore_xstate_sig() to call set_used_math()
unconditionally. This avoids enabling and disabling interrupts
in math_state_restore(). If copy_from_user() fails, we can
simply do fpu_finit() by hand.

[ Note: this is only the first step. math_state_restore() should
        not check used_math(), it should set this flag. While
	init_fpu() should simply die. ]

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Riikonen <priikone@iki.fi>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150307153844.GB25954@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-13 12:44:28 +01:00
Daniel J Blueman
c8a470cab0 x86/apic/numachip: Fix sibling map with NumaChip
On NumaChip systems, the physical processor ID assignment wasn't
accounting for the number of nodes in AMD multi-module
processors, giving an incorrect sibling map:

  $ cd /sys/devices/system/cpu/cpu29/topology
  $ grep . *
  core_id:5
  core_siblings:00000000,ff000000
  core_siblings_list:24-31
  physical_package_id:3
  thread_siblings:00000000,30000000
  thread_siblings_list:28-29

This fixes it:

  $ cd /sys/devices/system/cpu/cpu29/topology
  $ grep . *
  core_id:5
  core_siblings:00000000,ffff0000
  core_siblings_list:16-31
  physical_package_id:1
  thread_siblings:00000000,30000000
  thread_siblings_list:28-29

Signed-off-by: Daniel J Blueman <daniel@numascale.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Steffen Persvold <sp@numascale.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1426135950-10110-1-git-send-email-daniel@numascale.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-12 16:58:59 +01:00
Li, Aubrey
7486341a98 x86/platform, acpi: Bypass legacy PIC and PIT in ACPI hardware reduced mode
On a platform in ACPI Hardware-reduced mode, the legacy PIC and
PIT may not be initialized even though they may be present in
silicon. Touching these legacy components causes unexpected
results on the system.

On the Bay Trail-T(ASUS-T100) platform, touching these legacy
components blocks platform hardware low idle power state(S0ix)
during system suspend. So we should bypass them in ACPI hardware
reduced mode.

Suggested-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Li Aubrey <aubrey.li@linux.intel.com>
Cc: <alan@linux.intel.com>
Cc: Alan Cox <alan@linux.intel.com>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Link: http://lkml.kernel.org/r/54FFF81C.20703@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-12 12:07:13 +01:00
Paul E. McKenney
2a442c9c64 x86: Use common outgoing-CPU-notification code
This commit removes the open-coded CPU-offline notification with new
common code.  Among other things, this change avoids calling scheduler
code using RCU from an offline CPU that RCU is ignoring.  It also allows
Xen to notice at online time that the CPU did not go offline correctly.
Note that Xen has the surviving CPU carry out some cleanup operations,
so if the surviving CPU times out, these cleanup operations might have
been carried out while the outgoing CPU was still running.  It might
therefore be unwise to bring this CPU back online, and this commit
avoids doing so.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: <x86@kernel.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: <xen-devel@lists.xenproject.org>
2015-03-11 13:22:35 -07:00
Denys Vlasenko
263042e463 x86/asm/entry/64: Save user RSP in pt_regs->sp on SYSCALL64 fastpath
Prepare for the removal of 'usersp', by simplifying PER_CPU(old_rsp) usage:

  - use it only as temp storage

  - store the userspace stack pointer immediately in pt_regs->sp
    on syscall entry, instead of using it later, on syscall exit.

  - change C code to use pt_regs->sp only, instead of PER_CPU(old_rsp)
    and task->thread.usersp.

FIXUP/RESTORE_TOP_OF_STACK are simplified as well.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1425926364-9526-4-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-10 13:56:10 +01:00
Denys Vlasenko
616ab249f1 x86/asm/entry/64: Remove stub_iopl
stub_iopl is no longer needed: pt_regs->flags needs no fixing up
after previous change. Remove it.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1425984307-2143-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-10 13:56:10 +01:00
Denys Vlasenko
29722cd4ef x86/asm/entry/64: Save R11 into pt_regs->flags on SYSCALL64 fastpath
Before this patch, R11 was saved in pt_regs->r11.

Which looks natural, but requires messy shuffling to/from iret
frame whenever ptrace or e.g. sys_iopl() wants to modify flags -
because that's how this register is used by SYSCALL/SYSRET.

This patch saves R11 in pt_regs->flags, and uses that value for
the SYSRET64 instruction. Shuffling is eliminated.

FIXUP/RESTORE_TOP_OF_STACK are simplified.

stub_iopl is no longer needed: pt_regs->flags needs no fixing up.

Testing shows that syscall fast path is ~54.3 ns before
and after the patch (on 2.7 GHz Sandy Bridge CPU).

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1425926364-9526-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-10 13:56:10 +01:00
Oleg Nesterov
1d23c4518b x86/fpu: Factor out memset(xstate, 0) in fpu_finit() paths
fx_finit() has two users but only fpu_finit() needs to clear
xstate, alloc_bootmem_align() in setup_init_fpu_buf() returns
zero-filled memory.

And note that both memset()'s look confusing. Yes, offsetof() is
0 for ->fxsave or ->fsave, but it would be cleaner to turn
them into a single memset() which zeroes fpu->state.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Tavis Ormandy <taviso@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1425967585-4725-2-git-send-email-bp@alien8.de
Link: http://lkml.kernel.org/r/20150302183257.GC23085@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-10 07:14:31 +01:00
Oleg Nesterov
e7f180dcd8 x86/fpu: Change xstateregs_get()/set() to use ->xsave.i387 rather than ->fxsave
This is a cosmetic change: xstateregs_get() and xstateregs_set()
abuse ->fxsave to access xsave->i387.sw_reserved.

This practice is correct, ->fxsave and xsave->i387 share the same memory,
but IMHO this looks confusing.

And we can make this code more readable if we add a
"struct xsave_struct *" local variable as well.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Tavis Ormandy <taviso@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1425967585-4725-1-git-send-email-bp@alien8.de
Link: http://lkml.kernel.org/r/20150302183237.GB23085@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-10 07:14:31 +01:00