Commit Graph

12012 Commits

Author SHA1 Message Date
Thomas Gleixner
bd0b9ac405 genirq: Remove irq argument from irq flow handlers
Most interrupt flow handlers do not use the irq argument. Those few
which use it can retrieve the irq number from the irq descriptor.

Remove the argument.

Search and replace was done with coccinelle and some extra helper
scripts around it. Thanks to Julia for her help!

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
2015-09-16 15:47:51 +02:00
Jiang Liu
9df872faa7 genirq: Move field 'affinity' from irq_data into irq_common_data
Irq affinity mask is per-irq instead of per irqchip, so move it into
struct irq_common_data.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Kevin Cernekee <cernekee@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: http://lkml.kernel.org/r/1433303281-27688-1-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-09-16 15:46:49 +02:00
Shaohua Li
5d7c631d92 x86/apic: Serialize LVTT and TSC_DEADLINE writes
The APIC LVTT register is MMIO mapped but the TSC_DEADLINE register is an
MSR. The write to the TSC_DEADLINE MSR is not serializing, so it's not
guaranteed that the write to LVTT has reached the APIC before the
TSC_DEADLINE MSR is written. In such a case the write to the MSR is
ignored and as a consequence the local timer interrupt never fires.

The SDM decribes this issue for xAPIC and x2APIC modes. The
serialization methods recommended by the SDM differ.

xAPIC:
 "1. Memory-mapped write to LVT Timer Register, setting bits 18:17 to 10b.
  2. WRMSR to the IA32_TSC_DEADLINE MSR a value much larger than current time-stamp counter.
  3. If RDMSR of the IA32_TSC_DEADLINE MSR returns zero, go to step 2.
  4. WRMSR to the IA32_TSC_DEADLINE MSR the desired deadline."

x2APIC:
 "To allow for efficient access to the APIC registers in x2APIC mode,
  the serializing semantics of WRMSR are relaxed when writing to the
  APIC registers. Thus, system software should not use 'WRMSR to APIC
  registers in x2APIC mode' as a serializing instruction. Read and write
  accesses to the APIC registers will occur in program order. A WRMSR to
  an APIC register may complete before all preceding stores are globally
  visible; software can prevent this by inserting a serializing
  instruction, an SFENCE, or an MFENCE before the WRMSR."

The xAPIC method is to just wait for the memory mapped write to hit
the LVTT by checking whether the MSR write has reached the hardware.
There is no reason why a proper MFENCE after the memory mapped write would
not do the same. Andi Kleen confirmed that MFENCE is sufficient for the
xAPIC case as well.

Issue MFENCE before writing to the TSC_DEADLINE MSR. This can be done
unconditionally as all CPUs which have TSC_DEADLINE also have MFENCE
support.

[ tglx: Massaged the changelog ]

Signed-off-by: Shaohua Li <shli@fb.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: <Kernel-team@fb.com>
Cc: <lenb@kernel.org>
Cc: <fenghua.yu@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: stable@vger.kernel.org #v3.7+
Link: http://lkml.kernel.org/r/20150909041352.GA2059853@devbig257.prn2.facebook.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-09-14 18:29:59 +02:00
Thomas Gleixner
4857c91f0d x86/ioapic: Force affinity setting in setup_ioapic_dest()
The recent ioapic cleanups changed the affinity setting in
setup_ioapic_dest() from a direct write to the hardware to the delayed
affinity setup via irq_set_affinity().

That results in a warning from chained_irq_exit():
WARNING: CPU: 0 PID: 5 at kernel/irq/migration.c:32 irq_move_masked_irq
[<ffffffff810a0a88>] irq_move_masked_irq+0xb8/0xc0
[<ffffffff8103c161>] ioapic_ack_level+0x111/0x130
[<ffffffff812bbfe8>] intel_gpio_irq_handler+0x148/0x1c0

The reason is that irq_set_affinity() does not write directly to the
hardware. It marks the affinity setting as pending and executes it
from the next interrupt. The chained handler infrastructure does not
take the irq descriptor lock for performance reasons because such a
chained interrupt is not visible to any interfaces. So the delayed
affinity setting triggers the warning in irq_move_masked_irq().

Restore the old behaviour by calling the set_affinity function of the
ioapic chip in setup_ioapic_dest(). This is safe as none of the
interrupts can be on the fly at this point.

Fixes: aa5cb97f14 'x86/irq: Remove x86_io_apic_ops.set_affinity and related interfaces'
Reported-and-tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: jarkko.nikula@linux.intel.com
2015-09-14 18:28:15 +02:00
Dave Hansen
ef78f2a4bf x86/fpu: Check CPU-provided sizes against struct declarations
We now have C structures defined for each of the XSAVE state
components that we support.  This patch adds checks during our
verification pass to ensure that the CPU-provided data
enumerated in CPUID leaves matches our C structures.

If not, we warn and dump all the XSAVE CPUID leaves.

Note: this *actually* found an inconsistency with the MPX
'bndcsr' state.  The hardware pads it out differently from
our C structures.  This patch caught it and warned.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: dave@sr71.net
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150902233131.A8DB36DA@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14 12:22:02 +02:00
Dave Hansen
e6e888f96b x86/fpu: Check to ensure increasing-offset xstate offsets
The xstate CPUID leaves enumerate where each state component is
inside the XSAVE buffer, along with the size of the entire
buffer.  Our new XSAVE sanity-checking code extrapolates an
expected _total_ buffer size by looking at the last component
that it encounters.

That method requires that the highest-numbered component also
be the one with the highest offset.  This is a pretty safe
assumption, but let's add some code to ensure it stays true.

To make this check work correctly, we also need to ensure we
only consider the offsets from enabled features because the
offset register (ebx) will return 0 on unsupported features.

This also means that we will preserve the -1's that we
initialized xstate_offsets/sizes[] with.  That will help
find bugs.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: dave@sr71.net
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150902233130.0843AB15@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14 12:22:02 +02:00
Dave Hansen
65ac2e9baa x86/fpu: Correct and check XSAVE xstate size calculations
Note: our xsaves support is currently broken and disabled.  This
patch does not fix it, but it is an incremental improvement.

This might be useful to someone backporting the entire set of
XSAVES patches at some point, but it should not be backported
alone.

Ingo said he wanted something like this (bullets 2 and 3):

  http://lkml.kernel.org/r/20150808091508.GB32641@gmail.com

There are currently two xsave buffer formats: standard and
compacted.  The standard format is waht 'XSAVE' and 'XSAVEOPT'
produce while 'XSAVES' and 'XSAVEC' produce a compacted-formet
buffer.  (The kernel never uses XSAVEC)

But, the XSAVES buffer *ALSO* contains "system state components"
which are never saved by a plain XSAVE.  So, XSAVES has two
things that might make its buffer differently-sized from an
XSAVE-produced one.

The current code assumes that an XSAVES buffer's size is simply
the sum of the sizes of the (user) states which are supported.
This seems to work in most cases, but it is not consistent with
what the SDM says, and it breaks if we 'align' a component in
the buffer.  The calculation is also unnecessary work since the
CPU *tells* us the size of the buffer directly.

This patch just reads the size of the buffer right out of the
CPUID leaf instead of trying to derive it.

But, blindly trusting the CPU like this is dangerous.  We add
a verification pass in do_extra_xstate_size_checks() to ensure
that the size we calculate matches with what we see from the
hardware.  When it comes down to it, we trust but verify the
CPU.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: dave@sr71.net
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150902233130.234FE1EC@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14 12:22:01 +02:00
Dave Hansen
1126cb4535 x86/fpu/mpx: Rework MPX 'xstate' types
MPX includes two separate "extended state components".  There is
no real need to have an 'mpx_struct' because we never really
manage the states together.

We also separate out the actual data in 'mpx_bndcsr_state' from
the padding.  We will shortly be checking the state sizes
against our structures and need them to match.  For consistency,
we also ensure to prefix these types with 'mpx_'.

Lastly, we add some comments to mirror some of the descriptions
in the Intel documents (SDM) of the various state components.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: dave@sr71.net
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150902233129.384B73EB@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14 12:22:00 +02:00
Dave Hansen
633d54c47a x86/fpu: Add xfeature_enabled() helper instead of test_bit()
We currently use test_bit() in a few places to see if an
xfeature is enabled.  It ends up being a bit ugly because
'xfeatures_mask' is a u64 and test_bit wants an 'unsigned long'
so it requires a cast.  The *_bit() functions are also
techincally atomic, which we have no need for here.

So, remove the test_bit()s and replace with the new
xfeature_enabled() helper.

This also provides a central place to add a comment about the
future need to support 'system xstates'.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: dave@sr71.net
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150902233129.B1534F86@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14 12:21:59 +02:00
Dave Hansen
ee9ae257eb x86/fpu: Remove 'xfeature_nr'
xfeature_nr ended up being initialized too late for me to
use it in the "xsave size sanity check" patch which is
later in the series.  I tried to move around its initialization
but realized that it was just as easy to get rid of it.

We only have 9 XFEATURES.  Instead of dynamically calculating
and storing the last feature, just use the compile-time max:
XFEATURES_NR_MAX.  Note that even with 'xfeatures_nr' we can
had "holes" in the xfeatures_mask that we had to deal with.

We also change a 'leaf' variable to be a plain 'i'.  Although
it is used to grab a cpuid leaf in this one loop, all of the
other loops just use an 'i' and I find it much more obvious
to keep the naming consistent across all the similar loops.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: dave@sr71.net
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150902233128.3F30DF5A@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14 12:21:59 +02:00
Dave Hansen
8a93c9e0dc x86/fpu: Rework XSTATE_* macros to remove magic '2'
The 'xstate.c' code has a bunch of references to '2'.  This
is because we have a lot more work to do for the "extended"
xstates than the "legacy" ones and state component 2 is the
first "extended" state.

This patch replaces all of the instances of '2' with
FIRST_EXTENDED_XFEATURE, which clearly explains what is
going on.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: dave@sr71.net
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150902233128.A8C0BF51@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14 12:21:58 +02:00
Dave Hansen
dad8c4fe85 x86/fpu: Rename XFEATURES_NR_MAX
This is a logcal followon to the last patch.  It makes the
XFEATURE_MAX naming consistent with the other enum values.
This is what Ingo suggested.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: dave@sr71.net
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150902233127.A541448F@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14 12:21:57 +02:00
Dave Hansen
d91cab7813 x86/fpu: Rename XSAVE macros
There are two concepts that have some confusing naming:
 1. Extended State Component numbers (currently called
    XFEATURE_BIT_*)
 2. Extended State Component masks (currently called XSTATE_*)

The numbers are (currently) from 0-9.  State component 3 is the
bounds registers for MPX, for instance.

But when we want to enable "state component 3", we go set a bit
in XCR0.  The bit we set is 1<<3.  We can check to see if a
state component feature is enabled by looking at its bit.

The current 'xfeature_bit's are at best xfeature bit _numbers_.
Calling them bits is at best inconsistent with ending the enum
list with 'XFEATURES_NR_MAX'.

This patch renames the enum to be 'xfeature'.  These also
happen to be what the Intel documentation calls a "state
component".

We also want to differentiate these from the "XSTATE_*" macros.
The "XSTATE_*" macros are a mask, and we rename them to match.

These macros are reasonably widely used so this patch is a
wee bit big, but this really is just a rename.

The only non-mechanical part of this is the

	s/XSTATE_EXTEND_MASK/XFEATURE_MASK_EXTEND/

We need a better name for it, but that's another patch.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: dave@sr71.net
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150902233126.38653250@viggo.jf.intel.com
[ Ported to v4.3-rc1. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14 12:21:46 +02:00
Jan Beulich
f454b47886 x86/ldt: Fix small LDT allocation for Xen
While the following commit:

  37868fe113 ("x86/ldt: Make modify_ldt synchronous")

added a nice comment explaining that Xen needs page-aligned
whole page chunks for guest descriptor tables, it then
nevertheless used kzalloc() on the small size path.

As I'm unaware of guarantees for kmalloc(PAGE_SIZE, ) to return
page-aligned memory blocks, I believe this needs to be switched
back to __get_free_page() (or better get_zeroed_page()).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/55E735D6020000780009F1E6@prv-mh.provo.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14 12:10:50 +02:00
Dave Hansen
4109ca066b x86/fpu: Remove XSTATE_RESERVE
The original purpose of XSTATE_RESERVE was to carve out space
to store all of the possible extended state components that
get saved with the XSAVE instruction(s).

However, we are now almost entirely dynamically allocating
the buffers we use for XSAVE by placing them at the end of
the task_struct and them sizing them at boot.  The one
exception for that is the init_task.

The maximum extended state component size that we have today
is on systems with space for AVX-512 and Memory Protection
Keys: 2696 bytes.  We have reserved a PAGE_SIZE buffer in
the init_task via fpregs_state->__padding.

This check ensures that even if the component sizes or
layout were changed (which we do not expect), that we will
still not overflow the init_task's buffer.

In the case that we detect we might overflow the buffer,
we completely disable XSAVE support in the kernel and try
to boot as if we had 'legacy x87 FPU' support in place.
This is a crippled state without any of the XSAVE-enabled
features (MPX, AVX, etc...).  But, it at least let us
boot safely.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: dave@sr71.net
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150902233125.D948D475@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14 12:07:56 +02:00
Dave Hansen
0a26537502 x86/fpu: Move XSAVE-disabling code to a helper
When we want to _completely_ disable XSAVE support as far as
the kernel is concerned, we have a big set of feature flags
to clear.  We currently only do this in cases where the user
asks for it to be disabled, but we are about to expand the
places where we do it to handle errors too.

Move the code in to xstate.c, and put it in the xstate.h
header.  We will use it in the next patch too.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: dave@sr71.net
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150902233124.EA9A70E5@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14 12:07:56 +02:00
Dave Hansen
b081535959 x86/fpu: Print xfeature buffer size in decimal
This is utterly a personal taste thing, but I find it way easier
to read structure sizes in decimal than in hex.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: dave@sr71.net
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150902233124.1A8B04A8@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14 12:07:55 +02:00
Sukadev Bhattiprolu
8f3e5684d3 perf/core: Drop PERF_EVENT_TXN
We currently use PERF_EVENT_TXN flag to determine if we are in the middle
of a transaction. If in a transaction, we defer the schedulability checks
from pmu->add() operation to the pmu->commit() operation.

Now that we have "transaction types" (PERF_PMU_TXN_ADD, PERF_PMU_TXN_READ)
we can use the type to determine if we are in a transaction and drop the
PERF_EVENT_TXN flag.

When PERF_EVENT_TXN is dropped, the cpuhw->group_flag on some architectures
becomes unused, so drop that field as well.

This is an extension of the Powerpc patch from Peter Zijlstra to s390,
Sparc and x86 architectures.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1441336073-22750-11-git-send-email-sukadev@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-13 11:27:30 +02:00
Sukadev Bhattiprolu
fbbe070115 perf/core: Add a 'flags' parameter to the PMU transactional interfaces
Currently, the PMU interface allows reading only one counter at a time.
But some PMUs like the 24x7 counters in Power, support reading several
counters at once. To leveage this functionality, extend the transaction
interface to support a "transaction type".

The first type, PERF_PMU_TXN_ADD, refers to the existing transactions,
i.e. used to _schedule_ all the events on the PMU as a group. A second
transaction type, PERF_PMU_TXN_READ, will be used in a follow-on patch,
by the 24x7 counters to read several counters at once.

Extend the transaction interfaces to the PMU to accept a 'txn_flags'
parameter and use this parameter to ignore any transactions that are
not of type PERF_PMU_TXN_ADD.

Thanks to Peter Zijlstra for his input.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
[peterz: s390 compile fix]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1441336073-22750-3-git-send-email-sukadev@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-13 11:27:25 +02:00
Huaitong Han
73fdeb6659 perf/x86/intel/pt: Fix KVM warning due to doing rdmsr() before the CPUID test
If KVM does not support INTEL_PT, guest MSR_IA32_RTIT_CTL reading will
produce host warning like "kvm [2469]: vcpu0 unhandled rdmsr: 0x570".

Guest can determine whether the CPU supports Intel_PT according to CPUID,
so test_cpu_cap function is added before rdmsr,and it is more in line with
the code style.

Signed-off-by: Huaitong Han <huaitong.han@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Link: http://lkml.kernel.org/r/1441009262-9792-1-git-send-email-huaitong.han@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-13 11:27:23 +02:00
Kan Liang
deb27519bf perf/x86/intel: Fix LBR callstack issue caused by FREEZE_LBRS_ON_PMI
This patch fixes an issue which introduced by commit
1a78d93750 ("perf/x86/intel: Streamline
LBR MSR handling in PMI").

The old patch not only avoids writing LBR_SELECT MSR in PMI, but also
avoids updating lbr_select variable. So in PMI, FREEZE_LBRS_ON_PMI bit
is always mistakenly set for IA32_DEBUGCTLMSR MSR, which causes
superfluous increase/decrease of LBR_TOS when collecting LBR callstack.

Reported-by: Milian Wolff <mail@milianw.de>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1439815051-8616-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-13 11:27:22 +02:00
Alexander Shishkin
d2878d642a perf/x86/intel/bts: Disallow use by unprivileged users on paranoid systems
BTS leaks kernel addresses even in userspace-only mode due to imprecise IP
sampling, so sometimes syscall entry points or page fault handler addresses
end up in a userspace trace.

Now, intel_bts driver exports trace data zero-copy, it does not scan through
it to filter out the kernel addresses and it's would be a O(n) job.

To work around this situation, this patch forbids the use of intel_bts
driver by unprivileged users on systems with the paranoid setting above the
(kernel's) default "1", which still allows kernel profiling. In other words,
using intel_bts driver implies kernel tracing, regardless of the
"exclude_kernel" attribute setting.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1441030168-6853-3-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-13 11:27:22 +02:00
Alexander Shishkin
a09d31f452 perf/x86/intel/ds: Work around BTS leaking kernel addresses
BTS leaks kernel addresses even in userspace-only mode due to imprecise IP
sampling, so sometimes syscall entry points or page fault handler addresses
end up in a userspace trace.

Since this driver uses a relatively small buffer for BTS records and it has
to iterate through them anyway, it can also take on the additional job of
filtering out the records that contain kernel addresses when kernel space
tracing is not enabled.

This patch changes the bts code to skip the offending records from perf
output. In order to request the exact amount of space on the ring buffer,
we need to do an extra pass through the records to know how many there are
of the valid ones, but considering the small size of the buffer, this extra
pass adds very little overhead to the nmi handler. This way we won't end
up with awkward IP samples with zero IPs in the perf stream.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1441030168-6853-2-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-13 11:27:21 +02:00
Adrian Hunter
b20112edea perf/x86: Improve accuracy of perf/sched clock
When TSC is stable perf/sched clock is based on it.
However the conversion from cycles to nanoseconds
is not as accurate as it could be.  Because
CYC2NS_SCALE_FACTOR is 10, the accuracy is +/- 1/2048

The change is to calculate the maximum shift that
results in a multiplier that is still a 32-bit number.
For example all frequencies over 1 GHz will have
a shift of 32, making the accuracy of the conversion
+/- 1/(2^33).  That is achieved by using the
'clocks_calc_mult_shift()' function.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1440147918-22250-1-git-send-email-adrian.hunter@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-13 11:27:20 +02:00
Ingo Molnar
216dcaf290 Merge branch 'perf/urgent' into perf/core, to pick up fixes before applying new changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-13 11:25:55 +02:00
Peter Zijlstra
ebfb4988f0 perf/x86/intel: Fix constraint access
Sasha reported that we can get here with .idx==-1, and
cpuc->event_constraints unallocated.

Suggested-by: Stephane Eranian <eranian@google.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Fixes: b371b59431 ("perf/x86: Fix event/group validation")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-13 09:37:10 +02:00
Borislav Petkov
7c5b190e11 x86/cpu: Print family/model/stepping in hex
924e101a7a ("x86/debug: Dump family, model, stepping of the
boot CPU") had its good intentions to dump the exact F/M/S as an
aid during debugging sessions but its output can be ambiguous.
Fix that:

-smpboot: CPU0: Intel Core Processor (Broadwell) (fam: 06, model: 47, stepping: 02)
+smpboot: CPU0: Intel Core Processor (Broadwell) (family: 0x6, model: 0x47, stepping: 0x2)

Also, spell out "family".

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1441914927-32037-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-13 09:30:07 +02:00
Alexander Shishkin
d249872939 perf/x86/intel/bts: Set event->hw.itrace_started in pmu::start to match the new logic
Since event->hw.itrace_started is now set in pmu::start() to signal the beginning of
the trace, do so also in the intel_bts driver.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1437140050-23363-4-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-11 10:06:03 +02:00
Christoph Hellwig
452e06af1f dma-mapping: consolidate dma_set_mask
Almost everyone implements dma_set_mask the same way, although some time
that's hidden in ->set_dma_mask methods.

This patch consolidates those into a common implementation that either
calls ->set_dma_mask if present or otherwise uses the default
implementation.  Some architectures used to only call ->set_dma_mask
after the initial checks, and those instance have been fixed to do the
full work.  h8300 implemented dma_set_mask bogusly as a no-ops and has
been fixed.

Unfortunately some architectures overload unrelated semantics like changing
the dma_ops into it so we still need to allow for an architecture override
for now.

[jcmvbkbc@gmail.com: fix xtensa]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10 13:29:01 -07:00
Christoph Hellwig
6894258eda dma-mapping: consolidate dma_{alloc,free}_{attrs,coherent}
Since 2009 we have a nice asm-generic header implementing lots of DMA API
functions for architectures using struct dma_map_ops, but unfortunately
it's still missing a lot of APIs that all architectures still have to
duplicate.

This series consolidates the remaining functions, although we still need
arch opt outs for two of them as a few architectures have very
non-standard implementations.

This patch (of 5):

The coherent DMA allocator works the same over all architectures supporting
dma_map operations.

This patch consolidates them and converges the minor differences:

 - the debug_dma helpers are now called from all architectures, including
   those that were previously missing them
 - dma_alloc_from_coherent and dma_release_from_coherent are now always
   called from the generic alloc/free routines instead of the ops
   dma-mapping-common.h always includes dma-coherent.h to get the defintions
   for them, or the stubs if the architecture doesn't support this feature
 - checks for ->alloc / ->free presence are removed.  There is only one
   magic instead of dma_map_ops without them (mic_dma_ops) and that one
   is x86 only anyway.

Besides that only x86 needs special treatment to replace a default devices
if none is passed and tweak the gfp_flags.  An optional arch hook is provided
for that.

[linux@roeck-us.net: fix build]
[jcmvbkbc@gmail.com: fix xtensa]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10 13:29:01 -07:00
Dave Young
2965faa5e0 kexec: split kexec_load syscall from kexec core code
There are two kexec load syscalls, kexec_load another and kexec_file_load.
 kexec_file_load has been splited as kernel/kexec_file.c.  In this patch I
split kexec_load syscall code to kernel/kexec.c.

And add a new kconfig option KEXEC_CORE, so we can disable kexec_load and
use kexec_file_load only, or vice verse.

The original requirement is from Ted Ts'o, he want kexec kernel signature
being checked with CONFIG_KEXEC_VERIFY_SIG enabled.  But kexec-tools use
kexec_load syscall can bypass the checking.

Vivek Goyal proposed to create a common kconfig option so user can compile
in only one syscall for loading kexec kernel.  KEXEC/KEXEC_FILE selects
KEXEC_CORE so that old config files still work.

Because there's general code need CONFIG_KEXEC_CORE, so I updated all the
architecture Kconfig with a new option KEXEC_CORE, and let KEXEC selects
KEXEC_CORE in arch Kconfig.  Also updated general kernel code with to
kexec_load syscall.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Dave Young <dyoung@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Petr Tesarik <ptesarik@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10 13:29:01 -07:00
Linus Torvalds
f6f7a63692 Merge branch 'akpm' (patches from Andrew)
Merge second patch-bomb from Andrew Morton:
 "Almost all of the rest of MM.  There was an unusually large amount of
  MM material this time"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (141 commits)
  zpool: remove no-op module init/exit
  mm: zbud: constify the zbud_ops
  mm: zpool: constify the zpool_ops
  mm: swap: zswap: maybe_preload & refactoring
  zram: unify error reporting
  zsmalloc: remove null check from destroy_handle_cache()
  zsmalloc: do not take class lock in zs_shrinker_count()
  zsmalloc: use class->pages_per_zspage
  zsmalloc: consider ZS_ALMOST_FULL as migrate source
  zsmalloc: partial page ordering within a fullness_list
  zsmalloc: use shrinker to trigger auto-compaction
  zsmalloc: account the number of compacted pages
  zsmalloc/zram: introduce zs_pool_stats api
  zsmalloc: cosmetic compaction code adjustments
  zsmalloc: introduce zs_can_compact() function
  zsmalloc: always keep per-class stats
  zsmalloc: drop unused variable `nr_to_migrate'
  mm/memblock.c: fix comment in __next_mem_range()
  mm/page_alloc.c: fix type information of memoryless node
  memory-hotplug: fix comments in zone_spanned_pages_in_node() and zone_spanned_pages_in_node()
  ...
2015-09-08 17:52:23 -07:00
Mark Salter
5dd2c4bded x86: use generic early mem copy
The early_ioremap library now has a generic copy_from_early_mem()
function.  Use the generic copy function for x86 relocate_initrd().

[akpm@linux-foundation.org: remove MAX_MAP_CHUNK define, per Yinghai Lu]
Signed-off-by: Mark Salter <msalter@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-08 15:35:28 -07:00
Linus Torvalds
12f03ee606 libnvdimm for 4.3:
1/ Introduce ZONE_DEVICE and devm_memremap_pages() as a generic
    mechanism for adding device-driver-discovered memory regions to the
    kernel's direct map.  This facility is used by the pmem driver to
    enable pfn_to_page() operations on the page frames returned by DAX
    ('direct_access' in 'struct block_device_operations'). For now, the
    'memmap' allocation for these "device" pages comes from "System
    RAM".  Support for allocating the memmap from device memory will
    arrive in a later kernel.
 
 2/ Introduce memremap() to replace usages of ioremap_cache() and
    ioremap_wt().  memremap() drops the __iomem annotation for these
    mappings to memory that do not have i/o side effects.  The
    replacement of ioremap_cache() with memremap() is limited to the
    pmem driver to ease merging the api change in v4.3.  Completion of
    the conversion is targeted for v4.4.
 
 3/ Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem
    driver, update the VFS DAX implementation and PMEM api to provide
    persistence guarantees for kernel operations on a DAX mapping.
 
 4/ Convert the ACPI NFIT 'BLK' driver to map the block apertures as
    cacheable to improve performance.
 
 5/ Miscellaneous updates and fixes to libnvdimm including support
    for issuing "address range scrub" commands, clarifying the optimal
    'sector size' of pmem devices, a clarification of the usage of the
    ACPI '_STA' (status) property for DIMM devices, and other minor
    fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJV6Nx7AAoJEB7SkWpmfYgCWyYQAI5ju6Gvw27RNFtPovHcZUf5
 JGnxXejI6/AqeTQ+IulgprxtEUCrXOHjCDA5dkjr1qvsoqK1qxug+vJHOZLgeW0R
 OwDtmdW4Qrgeqm+CPoxETkorJ8wDOc8mol81kTiMgeV3UqbYeeHIiTAmwe7VzZ0C
 nNdCRDm5g8dHCjTKcvK3rvozgyoNoWeBiHkPe76EbnxDICxCB5dak7XsVKNMIVFQ
 NuYlnw6IYN7+rMHgpgpRux38NtIW8VlYPWTmHExejc2mlioWMNBG/bmtwLyJ6M3e
 zliz4/cnonTMUaizZaVozyinTa65m7wcnpjK+vlyGV2deDZPJpDRvSOtB0lH30bR
 1gy+qrKzuGKpaN6thOISxFLLjmEeYwzYd7SvC9n118r32qShz+opN9XX0WmWSFlA
 sajE1ehm4M7s5pkMoa/dRnAyR8RUPu4RNINdQ/Z9jFfAOx+Q26rLdQXwf9+uqbEb
 bIeSQwOteK5vYYCstvpAcHSMlJAglzIX5UfZBvtEIJN7rlb0VhmGWfxAnTu+ktG1
 o9cqAt+J4146xHaFwj5duTsyKhWb8BL9+xqbKPNpXEp+PbLsrnE/+WkDLFD67jxz
 dgIoK60mGnVXp+16I2uMqYYDgAyO5zUdmM4OygOMnZNa1mxesjbDJC6Wat1Wsndn
 slsw6DkrWT60CRE42nbK
 =o57/
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:
 "This update has successfully completed a 0day-kbuild run and has
  appeared in a linux-next release.  The changes outside of the typical
  drivers/nvdimm/ and drivers/acpi/nfit.[ch] paths are related to the
  removal of IORESOURCE_CACHEABLE, the introduction of memremap(), and
  the introduction of ZONE_DEVICE + devm_memremap_pages().

  Summary:

   - Introduce ZONE_DEVICE and devm_memremap_pages() as a generic
     mechanism for adding device-driver-discovered memory regions to the
     kernel's direct map.

     This facility is used by the pmem driver to enable pfn_to_page()
     operations on the page frames returned by DAX ('direct_access' in
     'struct block_device_operations').

     For now, the 'memmap' allocation for these "device" pages comes
     from "System RAM".  Support for allocating the memmap from device
     memory will arrive in a later kernel.

   - Introduce memremap() to replace usages of ioremap_cache() and
     ioremap_wt().  memremap() drops the __iomem annotation for these
     mappings to memory that do not have i/o side effects.  The
     replacement of ioremap_cache() with memremap() is limited to the
     pmem driver to ease merging the api change in v4.3.

     Completion of the conversion is targeted for v4.4.

   - Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem
     driver, update the VFS DAX implementation and PMEM api to provide
     persistence guarantees for kernel operations on a DAX mapping.

   - Convert the ACPI NFIT 'BLK' driver to map the block apertures as
     cacheable to improve performance.

   - Miscellaneous updates and fixes to libnvdimm including support for
     issuing "address range scrub" commands, clarifying the optimal
     'sector size' of pmem devices, a clarification of the usage of the
     ACPI '_STA' (status) property for DIMM devices, and other minor
     fixes"

* tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (34 commits)
  libnvdimm, pmem: direct map legacy pmem by default
  libnvdimm, pmem: 'struct page' for pmem
  libnvdimm, pfn: 'struct page' provider infrastructure
  x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB
  add devm_memremap_pages
  mm: ZONE_DEVICE for "device memory"
  mm: move __phys_to_pfn and __pfn_to_phys to asm/generic/memory_model.h
  dax: drop size parameter to ->direct_access()
  nd_blk: change aperture mapping from WC to WB
  nvdimm: change to use generic kvfree()
  pmem, dax: have direct_access use __pmem annotation
  dax: update I/O path to do proper PMEM flushing
  pmem: add copy_from_iter_pmem() and clear_pmem()
  pmem, x86: clean up conditional pmem includes
  pmem: remove layer when calling arch_has_wmb_pmem()
  pmem, x86: move x86 PMEM API to new pmem.h header
  libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option
  pmem: switch to devm_ allocations
  devres: add devm_memremap
  libnvdimm, btt: write and validate parent_uuid
  ...
2015-09-08 14:35:59 -07:00
Linus Torvalds
b793c005ce Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
 "Highlights:

   - PKCS#7 support added to support signed kexec, also utilized for
     module signing.  See comments in 3f1e1bea.

     ** NOTE: this requires linking against the OpenSSL library, which
        must be installed, e.g.  the openssl-devel on Fedora **

   - Smack
      - add IPv6 host labeling; ignore labels on kernel threads
      - support smack labeling mounts which use binary mount data

   - SELinux:
      - add ioctl whitelisting (see
        http://kernsec.org/files/lss2015/vanderstoep.pdf)
      - fix mprotect PROT_EXEC regression caused by mm change

   - Seccomp:
      - add ptrace options for suspend/resume"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (57 commits)
  PKCS#7: Add OIDs for sha224, sha284 and sha512 hash algos and use them
  Documentation/Changes: Now need OpenSSL devel packages for module signing
  scripts: add extract-cert and sign-file to .gitignore
  modsign: Handle signing key in source tree
  modsign: Use if_changed rule for extracting cert from module signing key
  Move certificate handling to its own directory
  sign-file: Fix warning about BIO_reset() return value
  PKCS#7: Add MODULE_LICENSE() to test module
  Smack - Fix build error with bringup unconfigured
  sign-file: Document dependency on OpenSSL devel libraries
  PKCS#7: Appropriately restrict authenticated attributes and content type
  KEYS: Add a name for PKEY_ID_PKCS7
  PKCS#7: Improve and export the X.509 ASN.1 time object decoder
  modsign: Use extract-cert to process CONFIG_SYSTEM_TRUSTED_KEYS
  extract-cert: Cope with multiple X.509 certificates in a single file
  sign-file: Generate CMS message as signature instead of PKCS#7
  PKCS#7: Support CMS messages also [RFC5652]
  X.509: Change recorded SKID & AKID to not include Subject or Issuer
  PKCS#7: Check content type and versions
  MAINTAINERS: The keyrings mailing list has moved
  ...
2015-09-08 12:41:25 -07:00
Linus Torvalds
6f0a2fc1fe Merge branch 'nmi' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull NMI backtrace update from Russell King:
 "These changes convert the x86 NMI handling to be a library
  implementation which other architectures can make use of.  Thomas
  Gleixner has reviewed and tested these changes, and wishes me to send
  these rather than taking them through the tip tree.

  The final patch in the set adds an initial implementation using this
  infrastructure to ARM, even though it doesn't send the IPI at "NMI"
  level.  Patches are in progress to add the ARM equivalent of NMI, but
  we still need the IRQ-level fallback for systems where the "NMI" isn't
  available due to secure firmware denying access to it"

* 'nmi' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
  ARM: add basic support for on-demand backtrace of other CPUs
  nmi: x86: convert to generic nmi handler
  nmi: create generic NMI backtrace implementation
2015-09-08 12:28:10 -07:00
Ingo Molnar
8fcb346b91 x86/headers: Convert sigcontext_ia32 uses to sigcontext_32
Use the new name in kernel code, and move the old name to the
user-space-only legacy section of the UAPI header.

Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1441438363-9999-14-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-08 10:03:59 +02:00
Ingo Molnar
530e5c8271 x86/headers: Make sigcontext pointers bit independent
Before we can eliminate the duplication between 'struct
sigcontext_32' and 'struct sigcontext_ia32', make the 'fpstate'
pointer field in 'struct sigcontext_32' bit independent.

Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1441438363-9999-12-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-08 10:03:58 +02:00
Ingo Molnar
86e9fc3a0e x86/headers: Convert uses of _fpstate_ia32 to _fpstate_32
Remove uses of _fpstate_ia32 from the kernel, and move the
legacy _fpstate_ia32 definition to the user-space only portion
of the header.

Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1441438363-9999-9-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-08 10:03:57 +02:00
Andy Lutomirski
76fc5e7b23 x86/vm86: Block non-root vm86(old) if mmap_min_addr != 0
vm86 exposes an interesting attack surface against the entry
code. Since vm86 is mostly useless anyway if mmap_min_addr != 0,
just turn it off in that case.

There are some reports that vbetool can work despite setting
mmap_min_addr to zero.  This shouldn't break that use case,
as CAP_SYS_RAWIO already overrides mmap_min_addr.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Austin S Hemmelgarn <ahferroin7@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stas Sergeev <stsp@list.ru>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-05 09:01:16 +02:00
Ingo Molnar
95cd2ea7d5 Merge branch 'linus' into x86/urgent, to be able to merge a dependent fix
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-05 09:00:47 +02:00
Ulrich Obergfell
ec6a90661a watchdog: rename watchdog_suspend() and watchdog_resume()
Rename watchdog_suspend() to lockup_detector_suspend() and
watchdog_resume() to lockup_detector_resume() to avoid confusion with the
watchdog subsystem and to be consistent with the existing name
lockup_detector_init().

Also provide comment blocks to explain the watchdog_running and
watchdog_suspended variables and their relationship.

Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Stephane Eranian <eranian@google.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Ulrich Obergfell
999bbe49ea watchdog: use suspend/resume interface in fixup_ht_bug()
Remove watchdog_nmi_disable_all() and watchdog_nmi_enable_all() since
these functions are no longer needed.  If a subsystem has a need to
deactivate the watchdog temporarily, it should utilize the
watchdog_suspend() and watchdog_resume() functions.

[akpm@linux-foundation.org: fix build with CONFIG_LOCKUP_DETECTOR=m]
Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Stephane Eranian <eranian@google.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Guenter Roeck
aacfbe6a97 kernel/watchdog: move NMI function header declarations from watchdog.h to nmi.h
The kernel's NMI watchdog has nothing to do with the watchdog subsystem.
Its header declarations should be in linux/nmi.h, not linux/watchdog.h.

The code provided two sets of dummy functions if HARDLOCKUP_DETECTOR is
not configured, one in the include file and one in kernel/watchdog.c.
Remove the dummy functions from kernel/watchdog.c and use those from the
include file.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Cc: Stephane Eranian <eranian@google.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Linus Torvalds
ca520cab25 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking and atomic updates from Ingo Molnar:
 "Main changes in this cycle are:

   - Extend atomic primitives with coherent logic op primitives
     (atomic_{or,and,xor}()) and deprecate the old partial APIs
     (atomic_{set,clear}_mask())

     The old ops were incoherent with incompatible signatures across
     architectures and with incomplete support.  Now every architecture
     supports the primitives consistently (by Peter Zijlstra)

   - Generic support for 'relaxed atomics':

       - _acquire/release/relaxed() flavours of xchg(), cmpxchg() and {add,sub}_return()
       - atomic_read_acquire()
       - atomic_set_release()

     This came out of porting qwrlock code to arm64 (by Will Deacon)

   - Clean up the fragile static_key APIs that were causing repeat bugs,
     by introducing a new one:

       DEFINE_STATIC_KEY_TRUE(name);
       DEFINE_STATIC_KEY_FALSE(name);

     which define a key of different types with an initial true/false
     value.

     Then allow:

       static_branch_likely()
       static_branch_unlikely()

     to take a key of either type and emit the right instruction for the
     case.  To be able to know the 'type' of the static key we encode it
     in the jump entry (by Peter Zijlstra)

   - Static key self-tests (by Jason Baron)

   - qrwlock optimizations (by Waiman Long)

   - small futex enhancements (by Davidlohr Bueso)

   - ... and misc other changes"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (63 commits)
  jump_label/x86: Work around asm build bug on older/backported GCCs
  locking, ARM, atomics: Define our SMP atomics in terms of _relaxed() operations
  locking, include/llist: Use linux/atomic.h instead of asm/cmpxchg.h
  locking/qrwlock: Make use of _{acquire|release|relaxed}() atomics
  locking/qrwlock: Implement queue_write_unlock() using smp_store_release()
  locking/lockref: Remove homebrew cmpxchg64_relaxed() macro definition
  locking, asm-generic: Add _{relaxed|acquire|release}() variants for 'atomic_long_t'
  locking, asm-generic: Rework atomic-long.h to avoid bulk code duplication
  locking/atomics: Add _{acquire|release|relaxed}() variants of some atomic operations
  locking, compiler.h: Cast away attributes in the WRITE_ONCE() magic
  locking/static_keys: Make verify_keys() static
  jump label, locking/static_keys: Update docs
  locking/static_keys: Provide a selftest
  jump_label: Provide a self-test
  s390/uaccess, locking/static_keys: employ static_branch_likely()
  x86, tsc, locking/static_keys: Employ static_branch_likely()
  locking/static_keys: Add selftest
  locking/static_keys: Add a new static_key interface
  locking/static_keys: Rework update logic
  locking/static_keys: Add static_key_{en,dis}able() helpers
  ...
2015-09-03 15:46:07 -07:00
Thomas Gleixner
66c117d7fa x86/alternatives: Make optimize_nops() interrupt safe and synced
Richard reported the following crash:

[    0.036000] BUG: unable to handle kernel paging request at 55501e06
[    0.036000] IP: [<c0aae48b>] common_interrupt+0xb/0x38
[    0.036000] Call Trace:
[    0.036000]  [<c0409c80>] ? add_nops+0x90/0xa0
[    0.036000]  [<c040a054>] apply_alternatives+0x274/0x630

Chuck decoded:

 "  0:   8d 90 90 83 04 24       lea    0x24048390(%eax),%edx
    6:   80 fc 0f                cmp    $0xf,%ah
    9:   a8 0f                   test   $0xf,%al
 >> b:   a0 06 1e 50 55          mov    0x55501e06,%al
   10:   57                      push   %edi
   11:   56                      push   %esi

 Interrupt 0x30 occurred while the alternatives code was replacing the
 initial 0x90,0x90,0x90 NOPs (from the ASM_CLAC macro) with the
 optimized version, 0x8d,0x76,0x00. Only the first byte has been
 replaced so far, and it makes a mess out of the insn decoding."

optimize_nops() is buggy in two aspects:

- It's not disabling interrupts across the modification
- It's lacking a sync_core() call

Add both.

Fixes: 4fd4b6e553 'x86/alternatives: Use optimized NOPs for padding'
Reported-and-tested-by: "Richard W.M. Jones" <rjones@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard W.M. Jones <rjones@redhat.com>
Cc: Chuck Ebbert <cebbert.lkml@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1509031232340.15006@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-09-03 21:27:47 +02:00
Linus Torvalds
ae98207309 Power management and ACPI material for v4.3-rc1
- ACPICA update to upstream revision 20150818 including method
    tracing extensions to allow more in-depth AML debugging in the
    kernel and a number of assorted fixes and cleanups (Bob Moore,
    Lv Zheng, Markus Elfring).
 
  - ACPI sysfs code updates and a documentation update related to
    AML method tracing (Lv Zheng).
 
  - ACPI EC driver fix related to serialized evaluations of _Qxx
    methods and ACPI tools updates allowing the EC userspace tool
    to be built from the kernel source (Lv Zheng).
 
  - ACPI processor driver updates preparing it for future
    introduction of CPPC support and ACPI PCC mailbox driver
    updates (Ashwin Chaugule).
 
  - ACPI interrupts enumeration fix for a regression related
    to the handling of IRQ attribute conflicts between MADT
    and the ACPI namespace (Jiang Liu).
 
  - Fixes related to ACPI device PM (Mika Westerberg, Srinidhi Kasagar).
 
  - ACPI device registration code reorganization to separate the
    sysfs-related code and bus type operations from the rest (Rafael
    J Wysocki).
 
  - Assorted cleanups in the ACPI core (Jarkko Nikula, Mathias Krause,
    Andy Shevchenko, Rafael J Wysocki, Nicolas Iooss).
 
  - ACPI cpufreq driver and ia64 cpufreq driver fixes and cleanups
    (Pan Xinhui, Rafael J Wysocki).
 
  - cpufreq core cleanups on top of the previous changes allowing it
    to preseve its sysfs directories over system suspend/resume (Viresh
    Kumar, Rafael J Wysocki, Sebastian Andrzej Siewior).
 
  - cpufreq fixes and cleanups related to governors (Viresh Kumar).
 
  - cpufreq updates (core and the cpufreq-dt driver) related to the
    turbo/boost mode support (Viresh Kumar, Bartlomiej Zolnierkiewicz).
 
  - New DT bindings for Operating Performance Points (OPP), support
    for them in the OPP framework and in the cpufreq-dt driver plus
    related OPP framework fixes and cleanups (Viresh Kumar).
 
  - cpufreq powernv driver updates (Shilpasri G Bhat).
 
  - New cpufreq driver for Mediatek MT8173 (Pi-Cheng Chen).
 
  - Assorted cpufreq driver (speedstep-lib, sfi, integrator) cleanups
    and fixes (Abhilash Jindal, Andrzej Hajda, Cristian Ardelean).
 
  - intel_pstate driver updates including Skylake-S support, support
    for enabling HW P-states per CPU and an additional vendor bypass
    list entry (Kristen Carlson Accardi, Chen Yu, Ethan Zhao).
 
  - cpuidle core fixes related to the handling of coupled idle states
    (Xunlei Pang).
 
  - intel_idle driver updates including Skylake Client support and
    support for freeze-mode-specific idle states (Len Brown).
 
  - Driver core updates related to power management (Andy Shevchenko,
    Rafael J Wysocki).
 
  - Generic power domains framework fixes and cleanups (Jon Hunter,
    Geert Uytterhoeven, Rajendra Nayak, Ulf Hansson).
 
  - Device PM QoS framework update to allow the latency tolerance
    setting to be exposed to user space via sysfs (Mika Westerberg).
 
  - devfreq support for PPMUv2 in Exynos5433 and a fix for an incorrect
    exynos-ppmu DT binding (Chanwoo Choi, Javier Martinez Canillas).
 
  - System sleep support updates (Alan Stern, Len Brown, SungEun Kim).
 
  - rockchip-io AVS support updates (Heiko Stuebner).
 
  - PM core clocks support fixup (Colin Ian King).
 
  - Power capping RAPL driver update including support for Skylake H/S
    and Broadwell-H (Radivoje Jovanovic, Seiichi Ikarashi).
 
  - Generic device properties framework fixes related to the handling
    of static (driver-provided) property sets (Andy Shevchenko).
 
  - turbostat and cpupower updates (Len Brown, Shilpasri G Bhat,
    Shreyas B Prabhu).
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJV5hhGAAoJEILEb/54YlRxs+EQAK51iFk48+IbpHYaZZ50Yo4m
 ZZc2zBcbwRcBlU9vKERrhG+jieSl8J/JJNxT8vBjKqyvNw038mCjewQh02ol0HuC
 R7nlDiVJkmZ50sLO4xwE/1UBZr/XqbddwCUnYzvFMkMTA0ePzFtf8BrJ1FXpT8S/
 fkwSXQty6hvJDwxkfrbMSaA730wMju9lahx8D6MlmUAedWYZOJDMQKB4WKa/St5X
 9uckBPHUBB2KiKlXxdbFPwKLNxHvLROq5SpDLc6cM/7XZB+QfNFy85CUjCUtYo1O
 1W8k0qnztvZ6UEv27qz5dejGyAGOarMWGGNsmL9evoeGeHRpQL+dom7HcTnbAfUZ
 walyhYSm/zKkdy7Vl3xWUUQkMG48+PviMI6K0YhHXb3Rm5wlR/yBNZTwNIty9SX/
 fKCHEa8QynWwLxgm53c3xRkiitJxMsHNK03moLD9zQMjshTyTNvpNbZoahyKQzk6
 H+9M1DBRHhkkREDWSwGutukxfEMtWe2vcZcyERrFiY7l5k1j58DwDBMPqjPhRv6q
 P/1NlCzr0XYf83Y86J18LbDuPGDhTjjIEn6CqbtI2mmWqTg3+rF7zvS2ux+FzMnA
 gisv8l6GT9JiWhxKFqqL/rrVpwtyHebWLYE/RpNUW6fEzLziRNj1qyYO9dqI/GGi
 I3rfxlXoc/5xJWCgNB8f
 =fTgI
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management and ACPI updates from Rafael Wysocki:
 "From the number of commits perspective, the biggest items are ACPICA
  and cpufreq changes with the latter taking the lead (over 50 commits).

  On the cpufreq front, there are many cleanups and minor fixes in the
  core and governors, driver updates etc.  We also have a new cpufreq
  driver for Mediatek MT8173 chips.

  ACPICA mostly updates its debug infrastructure and adds a number of
  fixes and cleanups for a good measure.

  The Operating Performance Points (OPP) framework is updated with new
  DT bindings and support for them among other things.

  We have a few updates of the generic power domains framework and a
  reorganization of the ACPI device enumeration code and bus type
  operations.

  And a lot of fixes and cleanups all over.

  Included is one branch from the MFD tree as it contains some
  PM-related driver core and ACPI PM changes a few other commits are
  based on.

  Specifics:

   - ACPICA update to upstream revision 20150818 including method
     tracing extensions to allow more in-depth AML debugging in the
     kernel and a number of assorted fixes and cleanups (Bob Moore, Lv
     Zheng, Markus Elfring).

   - ACPI sysfs code updates and a documentation update related to AML
     method tracing (Lv Zheng).

   - ACPI EC driver fix related to serialized evaluations of _Qxx
     methods and ACPI tools updates allowing the EC userspace tool to be
     built from the kernel source (Lv Zheng).

   - ACPI processor driver updates preparing it for future introduction
     of CPPC support and ACPI PCC mailbox driver updates (Ashwin
     Chaugule).

   - ACPI interrupts enumeration fix for a regression related to the
     handling of IRQ attribute conflicts between MADT and the ACPI
     namespace (Jiang Liu).

   - Fixes related to ACPI device PM (Mika Westerberg, Srinidhi
     Kasagar).

   - ACPI device registration code reorganization to separate the
     sysfs-related code and bus type operations from the rest (Rafael J
     Wysocki).

   - Assorted cleanups in the ACPI core (Jarkko Nikula, Mathias Krause,
     Andy Shevchenko, Rafael J Wysocki, Nicolas Iooss).

   - ACPI cpufreq driver and ia64 cpufreq driver fixes and cleanups (Pan
     Xinhui, Rafael J Wysocki).

   - cpufreq core cleanups on top of the previous changes allowing it to
     preseve its sysfs directories over system suspend/resume (Viresh
     Kumar, Rafael J Wysocki, Sebastian Andrzej Siewior).

   - cpufreq fixes and cleanups related to governors (Viresh Kumar).

   - cpufreq updates (core and the cpufreq-dt driver) related to the
     turbo/boost mode support (Viresh Kumar, Bartlomiej Zolnierkiewicz).

   - New DT bindings for Operating Performance Points (OPP), support for
     them in the OPP framework and in the cpufreq-dt driver plus related
     OPP framework fixes and cleanups (Viresh Kumar).

   - cpufreq powernv driver updates (Shilpasri G Bhat).

   - New cpufreq driver for Mediatek MT8173 (Pi-Cheng Chen).

   - Assorted cpufreq driver (speedstep-lib, sfi, integrator) cleanups
     and fixes (Abhilash Jindal, Andrzej Hajda, Cristian Ardelean).

   - intel_pstate driver updates including Skylake-S support, support
     for enabling HW P-states per CPU and an additional vendor bypass
     list entry (Kristen Carlson Accardi, Chen Yu, Ethan Zhao).

   - cpuidle core fixes related to the handling of coupled idle states
     (Xunlei Pang).

   - intel_idle driver updates including Skylake Client support and
     support for freeze-mode-specific idle states (Len Brown).

   - Driver core updates related to power management (Andy Shevchenko,
     Rafael J Wysocki).

   - Generic power domains framework fixes and cleanups (Jon Hunter,
     Geert Uytterhoeven, Rajendra Nayak, Ulf Hansson).

   - Device PM QoS framework update to allow the latency tolerance
     setting to be exposed to user space via sysfs (Mika Westerberg).

   - devfreq support for PPMUv2 in Exynos5433 and a fix for an incorrect
     exynos-ppmu DT binding (Chanwoo Choi, Javier Martinez Canillas).

   - System sleep support updates (Alan Stern, Len Brown, SungEun Kim).

   - rockchip-io AVS support updates (Heiko Stuebner).

   - PM core clocks support fixup (Colin Ian King).

   - Power capping RAPL driver update including support for Skylake H/S
     and Broadwell-H (Radivoje Jovanovic, Seiichi Ikarashi).

   - Generic device properties framework fixes related to the handling
     of static (driver-provided) property sets (Andy Shevchenko).

   - turbostat and cpupower updates (Len Brown, Shilpasri G Bhat,
     Shreyas B Prabhu)"

* tag 'pm+acpi-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (180 commits)
  cpufreq: speedstep-lib: Use monotonic clock
  cpufreq: powernv: Increase the verbosity of OCC console messages
  cpufreq: sfi: use kmemdup rather than duplicating its implementation
  cpufreq: drop !cpufreq_driver check from cpufreq_parse_governor()
  cpufreq: rename cpufreq_real_policy as cpufreq_user_policy
  cpufreq: remove redundant 'policy' field from user_policy
  cpufreq: remove redundant 'governor' field from user_policy
  cpufreq: update user_policy.* on success
  cpufreq: use memcpy() to copy policy
  cpufreq: remove redundant CPUFREQ_INCOMPATIBLE notifier event
  cpufreq: mediatek: Add MT8173 cpufreq driver
  dt-bindings: mediatek: Add MT8173 CPU DVFS clock bindings
  PM / Domains: Fix typo in description of genpd_dev_pm_detach()
  PM / Domains: Remove unusable governor dummies
  PM / Domains: Make pm_genpd_init() available to modules
  PM / domains: Align column headers and data in pm_genpd_summary output
  powercap / RAPL: disable the 2nd power limit properly
  tools: cpupower: Fix error when running cpupower monitor
  PM / OPP: Drop unlikely before IS_ERR(_OR_NULL)
  PM / OPP: Fix static checker warning (broken 64bit big endian systems)
  ...
2015-09-01 19:45:46 -07:00
Linus Torvalds
e713c80a4e Merge branch 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 clockevent update from Thomas Gleixner:
 "A single commit, which converts HPET clockevents driver to the new
  callbacks"

* 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/hpet: Migrate to new set_state interface
2015-09-01 15:42:28 -07:00
Linus Torvalds
43af9872f5 Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 apic updates from Thomas Gleixner:
 "This udpate contains:

   - rework the irq vector array to store a pointer to the irq
     descriptor instead of the irq number to avoid a lookup of the irq
     descriptor in the irq entry path

   - lguest interrupt handling cleanups

   - conversion of the local apic timer to the new clockevent callbacks

   - preparatory changes for the irq argument removal of interrupt flow
     handlers"

* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/irq: Do not dereference irq descriptor before checking it
  tools/lguest: Clean up include dir
  tools/lguest: Fix redefinition of struct virtio_pci_cfg_cap
  x86/irq: Store irq descriptor in vector array
  genirq: Provide irq_desc_has_action
  x86/irq: Get rid of an indentation level
  x86/irq: Rename VECTOR_UNDEFINED to VECTOR_UNUSED
  x86/irq: Replace numeric constant
  x86/irq: Protect smp_cleanup_move
  x86/lguest: Do not setup unused irq vectors
  x86/lguest: Clean up lguest_setup_irq
  x86/apic: Drop local_irq_save/restore in timer callbacks
  x86/apic: Migrate apic timer to new set_state interface
  x86/irq: Use access helper irq_data_get_affinity_mask()
  x86/irq: Use accessor irq_data_get_irq_handler_data()
  x86/irq: Use accessor irq_data_get_node()
2015-09-01 15:20:51 -07:00
Linus Torvalds
5e359bf221 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "Rather large, but nothing exiting:

   - new range check for settimeofday() to prevent that boot time
     becomes negative.
   - fix for file time rounding
   - a few simplifications of the hrtimer code
   - fix for the proc/timerlist code so the output of clock realtime
     timers is accurate
   - more y2038 work
   - tree wide conversion of clockevent drivers to the new callbacks"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (88 commits)
  hrtimer: Handle failure of tick_init_highres() gracefully
  hrtimer: Unconfuse switch_hrtimer_base() a bit
  hrtimer: Simplify get_target_base() by returning current base
  hrtimer: Drop return code of hrtimer_switch_to_hres()
  time: Introduce timespec64_to_jiffies()/jiffies_to_timespec64()
  time: Introduce current_kernel_time64()
  time: Introduce struct itimerspec64
  time: Add the common weak version of update_persistent_clock()
  time: Always make sure wall_to_monotonic isn't positive
  time: Fix nanosecond file time rounding in timespec_trunc()
  timer_list: Add the base offset so remaining nsecs are accurate for non monotonic timers
  cris/time: Migrate to new 'set-state' interface
  kernel: broadcast-hrtimer: Migrate to new 'set-state' interface
  xtensa/time: Migrate to new 'set-state' interface
  unicore/time: Migrate to new 'set-state' interface
  um/time: Migrate to new 'set-state' interface
  sparc/time: Migrate to new 'set-state' interface
  sh/localtimer: Migrate to new 'set-state' interface
  score/time: Migrate to new 'set-state' interface
  s390/time: Migrate to new 'set-state' interface
  ...
2015-09-01 14:04:50 -07:00
Linus Torvalds
361f7d1757 Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 core platform updates from Ingo Molnar:
 "The main changes are:

   - Intel Atom platform updates.  (Andy Shevchenko)

   - modularity fixlets.  (Paul Gortmaker)

   - x86 platform clockevents driver updates for lguest, uv and Xen.
     (Viresh Kumar)

   - Microsoft Hyper-V TSC fixlet.  (Vitaly Kuznetsov)"

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/platform: Make atom/pmc_atom.c explicitly non-modular
  x86/hyperv: Mark the Hyper-V TSC as unstable
  x86/xen/time: Migrate to new set-state interface
  x86/uv/time: Migrate to new set-state interface
  x86/lguest/timer: Migrate to new set-state interface
  x86/pci/intel_mid_pci: Use proper constants for irq polarity
  x86/pci/intel_mid_pci: Make intel_mid_pci_ops static
  x86/pci/intel_mid_pci: Propagate actual return code
  x86/pci/intel_mid_pci: Work around for IRQ0 assignment
  x86/platform/iosf_mbi: Add Intel Tangier PCI id
  x86/platform/iosf_mbi: Source cleanup
  x86/platform/iosf_mbi: Remove NULL pointer checks for pci_dev_put()
  x86/platform/iosf_mbi: Check return value of debugfs_create properly
  x86/platform/iosf_mbi: Move to dedicated folder
  x86/platform/intel/pmc_atom: Move the PMC-Atom code to arch/x86/platform/atom
  x86/platform/intel/pmc_atom: Add Cherrytrail PMC interface
  x86/platform/intel/pmc_atom: Supply register mappings via PMC object
  x86/platform/intel/pmc_atom: Print index of device in loop
  x86/platform/intel/pmc_atom: Export accessors to PMC registers
2015-09-01 10:33:31 -07:00
Linus Torvalds
25525bea46 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
 "The dominant change in this cycle was the continued work to isolate
  kernel drivers from MTRR legacies: this tree gets rid of all kernel
  internal driver interfaces to MTRRs (mostly by rewriting it to proper
  PAT interfaces), the only access left is the /proc/mtrr ABI.

  This work was done by Luis R Rodriguez.

  There's also some related PCI interface additions for which I've
  Cc:-ed Bjorn"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  x86/mm/mtrr: Remove kernel internal MTRR interfaces: unexport mtrr_add() and mtrr_del()
  s390/io: Add pci_iomap_wc() and pci_iomap_wc_range()
  drivers/dma/iop-adma: Use dma_alloc_writecombine() kernel-style
  drivers/video/fbdev/vt8623fb: Use arch_phys_wc_add() and pci_iomap_wc()
  drivers/video/fbdev/s3fb: Use arch_phys_wc_add() and pci_iomap_wc()
  drivers/video/fbdev/arkfb.c: Use arch_phys_wc_add() and pci_iomap_wc()
  PCI: Add pci_iomap_wc() variants
  drivers/video/fbdev/gxt4500: Use pci_ioremap_wc_bar() to map framebuffer
  drivers/video/fbdev/kyrofb: Use arch_phys_wc_add() and pci_ioremap_wc_bar()
  drivers/video/fbdev/i740fb: Use arch_phys_wc_add() and pci_ioremap_wc_bar()
  PCI: Add pci_ioremap_wc_bar()
  x86/mm: Make kernel/check.c explicitly non-modular
  x86/mm/pat: Make mm/pageattr[-test].c explicitly non-modular
  x86/mm/pat: Add comments to cachemode translation tables
  arch/*/io.h: Add ioremap_uc() to all architectures
  drivers/video/fbdev/atyfb: Use arch_phys_wc_add() and ioremap_wc()
  drivers/video/fbdev/atyfb: Replace MTRR UC hole with strong UC
  drivers/video/fbdev/atyfb: Clarify ioremap() base and length used
  drivers/video/fbdev/atyfb: Carve out framebuffer length fudging into a helper
  x86/mm, asm-generic: Add IOMMU ioremap_uc() variant default
  ...
2015-09-01 10:07:40 -07:00
Linus Torvalds
2962156d5c Merge branch 'x86-irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 irq fixlet from Ingo Molnar:
 "A single change that hides the 'HYP:' line in /proc/interrupts when
  it's unused"

* 'x86-irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/irq: Hide 'HYP:' line in /proc/interrupts when not on Xen/Hyper-V
2015-09-01 10:05:44 -07:00
Linus Torvalds
6b2282aa37 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu updates from Ingo Molnar:
 "Two changes: a suspend/resume quirk and a new CPUID bit definition"

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpufeature: Add feature bit for Intel's Silicon Debug CPUID bit
  x86/cpu: Restore MSR_IA32_ENERGY_PERF_BIAS after resume
2015-09-01 09:41:03 -07:00
Linus Torvalds
0c0fee018d Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 init code fixlet from Ingo Molnar:
 "A single change: fix obsolete init code annotations"

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Drop bogus __ref / __refdata annotations
2015-09-01 09:33:26 -07:00
Linus Torvalds
11e612ddb4 Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar:
 "The main x86 bootup related changes in this cycle were:

   - more boot time optimizations.  (Len Brown)

   - implement hex output to allow the debugging of early bootup
     parameters.  (Kees Cook)

   - remove obsolete MCA leftovers.  (Paolo Pisati)"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/smpboot: Remove APIC.wait_for_init_deassert and atomic init_deasserted
  x86/smpboot: Remove SIPI delays from cpu_up()
  x86/smpboot: Remove udelay(100) when polling cpu_callin_map
  x86/smpboot: Remove udelay(100) when polling cpu_initialized_map
  x86/boot: Obsolete the MCA sys_desc_table
  x86/boot: Add hex output for debugging
2015-09-01 09:04:31 -07:00
Linus Torvalds
5778077d03 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm changes from Ingo Molnar:
 "The biggest changes in this cycle were:

   - Revamp, simplify (and in some cases fix) Time Stamp Counter (TSC)
     primitives.  (Andy Lutomirski)

   - Add new, comprehensible entry and exit handlers written in C.
     (Andy Lutomirski)

   - vm86 mode cleanups and fixes.  (Brian Gerst)

   - 32-bit compat code cleanups.  (Brian Gerst)

  The amount of simplification in low level assembly code is already
  palpable:

     arch/x86/entry/entry_32.S                          | 130 +----
     arch/x86/entry/entry_64.S                          | 197 ++-----

  but more simplifications are planned.

  There's also the usual laudry mix of low level changes - see the
  changelog for details"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (83 commits)
  x86/asm: Drop repeated macro of X86_EFLAGS_AC definition
  x86/asm/msr: Make wrmsrl() a function
  x86/asm/delay: Introduce an MWAITX-based delay with a configurable timer
  x86/asm: Add MONITORX/MWAITX instruction support
  x86/traps: Weaken context tracking entry assertions
  x86/asm/tsc: Add rdtscll() merge helper
  selftests/x86: Add syscall_nt selftest
  selftests/x86: Disable sigreturn_64
  x86/vdso: Emit a GNU hash
  x86/entry: Remove do_notify_resume(), syscall_trace_leave(), and their TIF masks
  x86/entry/32: Migrate to C exit path
  x86/entry/32: Remove 32-bit syscall audit optimizations
  x86/vm86: Rename vm86->v86flags and v86mask
  x86/vm86: Rename vm86->vm86_info to user_vm86
  x86/vm86: Clean up vm86.h includes
  x86/vm86: Move the vm86 IRQ definitions to vm86.h
  x86/vm86: Use the normal pt_regs area for vm86
  x86/vm86: Eliminate 'struct kernel_vm86_struct'
  x86/vm86: Move fields from 'struct kernel_vm86_struct' to 'struct vm86'
  x86/vm86: Move vm86 fields out of 'thread_struct'
  ...
2015-09-01 08:40:25 -07:00
Linus Torvalds
65a99597f0 Merge branch 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull NOHZ updates from Ingo Molnar:
 "The main changes, mostly written by Frederic Weisbecker, include:

   - Fix some jiffies based cputime assumptions.  (No real harm because
     the concerned code isn't used by full dynticks.)

   - Simplify jiffies <-> usecs conversions.  Remove dead code.

   - Remove early hacks on nohz full code that avoided messing up idle
     nohz internals.  Now nohz integrates well full and idle and such
     hack have become needless.

   - Restart nohz full tick from irq exit.  (A simplification and a
     preparation for future optimization on scheduler kick to nohz
     full)

   - Code cleanups.

   - Tile driver isolation enhancement on top of nohz.  (Chris Metcalf)"

* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  nohz: Remove useless argument on tick_nohz_task_switch()
  nohz: Move tick_nohz_restart_sched_tick() above its users
  nohz: Restart nohz full tick from irq exit
  nohz: Remove idle task special case
  nohz: Prevent tilegx network driver interrupts
  alpha: Fix jiffies based cputime assumption
  apm32: Fix cputime == jiffies assumption
  jiffies: Remove HZ > USEC_PER_SEC special case
2015-08-31 21:04:24 -07:00
Linus Torvalds
3959df1dfb Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Ingo Molnar:
 "MCE handling updates, but also some generic drivers/edac/ changes to
  better organize the Kconfig space"

* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ras: Move AMD MCE injector to arch/x86/ras/
  x86/mce: Add a wrapper around mce_log() for injection
  x86/mce: Rename rcu_dereference_check_mce() to mce_log_get_idx_check()
  RAS: Add a menuconfig option with descriptive text
  x86/mce: Reenable CMCI banks when swiching back to interrupt mode
  x86/mce: Clear Local MCE opt-in before kexec
  x86/mce: Remove unused function declarations
  x86/mce: Kill drain_mcelog_buffer()
  x86/mce: Avoid potential deadlock due to printk() in MCE context
  x86/mce: Remove the MCE ring for Action Optional errors
  x86/mce: Don't use percpu workqueues
  x86/mce: Provide a lockless memory pool to save error records
  x86/mce: Reuse one of the u16 padding fields in 'struct mce'
2015-08-31 20:20:30 -07:00
Linus Torvalds
41d859a83c Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Main perf kernel side changes:

   - uprobes updates/fixes.  (Oleg Nesterov)

   - Add PERF_RECORD_SWITCH to indicate context switches and use it in
     tooling.  (Adrian Hunter)

   - Support BPF programs attached to uprobes and first steps for BPF
     tooling support.  (Wang Nan)

   - x86 generic x86 MSR-to-perf PMU driver.  (Andy Lutomirski)

   - x86 Intel PT, LBR and BTS updates.  (Alexander Shishkin)

   - x86 Intel Skylake support.  (Andi Kleen)

   - x86 Intel Knights Landing (KNL) RAPL support.  (Dasaratharaman
     Chandramouli)

   - x86 Intel Broadwell-DE uncore support.  (Kan Liang)

   - x86 hw breakpoints robustization (Andy Lutomirski)

  Main perf tooling side changes:

   - Support Intel PT in several tools, enabling the use of the
     processor trace feature introduced in Intel Broadwell processors:
     (Adrian Hunter)

       # dmesg | grep Performance
       # [0.188477] Performance Events: PEBS fmt2+, 16-deep LBR, Broadwell events, full-width counters, Intel PMU driver.
       # perf record -e intel_pt//u -a sleep 1
       [ perf record: Woken up 1 times to write data ]
       [ perf record: Captured and wrote 0.216 MB perf.data ]
       # perf script # then navigate in the tool output to some area, like this one:
       184 1030 dl_main (/usr/lib64/ld-2.17.so) => 7f21ba661440 dl_main (/usr/lib64/ld-2.17.so)
       185 1457 dl_main (/usr/lib64/ld-2.17.so) => 7f21ba669f10 _dl_new_object (/usr/lib64/ld-2.17.so)
       186 9f37 _dl_new_object (/usr/lib64/ld-2.17.so) => 7f21ba677b90 strlen (/usr/lib64/ld-2.17.so)
       187 7ba3 strlen (/usr/lib64/ld-2.17.so) => 7f21ba677c75 strlen (/usr/lib64/ld-2.17.so)
       188 7c78 strlen (/usr/lib64/ld-2.17.so) => 7f21ba669f3c _dl_new_object (/usr/lib64/ld-2.17.so)
       189 9f8a _dl_new_object (/usr/lib64/ld-2.17.so) => 7f21ba65fab0 calloc@plt (/usr/lib64/ld-2.17.so)
       190 fab0 calloc@plt (/usr/lib64/ld-2.17.so) => 7f21ba675e70 calloc (/usr/lib64/ld-2.17.so)
       191 5e87 calloc (/usr/lib64/ld-2.17.so) => 7f21ba65fa90 malloc@plt (/usr/lib64/ld-2.17.so)
       192 fa90 malloc@plt (/usr/lib64/ld-2.17.so) => 7f21ba675e60 malloc (/usr/lib64/ld-2.17.so)
       193 5e68 malloc (/usr/lib64/ld-2.17.so) => 7f21ba65fa80 __libc_memalign@plt (/usr/lib64/ld-2.17.so)
       194 fa80 __libc_memalign@plt (/usr/lib64/ld-2.17.so) => 7f21ba675d50 __libc_memalign (/usr/lib64/ld-2.17.so)
       195 5d63 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675e20 __libc_memalign (/usr/lib64/ld-2.17.so)
       196 5e40 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675d73 __libc_memalign (/usr/lib64/ld-2.17.so)
       197 5d97 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675e18 __libc_memalign (/usr/lib64/ld-2.17.so)
       198 5e1e __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675df9 __libc_memalign (/usr/lib64/ld-2.17.so)
       199 5e10 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba669f8f _dl_new_object (/usr/lib64/ld-2.17.so)
       200 9fc2 _dl_new_object (/usr/lib64/ld-2.17.so) =>  7f21ba678e70 memcpy (/usr/lib64/ld-2.17.so)
       201 8e8c memcpy (/usr/lib64/ld-2.17.so) => 7f21ba678ea0 memcpy (/usr/lib64/ld-2.17.so)

   - Add support for using several Intel PT features (CYC, MTC packets),
     the relevant documentation was updated in:
         tools/perf/Documentation/intel-pt.txt
     briefly describing those packets, its purposes, how to configure
     them in the event config terms and relevant external documentation
     for further reading.  (Adrian Hunter)

   - Introduce support for probing at an absolute address, for user and
     kernel 'perf probe's, useful when one have the symbol maps on a
     developer machine but not on an embedded system.  (Wang Nan)

   - Add Intel BTS support, with a call-graph script to show it and PT
     in use in a GUI using 'perf script' python scripting with
     postgresql and Qt.  (Adrian Hunter)

   - Allow selecting the type of callchains per event, including
     disabling callchains in all but one entry in an event list, to save
     space, and also to ask for the callchains collected in one event to
     be used in other events.  (Kan Liang)

   - Beautify more syscall arguments in 'perf trace': (Arnaldo Carvalho
     de Melo)
       * A bunch more translate file/pathnames from pointers to strings.
       * Convert numbers to strings for the 'keyctl' syscall 'option'
         arg.
       * Add missing 'clockid' entries.

   - Introduce 'srcfile' sort key: (Andi Kleen)

       # perf record -F 10000 usleep 1
       # perf report --stdio --dsos '[kernel.vmlinux]' -s srcfile
       <SNIP>
       # Overhead  Source File
          26.49%  copy_page_64.S
           5.49%  signal.c
           0.51%  msr.h
       #

     It can be combined with other fields, for instance, experiment with
     '-s srcfile,symbol'.

     There are some oddities in some distros and with some specific
     DSOs, being investigated, so your mileage may vary.

   - Support per-event 'freq' term: (Namhyung Kim)

       $ perf record -e 'cpu/instructions,freq=1234/',cycles -c 1000 sleep 1
       $ perf evlist -F
       cpu/instructions,freq=1234/: sample_freq=1234
       cycles: sample_period=1000
       $

   - Deref sys_enter pointer args with contents from probe:vfs_getname,
     showing pathnames instead of pointers in many syscalls in 'perf
     trace'.  (Arnaldo Carvalho de Melo)

   - Stop collecting /proc/kallsyms in perf.data files, saving about
     4.5MB on a typical x86-64 system, use the the symbol resolution
     routines used in all the other tools (report, top, etc) now that we
     can ask libtraceevent to use perf's symbol resolution code.
     (Arnaldo Carvalho de Melo)

   - Allow filtering out of perf's PID via 'perf record --exclude-perf'.
     (Wang Nan)

   - 'perf trace' now supports syscall groups, like strace, i.e:

       $ trace -e file touch file

     Will expand 'file' into multiple, file related, syscalls.  More
     work needed to add extra groups for other syscall groups, and also
     to complement what was added for the 'file' group, included as a
     proof of concept.  (Arnaldo Carvalho de Melo)

   - Add lock_pi stresser to 'perf bench futex', to test the kernel code
     related to FUTEX_(UN)LOCK_PI.  (Davidlohr Bueso)

   - Let user have timestamps with per-thread recording in 'perf record'
     (Adrian Hunter)

   - ... and tons of other changes, see the shortlog and the Git log for
     details"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (240 commits)
  perf evlist: Add backpointer for perf_env to evlist
  perf tools: Rename perf_session_env to perf_env
  perf tools: Do not change lib/api/fs/debugfs directly
  perf tools: Add tracing_path and remove unneeded functions
  perf buildid: Introduce sysfs/filename__sprintf_build_id
  perf evsel: Add a backpointer to the evlist a evsel is in
  perf trace: Add header with copyright and background info
  perf scripts python: Add new compaction-times script
  perf stat: Get correct cpu id for print_aggr
  tools lib traceeveent: Allow for negative numbers in print format
  perf script: Add --[no-]-demangle/--[no-]-demangle-kernel
  tracing/uprobes: Do not print '0x (null)' when offset is 0
  perf probe: Support probing at absolute address
  perf probe: Fix error reported when offset without function
  perf probe: Fix list result when address is zero
  perf probe: Fix list result when symbol can't be found
  tools build: Allow duplicate objects in the object list
  perf tools: Remove export.h from MANIFEST
  perf probe: Prevent segfault when reading probe point with absolute address
  perf tools: Update Intel PT documentation
  ...
2015-08-31 19:49:05 -07:00
Rafael J. Wysocki
5d2a1a927d Merge branches 'acpi-pci', 'acpi-soc', 'acpi-ec' and 'acpi-osl'
* acpi-pci:
  ACPI, PCI: Penalize legacy IRQ used by ACPI SCI

* acpi-soc:
  ACPI / LPSS: Ignore 10ms delay for Braswell

* acpi-ec:
  ACPI / EC: Fix an issue caused by the serialized _Qxx evaluations

* acpi-osl:
  ACPI / osl: replace custom implementation of readq / writeq
2015-09-01 03:41:19 +02:00
Linus Torvalds
7073bc6612 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The main RCU changes in this cycle are:

   - the combination of tree geometry-initialization simplifications and
     OS-jitter-reduction changes to expedited grace periods.  These two
     are stacked due to the large number of conflicts that would
     otherwise result.

   - privatize smp_mb__after_unlock_lock().

     This commit moves the definition of smp_mb__after_unlock_lock() to
     kernel/rcu/tree.h, in recognition of the fact that RCU is the only
     thing using this, that nothing else is likely to use it, and that
     it is likely to go away completely.

   - documentation updates.

   - torture-test updates.

   - misc fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
  rcu,locking: Privatize smp_mb__after_unlock_lock()
  rcu: Silence lockdep false positive for expedited grace periods
  rcu: Don't disable CPU hotplug during OOM notifiers
  scripts: Make checkpatch.pl warn on expedited RCU grace periods
  rcu: Update MAINTAINERS entry
  rcu: Clarify CONFIG_RCU_EQS_DEBUG help text
  rcu: Fix backwards RCU_LOCKDEP_WARN() in synchronize_rcu_tasks()
  rcu: Rename rcu_lockdep_assert() to RCU_LOCKDEP_WARN()
  rcu: Make rcu_is_watching() really notrace
  cpu: Wait for RCU grace periods concurrently
  rcu: Create a synchronize_rcu_mult()
  rcu: Fix obsolete priority-boosting comment
  rcu: Use WRITE_ONCE in RCU_INIT_POINTER
  rcu: Hide RCU_NOCB_CPU behind RCU_EXPERT
  rcu: Add RCU-sched flavors of get-state and cond-sync
  rcu: Add fastpath bypassing funnel locking
  rcu: Rename RCU_GP_DONE_FQS to RCU_GP_DOING_FQS
  rcu: Pull out wait_event*() condition into helper function
  documentation: Describe new expedited stall warnings
  rcu: Add stall warnings to synchronize_sched_expedited()
  ...
2015-08-31 18:12:07 -07:00
Linus Torvalds
1af115d675 Driver core patches for 4.3-rc1
Here is the new patches for the driver core / sysfs for 4.3-rc1.
 
 Very small number of changes here, all the details are in the shortlog,
 nothing major happening at all this kernel release, which is nice to
 see.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlXV9EwACgkQMUfUDdst+ylv1ACgj7srYyvumehX1zfRVzEWNuez
 chQAoKHnSpDMME/WmhQQRxzQ5pfd1Pni
 =uGHg
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the new patches for the driver core / sysfs for 4.3-rc1.

  Very small number of changes here, all the details are in the
  shortlog, nothing major happening at all this kernel release, which is
  nice to see"

* tag 'driver-core-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  bus: subsys: update return type of ->remove_dev() to void
  driver core: correct device's shutdown order
  driver core: fix docbook for device_private.device
  selftests: firmware: skip timeout checks for kernels without user mode helper
  kernel, cpu: Remove bogus __ref annotations
  cpu: Remove bogus __ref annotation of cpu_subsys_online()
  firmware: fix wrong memory deallocation in fw_add_devm_name()
  sysfs.txt: update show method notes about sprintf/snprintf/scnprintf usage
  devres: fix devres_get()
2015-08-31 08:47:40 -07:00
Linus Torvalds
1c00038c76 Char/Misc driver patches for 4.3-rc1
Here's the "big" char/misc driver update for 4.3-rc1.
 
 Not much really interesting here, just a number of little changes all
 over the place, and some nice consolidation of the nvmem drivers to a
 common framework.  As usual, the mei drivers stand out as the largest
 "churn" to handle new devices and features in their hardware.
 
 All have been in linux-next for a while with no issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlXV844ACgkQMUfUDdst+ymYfQCgmDKjq3fsVHCxNZPxnukFYzvb
 xZkAnRb8fuub5gVQFP29A+rhyiuWD13v
 =Bq9K
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver patches from Greg KH:
 "Here's the "big" char/misc driver update for 4.3-rc1.

  Not much really interesting here, just a number of little changes all
  over the place, and some nice consolidation of the nvmem drivers to a
  common framework.  As usual, the mei drivers stand out as the largest
  "churn" to handle new devices and features in their hardware.

  All have been in linux-next for a while with no issues"

* tag 'char-misc-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (136 commits)
  auxdisplay: ks0108: initialize local parport variable
  extcon: palmas: Fix build break due to devm_gpiod_get_optional API change
  extcon: palmas: Support GPIO based USB ID detection
  extcon: Fix signedness bugs about break error handling
  extcon: Drop owner assignment from i2c_driver
  extcon: arizona: Simplify pdata symantics for micd_dbtime
  extcon: arizona: Declare 3-pole jack if we detect open circuit on mic
  extcon: Add exception handling to prevent the NULL pointer access
  extcon: arizona: Ensure variables are set for headphone detection
  extcon: arizona: Use gpiod inteface to handle micd_pol_gpio gpio
  extcon: arizona: Add basic microphone detection DT/ACPI bindings
  extcon: arizona: Update to use the new device properties API
  extcon: palmas: Remove the mutually_exclusive array
  extcon: Remove optional print_state() function pointer of struct extcon_dev
  extcon: Remove duplicate header file in extcon.h
  extcon: max77843: Clear IRQ bits state before request IRQ
  toshiba laptop: replace ioremap_cache with ioremap
  misc: eeprom: max6875: clean up max6875_read()
  misc: eeprom: clean up eeprom_read()
  misc: eeprom: 93xx46: clean up eeprom_93xx46_bin_read/write
  ...
2015-08-31 08:34:13 -07:00
Ingo Molnar
02b643b643 Merge branch 'perf/urgent' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-31 10:25:26 +02:00
Thomas Gleixner
a47d4576cd x86/irq: Do not dereference irq descriptor before checking it
Having the IS_NULL_OR_ERR() check after dereferencing the pointer is
not really working well.

Move the dereference after the check.

Fixes: a782a7e46b 'x86/irq: Store irq descriptor in vector array'
Reported-and-tested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-08-28 10:30:15 +02:00
Luis R. Rodriguez
2baa891e42 x86/mm/mtrr: Remove kernel internal MTRR interfaces: unexport mtrr_add() and mtrr_del()
The effort to replace mtrr_add() with architecture agnostic
arch_phys_wc_add() is complete, this will ensure write-combining
implementations (PAT on x86) is taken advantage instead of using
MTRR. With the effort done now, hide direct MTRR access for
drivers.

The legacy user-space /proc/mtrr ABI is not affected.

Update x86 documentation on MTRR to reflect the completion of
the phasing out of direct access to MTRR, also add a note on
platform firmware code use of MTRRs based on the obituary
discussion of MTRRs on Linux [0].

  [0] http://lkml.kernel.org/r/1438991330.3109.196.camel@hp.com

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Cc: <syrjala@sci.fi>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Walls <awalls@md.metrocast.net>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Doug Ledford <dledford@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: airlied@linux.ie
Cc: benh@kernel.crashing.org
Cc: bhelgaas@google.com
Cc: dan.j.williams@intel.com
Cc: konrad.wilk@oracle.com
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: mst@redhat.com
Cc: netdev@vger.kernel.org
Cc: vinod.koul@intel.com
Cc: xen-devel@lists.xensource.com
Link: http://lkml.kernel.org/r/1440443613-13696-12-git-send-email-mcgrof@do-not-panic.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-28 10:09:28 +02:00
Jiang Liu
5d0ddfebb9 ACPI, PCI: Penalize legacy IRQ used by ACPI SCI
Nick Meier reported a regression with HyperV that "
  After rebooting the VM, the following messages are logged in syslog
  when trying to load the tulip driver:
    tulip: Linux Tulip drivers version 1.1.15 (Feb 27, 2007)
    tulip: 0000:00:0a.0: PCI INT A: failed to register GSI
    tulip: Cannot enable tulip board #0, aborting
    tulip: probe of 0000:00:0a.0 failed with error -16
  Errors occur in 3.19.0 kernel
  Works in 3.17 kernel.
"

According to the ACPI dump file posted by Nick at
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1440072

The ACPI MADT table includes an interrupt source overridden entry for
ACPI SCI:
[236h 0566  1]                Subtable Type : 02 <Interrupt Source Override>
[237h 0567  1]                       Length : 0A
[238h 0568  1]                          Bus : 00
[239h 0569  1]                       Source : 09
[23Ah 0570  4]                    Interrupt : 00000009
[23Eh 0574  2]        Flags (decoded below) : 000D
                                   Polarity : 1
                               Trigger Mode : 3

And in DSDT table, we have _PRT method to define PCI interrupts, which
eventually goes to:
        Name (PRSA, ResourceTemplate ()
        {
            IRQ (Level, ActiveLow, Shared, )
                {3,4,5,7,9,10,11,12,14,15}
        })
        Name (PRSB, ResourceTemplate ()
        {
            IRQ (Level, ActiveLow, Shared, )
                {3,4,5,7,9,10,11,12,14,15}
        })
        Name (PRSC, ResourceTemplate ()
        {
            IRQ (Level, ActiveLow, Shared, )
                {3,4,5,7,9,10,11,12,14,15}
        })
        Name (PRSD, ResourceTemplate ()
        {
            IRQ (Level, ActiveLow, Shared, )
                {3,4,5,7,9,10,11,12,14,15}
        })

According to the MADT and DSDT tables, IRQ 9 may be used for:
 1) ACPI SCI in level, high mode
 2) PCI legacy IRQ in level, low mode
So there's a conflict in polarity setting for IRQ 9.

Prior to commit cd68f6bd53 ("x86, irq, acpi: Get rid of special
handling of GSI for ACPI SCI"), ACPI SCI is handled specially and
there's no check for conflicts between ACPI SCI and PCI legagy IRQ.
And it seems that the HyperV hypervisor doesn't make use of the
polarity configuration in IOAPIC entry, so it just works.

Commit cd68f6bd53 gets rid of the specially handling of ACPI SCI,
and then the pin attribute checking code discloses the conflicts
between ACPI SCI and PCI legacy IRQ on HyperV virtual machine,
and rejects the request to assign IRQ9 to PCI devices.

So penalize legacy IRQ used by ACPI SCI and mark it unusable if ACPI
SCI attributes conflict with PCI IRQ attributes.

Please refer to following links for more information:
https://bugzilla.kernel.org/show_bug.cgi?id=101301
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1440072

Fixes: cd68f6bd53 ("x86, irq, acpi: Get rid of special handling of GSI for ACPI SCI")
Reported-and-tested-by: Nick Meier <nmeier@microsoft.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: 3.19+ <stable@vger.kernel.org> # 3.19+
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-08-27 01:12:23 +02:00
Ingo Molnar
8d58b66ed2 Linux 4.2-rc8
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJV2pUkAAoJEHm+PkMAQRiGCIoH/Rb29ZjdCoZJp9OtnjAG+qRc
 bG3YuomIdib86x7xHRKKaLWBa7din7IYjuwT/X4S4duO5a1R5Lp1sRG3IlGfhT0W
 nBNbjFl4q4bOyiTPtTRTYyh4g5UQv4IuyCnCmZyCTJyVi/O6HVM9TWKUzm68P2dJ
 30LwLUcQJ+mHueGJwFBAXe2BaojEpvYCdSX6tvbrQ/8X3FrVExZXuJl4uMYNFYNK
 ZwG/v5t7tYOiAe76JGbrEuVFPZWLPEW7amHOWR0T4Ye4nWTlBgx7fENiNRlfgcvI
 CM16l/xkyrZQ3Q5jZy1qYDfdHYF++dyEDysX4w1ae/X0aaLZn7l+u5VQD6WpkQQ=
 =IF6I
 -----END PGP SIGNATURE-----

Merge tag 'v4.2-rc8' into x86/mm, before applying new changes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-25 09:59:19 +02:00
Paul Gortmaker
13fe86f465 x86/mm: Make kernel/check.c explicitly non-modular
The Kconfig currently controlling compilation of this code is:

  arch/x86/Kconfig:config X86_CHECK_BIOS_CORRUPTION
  arch/x86/Kconfig:       bool "Check for low memory corruption"

...meaning that it currently is not being built as a module by
anyone.

Lets remove the couple traces of modularity so that when reading
the code there is no doubt it is builtin-only.

Since module_init translates to device_initcall in the
non-modular case, the init ordering remains unchanged with this
commit.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1440459295-21814-4-git-send-email-paul.gortmaker@windriver.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-25 09:48:38 +02:00
Andy Lutomirski
47edb65178 x86/asm/msr: Make wrmsrl() a function
As of cf991de2f6 ("x86/asm/msr: Make wrmsrl_safe() a
function"), wrmsrl_safe is a function, but wrmsrl is still a
macro.  The wrmsrl macro performs invalid shifts if the value
argument is 32 bits. This makes it unnecessarily awkward to
write code that puts an unsigned long into an MSR.

To make this work, syscall_init needs tweaking to stop passing
a function pointer to wrmsrl.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Willy Tarreau <w@1wt.eu>
Link: http://lkml.kernel.org/r/690f0c629a1085d054e2d1ef3da073cfb3f7db92.1437678821.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-23 13:25:38 +02:00
Thomas Gleixner
a57e456a7b x86/apic: Fix fallout from x2apic cleanup
In the recent x2apic cleanup I got two things really wrong:
1) The safety check in __disable_x2apic which allows the function to
   be called unconditionally is backwards. The check is there to
   prevent access to the apic MSR in case that the machine has no
   apic. Though right now it returns if the machine has an apic and
   therefor the disabling of x2apic is never invoked.

2) x2apic_disable() sets x2apic_mode to 0 after registering the local
   apic. That's wrong, because register_lapic_address() checks x2apic
   mode and therefor takes the wrong code path.

This results in boot failures on machines with x2apic preenabled by
BIOS and can also lead to an fatal MSR access on machines without
apic.

The solutions are simple:
1) Correct the sanity check for apic availability
2) Clear x2apic_mode _before_ calling register_lapic_address()

Fixes: 659006bf3a 'x86/x2apic: Split enable and setup function'
Reported-and-tested-by: Javier Monteagudo <javiermon@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1224764
Cc: stable@vger.kernel.org # 4.0+
Cc: Laura Abbott <labbott@redhat.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
2015-08-22 17:01:48 +02:00
Huang Rui
b466bdb614 x86/asm/delay: Introduce an MWAITX-based delay with a configurable timer
MWAITX can enable a timer and a corresponding timer value
specified in SW P0 clocks. The SW P0 frequency is the same as
TSC. The timer provides an upper bound on how long the
instruction waits before exiting.

This way, a delay function in the kernel can leverage that
MWAITX timer of MWAITX.

When a CPU core executes MWAITX, it will be quiesced in a
waiting phase, diminishing its power consumption. This way, we
can save power in comparison to our default TSC-based delays.

A simple test shows that:

	$ cat /sys/bus/pci/devices/0000\:00\:18.4/hwmon/hwmon0/power1_acc
	$ sleep 10000s
	$ cat /sys/bus/pci/devices/0000\:00\:18.4/hwmon/hwmon0/power1_acc

Results:

	* TSC-based default delay:      485115 uWatts average power
	* MWAITX-based delay:           252738 uWatts average power

Thus, that's about 240 milliWatts less power consumption. The
test method relies on the support of AMD CPU accumulated power
algorithm in fam15h_power for which patches are forthcoming.

Suggested-by: Andy Lutomirski <luto@amacapital.net>
Suggested-by: Borislav Petkov <bp@suse.de>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Huang Rui <ray.huang@amd.com>
[ Fix delay truncation. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Andreas Herrmann <herrmann.der.user@gmail.com>
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hector Marco-Gisbert <hecmargi@upv.es>
Cc: Jacob Shin <jacob.w.shin@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Li <tony.li@amd.com>
Link: http://lkml.kernel.org/r/1438744732-1459-3-git-send-email-ray.huang@amd.com
Link: http://lkml.kernel.org/r/1439201994-28067-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-22 14:52:16 +02:00
Andy Lutomirski
f0a97af83f x86/traps: Weaken context tracking entry assertions
We were asserting that we were all the way in CONTEXT_KERNEL
when exception handlers were called.  While having this be true
is, I think, a nice goal (or maybe a variant in which we assert
that we're in CONTEXT_KERNEL or some new IRQ context), we're not
quite there.

In particular, if an IRQ interrupts the SYSCALL prologue and the
IRQ handler in turn causes an exception, the exception entry
will be called in RCU IRQ mode but with CONTEXT_USER.

This is okay (nothing goes wrong), but until we fix up the
SYSCALL prologue, we need to avoid warning.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/c81faf3916346c0e04346c441392974f49cd7184.1440133286.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-22 11:12:10 +02:00
Ingo Molnar
827409b2f5 x86/fpu/math-emu: Fix crash in fork()
During later stages of math-emu bootup the following crash triggers:

	 math_emulate: 0060:c100d0a8
	 Kernel panic - not syncing: Math emulation needed in kernel
	 CPU: 0 PID: 1511 Comm: login Not tainted 4.2.0-rc7+ #1012
	 [...]
	 Call Trace:
	  [<c181d50d>] dump_stack+0x41/0x52
	  [<c181c918>] panic+0x77/0x189
	  [<c1003530>] ? math_error+0x140/0x140
	  [<c164c2d7>] math_emulate+0xba7/0xbd0
	  [<c100d0a8>] ? fpu__copy+0x138/0x1c0
	  [<c1109c3c>] ? __alloc_pages_nodemask+0x12c/0x870
	  [<c136ac20>] ? proc_clear_tty+0x40/0x70
	  [<c136ac6e>] ? session_clear_tty+0x1e/0x30
	  [<c1003530>] ? math_error+0x140/0x140
	  [<c1003575>] do_device_not_available+0x45/0x70
	  [<c100d0a8>] ? fpu__copy+0x138/0x1c0
	  [<c18258e6>] error_code+0x5a/0x60
	  [<c1003530>] ? math_error+0x140/0x140
	  [<c100d0a8>] ? fpu__copy+0x138/0x1c0
	  [<c100c205>] arch_dup_task_struct+0x25/0x30
	  [<c1048cea>] copy_process.part.51+0xea/0x1480
	  [<c115a8e5>] ? dput+0x175/0x200
	  [<c136af70>] ? no_tty+0x30/0x30
	  [<c1157242>] ? do_vfs_ioctl+0x322/0x540
	  [<c104a21a>] _do_fork+0xca/0x340
	  [<c1057b06>] ? SyS_rt_sigaction+0x66/0x90
	  [<c104a557>] SyS_clone+0x27/0x30
	  [<c1824a80>] sysenter_do_call+0x12/0x12

The reason is the incorrect assumption in fpu_copy(), that FNSAVE
can be executed from math-emu kernels as well.

Don't try to copy the registers, the soft state will be copied
by fork anyway, so the child task inherits the parent task's
soft math state.

With this fix applied math-emu kernels boot up fine on modern
hardware and the 'no387 nofxsr' boot options.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-22 10:23:03 +02:00
Ingo Molnar
5fc960380e x86/fpu/math-emu: Fix math-emu boot crash
On a math-emu bootup the following crash occurs:

	Initializing CPU#0
	------------[ cut here ]------------
	kernel BUG at arch/x86/kernel/traps.c:779!
	invalid opcode: 0000 [#1] SMP
	[...]
	EIP is at do_device_not_available+0xe/0x70
	[...]
	Call Trace:
	 [<c18238e6>] error_code+0x5a/0x60
	 [<c1002bd0>] ? math_error+0x140/0x140
	 [<c100bbd9>] ? fpu__init_cpu+0x59/0xa0
	 [<c1012322>] cpu_init+0x202/0x330
	 [<c104509f>] ? __native_set_fixmap+0x1f/0x30
	 [<c1b56ab0>] trap_init+0x305/0x346
	 [<c1b548af>] start_kernel+0x1a5/0x35d
	 [<c1b542b4>] i386_start_kernel+0x82/0x86

The reason is that in the following commit:

  b1276c48e9 ("x86/fpu: Initialize fpregs in fpu__init_cpu_generic()")

I failed to consider math-emu's limitation that it cannot execute the
FNINIT instruction in kernel mode.

The long term fix might be to allow math-emu to execute (certain) kernel
mode FPU instructions, but for now apply the safe (albeit somewhat ugly)
fix: initialize the emulation state explicitly without trapping out to
the FPU emulator.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-22 10:02:04 +02:00
Vitaly Kuznetsov
88c9281a9f x86/hyperv: Mark the Hyper-V TSC as unstable
The Hyper-V top-level functional specification states, that
"algorithms should be resilient to sudden jumps forward or
backward in the TSC value", this means that we should consider
TSC as unstable. In some cases tsc tests are able to detect the
instability, it was detected in 543 out of 646 boots in my
testing:

 Measured 6277 cycles TSC warp between CPUs, turning off TSC clock.
 tsc: Marking TSC unstable due to check_tsc_sync_source failed

This is, however, just a heuristic. On Hyper-V platform there
are two good clocksources: MSR-based hyperv_clocksource and
recently introduced TSC page.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: devel@linuxdriverproject.org
Link: http://lkml.kernel.org/r/1440003264-9949-1-git-send-email-vkuznets@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-21 08:44:38 +02:00
Ingo Molnar
82819ffb42 perf/x86/msr: Fix the MSR driver build
The new MSR PMU driver made use of rdtsc() which does not exist (yet) in
this tree:

  arch/x86/kernel/cpu/perf_event_msr.c:91:3: error: implicit declaration of function 'rdtsc'

Use the old rdtscll() primitive for now.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-21 08:17:01 +02:00
Jisheng Zhang
e43d0189ac x86/idle: Restore trace_cpu_idle to mwait_idle() calls
Commit b253149b84 ("sched/idle/x86: Restore mwait_idle() to fix boot
hangs, to improve power savings and to improve performance") restores
mwait_idle(), but the trace_cpu_idle related calls are missing. This
causes powertop on my old desktop powered by Intel Core2 E6550 to
report zero wakeups and zero events.

Add them back to restore the proper behaviour.

Fixes: b253149b84 ("sched/idle/x86: Restore mwait_idle() to ...")
Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Cc: <len.brown@intel.com>
Cc: stable@vger.kernel.org # 4.1
Link: http://lkml.kernel.org/r/1440046479-4262-1-git-send-email-jszhang@marvell.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-08-20 21:37:45 +02:00
Ingo Molnar
40a2ea1bd9 Merge branch 'perf/urgent' into perf/core, to pick up fixes before adding more changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-20 11:48:56 +02:00
Dan Williams
7a67832c7e libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option
We currently register a platform device for e820 type-12 memory and
register a nvdimm bus beneath it.  Registering the platform device
triggers the device-core machinery to probe for a driver, but that
search currently comes up empty.  Building the nvdimm-bus registration
into the e820_pmem platform device registration in this way forces
libnvdimm to be built-in.  Instead, convert the built-in portion of
CONFIG_X86_PMEM_LEGACY to simply register a platform device and move the
rest of the logic to the driver for e820_pmem, for the following
reasons:

1/ Letting e820_pmem support be a module allows building and testing
   libnvdimm.ko changes without rebooting

2/ All the normal policy around modules can be applied to e820_pmem
   (unbind to disable and/or blacklisting the module from loading by
   default)

3/ Moving the driver to a generic location and converting it to scan
   "iomem_resource" rather than "e820.map" means any other architecture can
   take advantage of this simple nvdimm resource discovery mechanism by
   registering a resource named "Persistent Memory (legacy)"

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-08-19 00:34:34 -04:00
Jiang Liu
527f0a91e9 x86/irq: Build correct vector mapping for multiple MSI interrupts
Alex Deucher, Mark Rustad and Alexander Holler reported a regression
with the latest v4.2-rc4 kernel, which breaks some SATA controllers.
With multi-MSI capable SATA controllers, only the first port works,
all other ports time out when executing SATA commands.

This happens because the first argument to assign_irq_vector_policy()
is always the base linux irq number of the multi MSI interrupt block,
so all subsequent vector assignments operate on the base linux irq
number, so all MSI irqs are handled as the first irq number. Therefor
the other MSI irqs of a device are never set up correctly and never
fire.

Add the loop iterator to the base irq number so all vectors are
assigned correctly.

Fixes: b5dc8e6c21 "x86/irq: Use hierarchical irqdomain to manage CPU interrupt vectors"
Reported-and-tested-by: Alex Deucher <alexdeucher@gmail.com>
Reported-and-tested-by: Mark Rustad <mrustad@gmail.com>
Reported-and-tested-by: Alexander Holler <holler@ahsoftware.de>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1439911228-9880-1-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-08-18 18:18:55 +02:00
Ingo Molnar
a5dd192496 Merge branch 'x86/urgent' into x86/asm to fix up conflicts and to pick up fixes
Conflicts:
	arch/x86/entry/entry_64_compat.S
	arch/x86/math-emu/get_address.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-18 09:39:47 +02:00
Len Brown
656bba3068 x86/smpboot: Remove APIC.wait_for_init_deassert and atomic init_deasserted
Both the per-APIC flag ".wait_for_init_deassert",
and the global atomic_t "init_deasserted"
are dead code -- remove them.

For all APIC types, "wait_for_master()"
prevents an AP from proceeding until the BSP has set
cpu_callout_mask, making "init_deasserted" {unnecessary}:

	BSP: <de-assert INIT>
	...
	BSP: {set init_deasserted}
	AP: wait_for_master()
		set cpu_initialized_mask
		wait for cpu_callout_mask
	BSP: test cpu_initialized_mask
	BSP: set cpu_callout_mask
	AP: test cpu_callout_mask
	AP: {wait for init_deasserted}
	...
	AP: <touch APIC>

Deleting the {dead code} above is necessary to enable
some parallelism in a future patch.

Signed-off-by: Len Brown <len.brown@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jan H. Schönherr <jschoenh@amazon.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Link: http://lkml.kernel.org/r/de4b3a9bab894735e285870b5296da25ee6a8a5a.1439739165.git.len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-17 10:42:28 +02:00
Len Brown
a9bcaa02a5 x86/smpboot: Remove SIPI delays from cpu_up()
MPS 1.4 example code shows the following required delays during processor
on-lining:

	INIT
	 udelay(10,000)
	SIPI
	 udelay(200)
	SIPI
	 udelay(200) /* Linux actually implements this as udelay(300) */

Linux skips the udelay(10,000) on modern processors.
This patch removes the udelay(200) after each SIPI
on those same processors.

All three legacy delays can be restored by the cmdline
"cpu_init_udelay=10000".

As measured by analyze_suspend.py, this patch speeds
processor resume time on my desktop from 2.4ms to 1.8ms, per AP.

Signed-off-by: Len Brown <len.brown@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jan H. Schönherr <jschoenh@amazon.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Link: http://lkml.kernel.org/r/a5dfdbc8fbfdd813784da204aad5677fe459ac37.1439739165.git.len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-17 10:42:27 +02:00
Len Brown
2d99af8e8f x86/smpboot: Remove udelay(100) when polling cpu_callin_map
After the BSP sends INIT/SIPI/SIP to the AP and sees the AP
in the cpu_initialized_map, it sets the AP loose via the
cpu_callout_map, and waits for it via the cpu_callin_map.

The BSP polls the cpu_callin_map with a udelay(100)
and a schedule() in each iteration.

The udelay(100) adds no value.

For example, on my 4-CPU dekstop, the AP finishes
cpu_callin() in under 70 usec and sets the cpu_callin_mask.
The BSP, however, doesn't see that setting until over 30 usec
later, because it was still running its udelay(100)
when the AP finished.

Deleting the udelay(100) in the cpu_callin_mask polling loop,
saves from 0 to 100 usec per Application Processor.

Signed-off-by: Len Brown <len.brown@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jan H. Schönherr <jschoenh@amazon.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Link: http://lkml.kernel.org/r/0aade12eabeb89a688c929fe80856eaea0544bb7.1439739165.git.len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-17 10:42:27 +02:00
Len Brown
6e38f1e79d x86/smpboot: Remove udelay(100) when polling cpu_initialized_map
After the BSP sends the APIC INIT/SIPI/SIPI to the AP,
it waits for the AP to come up and indicate that it is alive
by setting its own bit in the cpu_initialized_mask.

Linux polls for up to 10 seconds for this to happen.
Each polling loop has a udelay(100) and a call to schedule().

The udelay(100) adds no value.

For example, on my desktop, the BSP waits for the
other 3 CPUs to come on line at boot for 305, 404, 405 usec.
For resume from S3, it waits 317, 404, 405 usec.

But when the udelay(100) is removed, the BSP waits
305, 310, 306 for boot, and 305, 307, 306 for resume.

So for both boot and resume, removing the udelay(100)
speeds online by about 100us in 2 of 3 cases.

Signed-off-by: Len Brown <len.brown@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jan H. Schönherr <jschoenh@amazon.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Link: http://lkml.kernel.org/r/33ef746c67d2489cad0a9b1958cf71167232ff2b.1439739165.git.len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-17 10:42:27 +02:00
Ingo Molnar
5461bd81bf Linux 4.2-rc7
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJV0R4AAAoJEHm+PkMAQRiG8xIH/AmiRd+JDrs0qqEy46p6X8Gn
 0lB5/KsGycvIGIBTiy2nZzcT0Ly6LeFUKUjzPytlOhIZPMrxMVMShDaQKCXXIMUr
 1mN6hkvpkLNnUhvL2fR6mm0zkjbz3zZEazFY+Jic8wQrtSkHgfH0DXqSAo8le0f8
 kNrd5BPPhIwvpHGaNGFdTpbgpPcalXyQk/fHyvDGidbyXzY/d7l05QfYJ6XCD4Zm
 IAy48iK5BFts2+z3aOYrOeuuCcm1qFX8YArqzE1rfPp+U/LQpfUfij4cmOqDLn/F
 qnv9E7bRRVovvrgKe4I3Trta8kT53VLJvqpdw2Usqo8zvhs4VyrYpHC+gEE6YUY=
 =9Rd4
 -----END PGP SIGNATURE-----

Merge tag 'v4.2-rc7' into x86/boot, to refresh the branch before merging new changes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-17 10:41:59 +02:00
Linus Torvalds
01565479e9 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Merge x86 fixes from Ingo Molnar:
 "Two followup fixes related to the previous LDT fix"

Also applied a further FPU emulation fix from Andy Lutomirski to the
branch before actually merging it.

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
  x86/ldt: Further fix FPU emulation
  x86/ldt: Correct FPU emulation access to LDT
  x86/ldt: Correct LDT access in single stepping logic
2015-08-16 15:11:25 -07:00
Linus Torvalds
b25c6cee55 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Misc fixes: PMU driver corner cases, tooling fixes, and an 'AUX'
  (Intel PT) race related core fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/cqm: Do not access cpu_data() from CPU_UP_PREPARE handler
  perf/x86/intel: Fix memory leak on hot-plug allocation fail
  perf: Fix PERF_EVENT_IOC_PERIOD migration race
  perf: Fix double-free of the AUX buffer
  perf: Fix fasync handling on inherited events
  perf tools: Fix test build error when bindir contains double slash
  perf stat: Fix transaction lenght metrics
  perf: Fix running time accounting
2015-08-14 10:57:16 -07:00
Linus Torvalds
ed596cde94 Revert x86 sigcontext cleanups
This reverts commits 9a036b93a3 ("x86/signal/64: Remove 'fs' and 'gs'
from sigcontext") and c6f2062935 ("x86/signal/64: Fix SS handling for
signals delivered to 64-bit programs").

They were cleanups, but they break dosemu by changing the signal return
behavior (and removing 'fs' and 'gs' from the sigcontext struct - while
not actually changing any behavior - causes build problems).

Reported-and-tested-by: Stas Sergeev <stsp@list.ru>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-13 12:42:22 -07:00
Borislav Petkov
a79da38494 x86/mce: Add a wrapper around mce_log() for injection
Will be used by an injector module in a following patch.

Additionally, add a missing module export reported by 0-DAY
kernel test.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1439396985-12812-13-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-13 10:12:53 +02:00
Borislav Petkov
9a7783d021 x86/mce: Rename rcu_dereference_check_mce() to mce_log_get_idx_check()
The "rcu_" prefix misleads for it being a proper RCU interface
which is not. It basically checks whether we're preemptible or
holding the chrdev_read mutex.

Rename it accordingly.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1439396985-12812-12-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-13 10:12:53 +02:00
Xie XiuQi
1b48465500 x86/mce: Reenable CMCI banks when swiching back to interrupt mode
Zhang Liguang reported the following issue:

1) System detects a CMCI storm on the current CPU.

2) Kernel disables the CMCI interrupt on banks owned by the
   current CPU and switches to poll mode

3) After the CMCI storm subsides, kernel switches back to
   interrupt mode

4) We expect the system to reenable the CMCI interrupt on banks
   owned by the current CPU

   mce_intel_adjust_timer
   |-> cmci_reenable
       |-> cmci_discover     # owned banks are ignored here

  static void cmci_discover(int banks)
	...
	for (i = 0; i < banks; i++) {
		...
		if (test_bit(i, owned))	# ownd banks is ignore here
			continue;

So convert cmci_storm_disable_banks() to
cmci_toggle_interrupt_mode() which controls whether to enable or
disable CMCI interrupts with its argument.

NB: We cannot clear the owned bit because the banks won't be
polled, otherwise. See:

  27f6c573e0 ("x86, CMCI: Add proper detection of end of CMCI storms")

for more info.

Reported-by: Zhang Liguang <zhangliguang@huawei.com>
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # v3.15+
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: huawei.libin@huawei.com
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: rui.xiang@huawei.com
Link: http://lkml.kernel.org/r/1439396985-12812-10-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-13 10:12:52 +02:00
Ashok Raj
8838eb6c0b x86/mce: Clear Local MCE opt-in before kexec
kexec could boot a kernel that could be legacy with no knowledge
of LMCE. Hence we should make sure we clear LMCE optin before
kexec reboot.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1439396985-12812-9-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-13 10:12:52 +02:00
Borislav Petkov
eef4dfa0cb x86/mce: Kill drain_mcelog_buffer()
This used to flush out MCEs logged during early boot and which
were in the MCA registers from a previous system run. No need
for that now, since we've moved to a genpool.

Suggested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1439396985-12812-7-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-13 10:12:52 +02:00
Chen, Gong
f29a7aff4b x86/mce: Avoid potential deadlock due to printk() in MCE context
Printing in MCE context is a no-no, currently, as printk() is
not NMI-safe. If some of the notifiers on the MCE chain call do
so, we may deadlock. In order to avoid that, delay printk() to
process context where it is safe.

Reported-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
[ Fold in subsequent patch from Boris for early boot logging. ]
Signed-off-by: Tony Luck <tony.luck@intel.com>
[ Kick irq_work in mce_log() directly. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1439396985-12812-6-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-13 10:12:51 +02:00
Chen, Gong
fd4cf79fcc x86/mce: Remove the MCE ring for Action Optional errors
Use unified genpool to save Action Optional error events and put
Action Optional error handling in the same notification chain as
MCE error decoding.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
[ Fold in subsequent patch from Boris for early boot logging. ]
Signed-off-by: Tony Luck <tony.luck@intel.com>
[ Correct a lot. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1439396985-12812-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-13 10:12:51 +02:00
Chen, Gong
061120aed7 x86/mce: Don't use percpu workqueues
An MCE is a rare event. Therefore, there's no need to have
per-CPU instances of both normal and IRQ workqueues. Make them
both global.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
[ Fold in subsequent patch from Rui/Boris/Tony for early boot logging. ]
Signed-off-by: Tony Luck <tony.luck@intel.com>
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1439396985-12812-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-13 10:12:51 +02:00
Chen, Gong
648ed94038 x86/mce: Provide a lockless memory pool to save error records
printk() is not safe to use in MCE context. Add a lockless
memory allocator pool to save error records in MCE context.
Those records will be issued later, in a printk-safe context.
The idea is inspired by the APEI/GHES driver.

We're very conservative and allocate only two pages for it but
since we're going to use those pages throughout the system's
lifetime, we allocate them statically to avoid early boot time
allocation woes.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
[ Rewrite. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1439396985-12812-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-13 10:12:50 +02:00
David Howells
99db443506 PKCS#7: Appropriately restrict authenticated attributes and content type
A PKCS#7 or CMS message can have per-signature authenticated attributes
that are digested as a lump and signed by the authorising key for that
signature.  If such attributes exist, the content digest isn't itself
signed, but rather it is included in a special authattr which then
contributes to the signature.

Further, we already require the master message content type to be
pkcs7_signedData - but there's also a separate content type for the data
itself within the SignedData object and this must be repeated inside the
authattrs for each signer [RFC2315 9.2, RFC5652 11.1].

We should really validate the authattrs if they exist or forbid them
entirely as appropriate.  To this end:

 (1) Alter the PKCS#7 parser to reject any message that has more than one
     signature where at least one signature has authattrs and at least one
     that does not.

 (2) Validate authattrs if they are present and strongly restrict them.
     Only the following authattrs are permitted and all others are
     rejected:

     (a) contentType.  This is checked to be an OID that matches the
     	 content type in the SignedData object.

     (b) messageDigest.  This must match the crypto digest of the data.

     (c) signingTime.  If present, we check that this is a valid, parseable
     	 UTCTime or GeneralTime and that the date it encodes fits within
     	 the validity window of the matching X.509 cert.

     (d) S/MIME capabilities.  We don't check the contents.

     (e) Authenticode SP Opus Info.  We don't check the contents.

     (f) Authenticode Statement Type.  We don't check the contents.

     The message is rejected if (a) or (b) are missing.  If the message is
     an Authenticode type, the message is rejected if (e) is missing; if
     not Authenticode, the message is rejected if (d) - (f) are present.

     The S/MIME capabilities authattr (d) unfortunately has to be allowed
     to support kernels already signed by the pesign program.  This only
     affects kexec.  sign-file suppresses them (CMS_NOSMIMECAP).

     The message is also rejected if an authattr is given more than once or
     if it contains more than one element in its set of values.

 (3) Add a parameter to pkcs7_verify() to select one of the following
     restrictions and pass in the appropriate option from the callers:

     (*) VERIFYING_MODULE_SIGNATURE

	 This requires that the SignedData content type be pkcs7-data and
	 forbids authattrs.  sign-file sets CMS_NOATTR.  We could be more
	 flexible and permit authattrs optionally, but only permit minimal
	 content.

     (*) VERIFYING_FIRMWARE_SIGNATURE

	 This requires that the SignedData content type be pkcs7-data and
	 requires authattrs.  In future, this will require an attribute
	 holding the target firmware name in addition to the minimal set.

     (*) VERIFYING_UNSPECIFIED_SIGNATURE

	 This requires that the SignedData content type be pkcs7-data but
	 allows either no authattrs or only permits the minimal set.

     (*) VERIFYING_KEXEC_PE_SIGNATURE

	 This only supports the Authenticode SPC_INDIRECT_DATA content type
	 and requires at least an SpcSpOpusInfo authattr in addition to the
	 minimal set.  It also permits an SPC_STATEMENT_TYPE authattr (and
	 an S/MIME capabilities authattr because the pesign program doesn't
	 remove these).

     (*) VERIFYING_KEY_SIGNATURE
     (*) VERIFYING_KEY_SELF_SIGNATURE

	 These are invalid in this context but are included for later use
	 when limiting the use of X.509 certs.

 (4) The pkcs7_test key type is given a module parameter to select between
     the above options for testing purposes.  For example:

	echo 1 >/sys/module/pkcs7_test_key/parameters/usage
	keyctl padd pkcs7_test foo @s </tmp/stuff.pkcs7

     will attempt to check the signature on stuff.pkcs7 as if it contains a
     firmware blob (1 being VERIFYING_FIRMWARE_SIGNATURE).

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marcel Holtmann <marcel@holtmann.org>
Reviewed-by: David Woodhouse <David.Woodhouse@intel.com>
2015-08-12 17:01:01 +01:00
Ingo Molnar
9b9412dc70 Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU changes from Paul E. McKenney:

  - The combination of tree geometry-initialization simplifications
    and OS-jitter-reduction changes to expedited grace periods.
    These two are stacked due to the large number of conflicts
    that would otherwise result.

    [ With one addition, a temporary commit to silence a lockdep false
      positive. Additional changes to the expedited grace-period
      primitives (queued for 4.4) remove the cause of this false
      positive, and therefore include a revert of this temporary commit. ]

  - Documentation updates.

  - Torture-test updates.

  - Miscellaneous fixes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-12 12:12:12 +02:00
Takao Indoh
709bc87192 perf/x86/intel/pt: Clean up files of Intel Processor Trace
This patch just cleans up some files of Intel Processor Trace, does not
change its behavior. This patch removes unused definitions and replaces a
constant value with a macro.

Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin<alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: H.Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1438681015-5124-1-git-send-email-indou.takao@jp.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-12 11:43:22 +02:00
Peter Zijlstra
19b3340cf5 perf/x86: Fix MSR PMU driver
Currently we only update the sysfs event files per available MSR, we
didn't actually disallow creating unlisted events.

Rework things such that the dectection, sysfs listing and event
creation are better coordinated.

Sadly it appears it's impossible to probe R/O MSRs under virt. This
means we have to do the full model table to avoid listing all MSRs all
the time.

Tested-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-12 11:43:20 +02:00
Ingo Molnar
3d325bf0da Merge branch 'perf/urgent' into perf/core, to pick up fixes before applying new changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-12 11:39:19 +02:00
Matt Fleming
d7a702f0b1 perf/x86/intel/cqm: Do not access cpu_data() from CPU_UP_PREPARE handler
Tony reports that booting his 144-cpu machine with maxcpus=10 triggers
the following WARN_ON():

[   21.045727] WARNING: CPU: 8 PID: 647 at arch/x86/kernel/cpu/perf_event_intel_cqm.c:1267 intel_cqm_cpu_prepare+0x75/0x90()
[   21.045744] CPU: 8 PID: 647 Comm: systemd-udevd Not tainted 4.2.0-rc4 #1
[   21.045745] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0066.R00.1506021730 06/02/2015
[   21.045747]  0000000000000000 0000000082771b09 ffff880856333ba8 ffffffff81669b67
[   21.045748]  0000000000000000 0000000000000000 ffff880856333be8 ffffffff8107b02a
[   21.045750]  ffff88085b789800 ffff88085f68a020 ffffffff819e2470 000000000000000a
[   21.045750] Call Trace:
[   21.045757]  [<ffffffff81669b67>] dump_stack+0x45/0x57
[   21.045759]  [<ffffffff8107b02a>] warn_slowpath_common+0x8a/0xc0
[   21.045761]  [<ffffffff8107b15a>] warn_slowpath_null+0x1a/0x20
[   21.045762]  [<ffffffff81036725>] intel_cqm_cpu_prepare+0x75/0x90
[   21.045764]  [<ffffffff81036872>] intel_cqm_cpu_notifier+0x42/0x160
[   21.045767]  [<ffffffff8109a33d>] notifier_call_chain+0x4d/0x80
[   21.045769]  [<ffffffff8109a44e>] __raw_notifier_call_chain+0xe/0x10
[   21.045770]  [<ffffffff8107b538>] _cpu_up+0xe8/0x190
[   21.045771]  [<ffffffff8107b65a>] cpu_up+0x7a/0xa0
[   21.045774]  [<ffffffff8165e920>] cpu_subsys_online+0x40/0x90
[   21.045777]  [<ffffffff81433b37>] device_online+0x67/0x90
[   21.045778]  [<ffffffff81433bea>] online_store+0x8a/0xa0
[   21.045782]  [<ffffffff81430e78>] dev_attr_store+0x18/0x30
[   21.045785]  [<ffffffff8126b6ba>] sysfs_kf_write+0x3a/0x50
[   21.045786]  [<ffffffff8126ad40>] kernfs_fop_write+0x120/0x170
[   21.045789]  [<ffffffff811f0b77>] __vfs_write+0x37/0x100
[   21.045791]  [<ffffffff811f38b8>] ? __sb_start_write+0x58/0x110
[   21.045795]  [<ffffffff81296d2d>] ? security_file_permission+0x3d/0xc0
[   21.045796]  [<ffffffff811f1279>] vfs_write+0xa9/0x190
[   21.045797]  [<ffffffff811f2075>] SyS_write+0x55/0xc0
[   21.045800]  [<ffffffff81067300>] ? do_page_fault+0x30/0x80
[   21.045804]  [<ffffffff816709ae>] entry_SYSCALL_64_fastpath+0x12/0x71
[   21.045805] ---[ end trace fe228b836d8af405 ]---

The root cause is that CPU_UP_PREPARE is completely the wrong notifier
action from which to access cpu_data(), because smp_store_cpu_info()
won't have been executed by the target CPU at that point, which in turn
means that ->x86_cache_max_rmid and ->x86_cache_occ_scale haven't been
filled out.

Instead let's invoke our handler from CPU_STARTING and rename it
appropriately.

Reported-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vikas Shivappa <vikas.shivappa@intel.com>
Link: http://lkml.kernel.org/r/1438863163-14083-1-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-12 11:37:23 +02:00
Peter Zijlstra
dbc72b7a0c perf/x86/intel: Fix memory leak on hot-plug allocation fail
We fail to free the shared_regs allocation if the constraint_list
allocation fails.

Cure this and be more consistent in NULL-ing the pointers after free.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-12 11:37:22 +02:00
Viresh Kumar
8eda41b086 clockevents/drivers/i8253: Migrate to new 'set-state' interface
Migrate i8253 driver to the new 'set-state' interface provided by
clockevents core, the earlier 'set-mode' interface is marked obsolete
now.

This also enables us to implement callbacks for new states of clockevent
devices, for example: ONESHOT_STOPPED.

Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2015-08-10 11:40:30 +02:00
Greg Kroah-Hartman
5d44f4b348 Merge 4.2-rc6 into char-misc-next
We want the fixes in Linus's tree in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-09 16:28:09 -07:00
Juergen Gross
136d9d83c0 x86/ldt: Correct LDT access in single stepping logic
Commit 37868fe113 ("x86/ldt: Make modify_ldt synchronous")
introduced a new struct ldt_struct anchored at mm->context.ldt.

convert_ip_to_linear() was changed to reflect this, but indexing
into the ldt has to be changed as the pointer is no longer void *.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: <stable@vger.kernel.org> # On top of: 37868fe113: x86/ldt: Make modify_ldt synchronous
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/1438848278-12906-1-git-send-email-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-08 10:20:45 +02:00
Viresh Kumar
71db87ba57 bus: subsys: update return type of ->remove_dev() to void
Its return value is not used by the subsys core and nothing meaningful
can be done with it, even if we want to use it. The subsys device is
anyway getting removed.

Update prototype of ->remove_dev() to make its return type as void. Fix
all usage sites as well.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-05 17:08:14 -07:00
Thomas Gleixner
a782a7e46b x86/irq: Store irq descriptor in vector array
We can spare the irq_desc lookup in the interrupt entry code if we
store the descriptor pointer in the vector array instead the interrupt
number.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: http://lkml.kernel.org/r/20150802203609.717724106@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-08-06 00:14:59 +02:00
Thomas Gleixner
44825757a3 x86/irq: Get rid of an indentation level
Make the code simpler to read.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: http://lkml.kernel.org/r/20150802203609.555253675@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-08-06 00:14:59 +02:00
Thomas Gleixner
7276c6a2cb x86/irq: Rename VECTOR_UNDEFINED to VECTOR_UNUSED
VECTOR_UNDEFINED is a misnomer. The vector is defined, but unused.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: http://lkml.kernel.org/r/20150802203609.477282494@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-08-06 00:14:58 +02:00
Thomas Gleixner
24c70e07a0 x86/irq: Replace numeric constant
Use the proper define instead of 0.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: http://lkml.kernel.org/r/20150802203609.385495420@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-08-06 00:14:58 +02:00
Thomas Gleixner
df54c4934e x86/irq: Protect smp_cleanup_move
smp_cleanup_move fiddles without protection in the interrupt
descriptors and the vector array. A concurrent irq setup/teardown or
affinity setting can pull the rug under that operation.

Add proper locking.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: http://lkml.kernel.org/r/20150802203609.222975294@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-08-06 00:14:58 +02:00
Thomas Gleixner
b7edaca4e8 Merge branch 'linus' into x86/apic
Pull in upstream changes to avoid conflicts
2015-08-06 00:00:32 +02:00
Denis V. Lunev
cc2dd4027a mshyperv: fix recognition of Hyper-V guest crash MSR's
Hypervisor Top Level Functional Specification v3.1/4.0 notes that cpuid
(0x40000003) EDX's 10th bit should be used to check that Hyper-V guest
crash MSR's functionality available.

This patch should fix this recognition. Currently the code checks EAX
register instead of EDX.

Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-04 22:30:44 -07:00
Vitaly Kuznetsov
b4370df2b1 Drivers: hv: vmbus: add special crash handler
Full kernel hang is observed when kdump kernel starts after a crash. This
hang happens in vmbus_negotiate_version() function on
wait_for_completion() as Hyper-V host (Win2012R2 in my testing) never
responds to CHANNELMSG_INITIATE_CONTACT as it thinks the connection is
already established. We need to perform some mandatory minimalistic
cleanup before we start new kernel.

Reported-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-04 22:28:38 -07:00
Vitaly Kuznetsov
2517281d63 Drivers: hv: vmbus: add special kexec handler
When general-purpose kexec (not kdump) is being performed in Hyper-V guest
the newly booted kernel fails with an MCE error coming from the host. It
is the same error which was fixed in the "Drivers: hv: vmbus: Implement
the protocol for tearing down vmbus state" commit - monitor pages remain
special and when they're being written to (as the new kernel doesn't know
these pages are special) bad things happen. We need to perform some
minimalistic cleanup before booting a new kernel on kexec. To do so we
need to register a special machine_ops.shutdown handler to be executed
before the native_machine_shutdown(). Registering a shutdown notification
handler via the register_reboot_notifier() call is not sufficient as it
happens to early for our purposes. machine_ops is not being exported to
modules (and I don't think we want to export it) so let's do this in
mshyperv.c

The minimalistic cleanup consists of cleaning up clockevents, synic MSRs,
guest os id MSR, and hypercall MSR.

Kdump doesn't require all this stuff as it lives in a separate memory
space.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-04 22:25:29 -07:00
Peter Zijlstra
75f80859b1 perf/x86/intel/pebs: Robustify PEBS buffer drain
Vince Weaver and Stephane Eranian reported warnings in the PEBS
code when running the perf fuzzer. Stephane wrote:

  > I can reproduce the problem on my HSW running the fuzzer.
  >
  > I can see why this could be happening if you are mixing PEBS and non PEBS events
  > in the bottom 4 counters. I suspect:
  >         for (bit = 0; bit < x86_pmu.max_pebs_events; bit++) {
  >                 if ((counts[bit] == 0) && (error[bit] == 0))
  >                         continue;
  >
  > This test is not correct when you have non-PEBS events mixed with
  > PEBS events and they overflow at the same time. They will have
  > counts[i] != 0 but error[i] == 0, and thus you fall thru the loop
  > and hit the assert. Or it is something along those lines.

The only way I can make this work is if ->status only has !PEBS events
set, because if it has both set we'll take that slow path which masks
out the !PEBS bits.

After masking there are 3 options:

 - there is one bit set, and its @bit, we increment counts[bit].

 - there are multiple bits set, we increment error[] for each set bit,
   we do not increment counts[].

 - there are no bits set, we do nothing.

The intent was to never increment counts[] for !PEBS events.

Now if we start out with only a single !PEBS event set, we'll pass the
test and increment counts[] for a !PEBS and hit the warn.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-04 10:17:01 +02:00
Liang, Kan
2a853e1123 perf/x86/intel/pebs: Fix event disable PEBS buffer drain
When disabling a PEBS event, we need to drain the buffer. Doing so
requires a correct cpuc->pebs_active mask.

The current code clears the pebs_active bit before draining the
buffer. Fix that.

Signed-off-by: "Liang, Kan" <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver<vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/37D7C6CF3E00A74B8858931C1DB2F07701885A65@SHSMSX103.ccr.corp.intel.com
[ Fixed the SOB. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-04 10:17:00 +02:00
Andy Lutomirski
b7b7c7821d perf/x86: Add an MSR PMU driver
This patch adds an MSR PMU to support free running MSR counters. Such
as time and freq related counters includes TSC, IA32_APERF, IA32_MPERF
and IA32_PPERF, but also SMI_COUNT.

The events are exposed in sysfs for use by perf stat and other tools.
The files are under /sys/devices/msr/events/

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Kan Liang <kan.liang@intel.com>
[ s/freq/msr/, added SMI_COUNT, fixed bugs. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Cc: adrian.hunter@intel.com
Cc: dsahern@gmail.com
Cc: eranian@google.com
Cc: jolsa@kernel.org
Cc: mark.rutland@arm.com
Cc: namhyung@kernel.org
Link: http://lkml.kernel.org/r/1437407346-31186-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-04 10:17:00 +02:00
Kan Liang
070e98873c perf/x86/intel/uncore: Add Broadwell-DE uncore support
The uncore subsystem for Broadwell-DE is similar to Haswell-EP.  There
are some differences in pci device IDs, box number and constraints.

Please refer to the public document:

  http://www.intel.com/content/www/us/en/processors/xeon/xeon-d-1500-uncore-performance-monitoring.html

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1435839172-15114-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-04 10:17:00 +02:00
Andi Kleen
8c4fe7095d perf/x86/intel: Use 0x11 as extra reg test value
The next patch adds a new perf extra register where 0x1ff is not a valid
value. Use 0x11 instead.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1435707205-6676-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-04 10:16:59 +02:00
Andi Kleen
47732d8863 perf/x86: Make merge_attr() global to use from perf_event_intel
merge_attr() allows to merge two sysfs attribute tables.
Export it to be usable by other files too.

Next patch is going to use that to extend the sysfs format
attributes for a CPU.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1435612935-24425-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-04 10:16:59 +02:00
Andi Kleen
90405aa022 perf/x86/intel/lbr: Limit LBR accesses to TOS in callstack mode
In callstack mode the LBR is not a ring buffer, but a stack that grows up
and down. This means in  this case we don't need to access all LBRs, only the
ones up to TOS. Do this optimization for the normal LBR read, and the context
switch save/restore code. For save/restore it can be done unconditionally, as
it only runs when call stack mode is active.

This recovers some of the cost of going to 32 LBRs on Skylake.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Cc: eranian@google.com
Cc: jolsa@redhat.com
Link: http://lkml.kernel.org/r/1432786398-23861-6-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-04 10:16:59 +02:00
Andi Kleen
e0573364b8 perf/x86/intel/lbr: Use correct index to save/restore LBR_INFO with call stack
Use the correct index to save/restore the LBR_INFO_x MSR in
callstack mode. This is more a cleanup, as even with the wrong
index the register was correctly saved/restored, and also
LBR callgraph mode in perf tools do not really need anything in
LBR_INFO. But still better to use the right index.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Cc: eranian@google.com
Cc: jolsa@redhat.com
Link: http://lkml.kernel.org/r/1432786398-23861-5-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-04 10:16:59 +02:00
Andi Kleen
9a92e16fd7 perf/x86/intel: Add Intel Skylake PMU support
Add perf core PMU support for future Intel Skylake CPU cores.

The code is based on Haswell/Broadwell.

There is a new cache event list, based on the updated Haswell
event list.

Skylake has removed most counter constraints on basic
events, so the basic constraints table now only has a single
entry (plus the fixed counters).

TSX support and various other setups are all shared with Haswell.

Skylake has 32 LBR entries. Add a new LBR init function
to set this up. The filters are all the same as Haswell.

It also has a new LBR format with a separate LBR_INFO_* MSR,
but that has been already added earlier.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1431285767-27027-7-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-04 10:16:58 +02:00
Andi Kleen
425507fa5f perf/x86/intel/lbr: Optimize v4 LBR unfreezing
In Arch perfmon v4 the GLOBAL_STATUS reset automatically unfreezes
LBRs. So no need to do it manually in the LBR code. Add a check
to skip it.

v2: Move test up to beginning of function.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1431285767-27027-9-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-04 10:16:58 +02:00
Andi Kleen
0f29e573dd perf/x86/intel: Move PMU ACK to after LBR read
With Arch Perfmon v4 the PMU ack unfreezes the LBRs. So we need to do
the PMU ack after the LBR reading, otherwise the LBRs would be polluted
by the PMI handler.

This is a minimal change. In principle the ACK could be moved much later.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1431285767-27027-10-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-04 10:16:58 +02:00
Andi Kleen
d8020bee1d perf/x86/intel: Handle new arch perfmon v4 status bits
ArchPerfmon v4 has some new status bits in GLOBAL_STATUS.

These need to be ignored when deciding whether a NMI
was an NMI, to avoid eating all NMIs when they
stay set, see:

    b292d7a104 ("perf/x86/intel: ignore CondChgd bit to avoid false NMI handling")

This patch ignores the new ASIF bit, which indicates
that SGX interfered with the PMU, and also the new
LBR freezing bits, which are set when the LBRs get
frozen, plus the existing CondChange (set by JTAG
debuggers and some buggy BIOSes)

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1431285767-27027-8-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-04 10:16:57 +02:00
Andi Kleen
50eab8f6ec perf/x86/intel/lbr: Add support for LBRv5
Add support for the new LBRv5 format used on Intel Skylake CPUs.

The flags for mispredict, abort, in_tx etc. moved to range of separate
LBR_INFO_* MSRs. Teach the LBR code to read those. The original
LBR registers stay the same, except they have full sign
extension now.

LBR_INFO also reports a cycle count to the last branch.
Report the cycle information using the new "cycles" branch_info
output field.

In addition we have to context switch and clear the new INFO
MSRs to avoid any information leaks.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1431285767-27027-6-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-04 10:16:57 +02:00
Andi Kleen
a7b58d211b perf/x86/intel/lbr: Allow time stamp for free running PEBSv3
With PEBSv3 the PEBS record contains a time stamp. That means we can allow
free-running PEBS without a PMI even if the user program requested a time stamp.
This avoids the need to use -T to get free running PEBS, and also avoids
any problems with mis-identifying MMAPs later.

Move the free_running_flags state into a variable in x86_pmu and use it.
This only works when no explicit clock_id is set.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Cc: eranian@google.com
Cc: jolsa@redhat.com
Cc: kan.liang@intel.com
Link: http://lkml.kernel.org/r/1432786398-23861-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-04 10:16:56 +02:00
Andi Kleen
2f7ebf2ec2 perf/x86/intel: Add support for PEBSv3 profiling
PEBSv3 is the same as the existing PEBSv2 used on Haswell,
but it adds a new TSC field. Add support to the generic
PEBS handler to handle the new format, and overwrite
the perf time stamp using the new native_sched_clock_from_tsc().

Right now the time stamp is just slightly more accurate,
as it is nearer the actual event trigger point. With
the PEBS threshold > 1 patchkit it will be much more accurate,
avoid the problems with MMAP mismatches earlier.
The accurate time stamping is only implemented for
the default trace clock for now.

v2: Use _skl prefix. Check for default clock_id.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1431285767-27027-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-04 10:16:56 +02:00
Andi Kleen
a94cab2376 perf/x86: Add a native_perf_sched_clock_from_tsc()
PEBSv3 has a raw TSC time stamp in its memory buffer that
later needs to to be converted to perf_clock.

Add a native_sched_clock_from_tsc() that works the same
as native_sched_clock(), but starts with an already given
TSC value.

Paravirt is ignored, it will just get the native clock.
But there isn't a para virtualized PEBS anyway.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1431285767-27027-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-04 10:16:55 +02:00
Alexander Shishkin
b1bf72d669 perf/x86/intel/pt: Add new timing packet enables
Intel PT chapter in the new Intel Architecture SDM adds several packets
corresponding enable bits and registers that control packet generation.
Also, additional bits in the Intel PT CPUID leaf were added to enumerate
presence and parameters of these new packets and features.

The packets and enables are:

  * CYC: cycle accurate mode, provides the number of cycles elapsed since
    previous CYC packet; its presence and available threshold values are
    enumerated via CPUID;

  * MTC: mini time counter packets, used for tracking TSC time between
    full TSC packets; its presence and available resolution options are
    enumerated via CPUID;

  * PSB packet period is now configurable, available period values are
    enumerated via CPUID.

This patch adds corresponding bit and register definitions, pmu driver
capabilities based on CPUID enumeration, new attribute format bits for
the new featurens and extends event configuration validation function
to take these into account.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1438262131-12725-1-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-04 10:16:55 +02:00
Alexander Shishkin
9a6694cfa2 perf/x86/intel/pt: Do not force sync packets on every schedule-in
Currently, the PT driver zeroes out the status register every time before
starting the event. However, all the writable bits are already taken care
of in pt_handle_status() function, except the new PacketByteCnt field,
which in new versions of PT contains the number of packet bytes written
since the last sync (PSB) packet. Zeroing it out before enabling PT forces
a sync packet to be written. This means that, with the existing code, a
sync packet (PSB and PSBEND, 18 bytes in total) will be generated every
time a PT event is scheduled in.

To avoid these unnecessary syncs and save a WRMSR in the fast path, this
patch changes the default behavior to not clear PacketByteCnt field, so
that the sync packets will be generated with the period specified as
"psb_period" attribute config field. This has little impact on the trace
data as the other packets that are normally sent within PSB+ (between PSB
and PSBEND) have their own generation scenarios which do not depend on the
sync packets.

One exception where we do need to force PSB like this when tracing starts,
so that the decoder has a clear sync point in the trace. For this purpose
we aready have hw::itrace_started flag, which we are currently using to
output PERF_RECORD_ITRACE_START. This patch moves setting itrace_started
from perf core to the pmu::start, where it should still be 0 on the very
first run.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1438264104-16189-1-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-04 10:16:55 +02:00
Andy Lutomirski
27747f8bc3 perf/x86/hw_breakpoints: Fix check for kernel-space breakpoints
The check looked wrong, although I think it was actually safe.  TASK_SIZE
is unnecessarily small for compat tasks, and it wasn't possible to make
a range breakpoint so large it started in user space and ended in kernel
space.

Nonetheless, let's fix up the check for the benefit of future
readers.  A breakpoint is in the kernel if either end is in the
kernel.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/136be387950e78f18cea60e9d1bef74465d0ee8f.1438312874.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-04 10:16:55 +02:00
Andy Lutomirski
ab513927ab perf/x86/hw_breakpoints: Improve range breakpoint validation
Range breakpoints will do the wrong thing if the address isn't
aligned.  While we're there, add comments about why it's safe for
instruction breakpoints.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/ae25d14d61f2f43b78e0a247e469f3072df7e201.1438312874.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-04 10:16:54 +02:00
Andy Lutomirski
e5779e8e12 perf/x86/hw_breakpoints: Disallow kernel breakpoints unless kprobe-safe
Code on the kprobe blacklist doesn't want unexpected int3
exceptions. It probably doesn't want unexpected debug exceptions
either. Be safe: disallow breakpoints in nokprobes code.

On non-CONFIG_KPROBES kernels, there is no kprobe blacklist.  In
that case, disallow kernel breakpoints entirely.

It will be particularly important to keep hw breakpoints out of the
entry and NMI code once we move debug exceptions off the IST stack.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e14b152af99640448d895e3c2a8c2d5ee19a1325.1438312874.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-04 10:16:54 +02:00
Kan Liang
ae3f011fc2 perf/x86/intel: Fix SLM MSR_OFFCORE_RSP1 valid_mask
AVG_LATENCY(bit 38) is only available on MSR_OFFCORE_RSP0.
So the bit should be removed from RSP1 valid_mask.

Since RSP0 and RSP1 may have different valid_mask, intel_alt_er should
validate the config on the alternate offcore reg before replacing it.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1435170215-5017-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-04 10:16:54 +02:00
Alexander Shishkin
c749b3e963 perf/x86/intel/lbr: Kill off intel_pmu_needs_lbr_smpl for good
The x86_lbr_exclusive commit (4807034248 "perf/x86: Mark Intel PT and
LBR/BTS as mutually exclusive") mistakenly moved intel_pmu_needs_lbr_smpl()
to perf_event.h, while another commit (a46a230001 "perf: Simplify the
branch stack check") removed it in favor of needs_branch_stack().

This patch gets rid of intel_pmu_needs_lbr_smpl() for good.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1435140349-32588-3-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-04 10:16:53 +02:00
Alexander Shishkin
e9b3bd379c perf/x86/intel/bts: Drop redundant declarations
Both intel_pmu_enable_bts() and intel_pmu_disable_bts() are in perf_event.h
header file, no need to have them declared again in the driver.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1435140349-32588-2-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-04 10:16:53 +02:00
Andi Kleen
3a999587b4 perf/x86/intel/uncore: Use Sandy Bridge client PMU on Haswell/Broadwell
Haswell and Broadwell have the same uncore CBOX/ARB PMU as Sandy Bridge.
Add the respective model numbers to enable the SNB uncore PMU.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: eranian@google.com
Cc: kan.liang@intel.com
Link: http://lkml.kernel.org/r/1434347862-28490-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-04 10:16:53 +02:00
Andi Kleen
e3a13192d8 perf/x86/intel/uncore: Add support for ARB uncore PMU on Sandy/IvyBridge
Add a new "ARB" uncore PMU that is used to monitor the uncore queue
arbiter. This is useful to measure uncore queue occupancy and similar
statistics. The registers all have the same format as the
existing CBOX PMU.

Also move the event constraints from the CBOX to ARB. The 0x80+
events are ARB events and cannot be scheduled on a CBOX PMU.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: eranian@google.com
Cc: kan.liang@intel.com
Link: http://lkml.kernel.org/r/1434347862-28490-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-04 10:16:52 +02:00
Vaishali Thakkar
070a7cdfa4 perf/x86/intel/uncore: Remove use of macro DEFINE_PCI_DEVICE_TABLE()
The DEFINE_PCI_DEVICE_TABLE() macro is deprecated. Use
'struct pci_device_id' instead of DEFINE_PCI_DEVICE_TABLE(),
with the goal of getting rid of this macro completely.

This Coccinelle semantic patch performs this transformation:

@@
identifier a;
declarer name DEFINE_PCI_DEVICE_TABLE;
initializer i;
@@
- DEFINE_PCI_DEVICE_TABLE(a)
+ const struct pci_device_id a[] = i;

Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150717052759.GA6265@vaishali-Ideapad-Z570
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-04 10:16:52 +02:00
Dasaratharaman Chandramouli
3a2a779732 perf/x86/intel/rapl: Add support for Knights Landing (KNL)
Knights Landing DRAM RAPL supports PKG and DRAM RAPL domains.
DRAM RAPL has a different fixed energy unit (2^-16J) similar to
that of HSW.

Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Stephane Eranian <eranian@google.com>
Acked-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jacob Pan Jun <jacob.jun.pan@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nikhil Rao <nikhil.rao@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/aa63b4a3af3160152fea1a10c807f4200527280c.1432665809.git.dasaratharaman.chandramouli@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-04 10:16:52 +02:00
Peter Zijlstra
3bbfafb77a x86, tsc, locking/static_keys: Employ static_branch_likely()
Because of the static_key restrictions we had to take an unconditional
jump for the most likely case, causing $I bloat.

Rewrite to use the new primitives.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-03 11:34:16 +02:00
Peter Zijlstra
76b235c6bc jump_label: Rename JUMP_LABEL_{EN,DIS}ABLE to JUMP_LABEL_{JMP,NOP}
Since we've already stepped away from ENABLE is a JMP and DISABLE is a
NOP with the branch_default bits, and are going to make it even worse,
rename it to make it all clearer.

This way we don't mix multiple levels of logic attributes, but have a
plain 'physical' name for what the current instruction patching status
of a jump label is.

This is a first step in removing the naming confusion that has led to
a stream of avoidable bugs such as:

  a833581e37 ("x86, perf: Fix static_key bug in load_mm_cr4()")

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
[ Beefed up the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-03 11:34:12 +02:00
Brian Gerst
decd275e62 x86/vm86: Rename vm86->v86flags and v86mask
Rename v86flags to veflags, and v86mask to veflags_mask.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1438148483-11932-9-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31 13:31:11 +02:00
Brian Gerst
1342635638 x86/vm86: Rename vm86->vm86_info to user_vm86
Make it clearer that this is the pointer to the userspace vm86
state area.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1438148483-11932-8-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31 13:31:11 +02:00
Brian Gerst
ba3e127ec1 x86/vm86: Clean up vm86.h includes
vm86.h was being implicitly included in alot of places via
processor.h, which in turn got it from math_emu.h.  Break that
chain and explicitly include vm86.h in all files that need it.
Also remove unused vm86 field from math_emu_info.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1438148483-11932-7-git-send-email-brgerst@gmail.com
[ Fixed build failure. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31 13:31:10 +02:00
Brian Gerst
5ed92a8ab7 x86/vm86: Use the normal pt_regs area for vm86
Change to use the normal pt_regs area to enter and exit vm86
mode.  This is done by increasing the padding at the top of the
stack to make room for the extra vm86 segment slots in the IRET
frame.  It then saves the 32-bit regs in the off-stack vm86
data, and copies in the vm86 regs.  Exiting back to 32-bit mode
does the reverse.  This allows removing the hacks to jump
directly into the exit asm code due to having to change the
stack pointer.  Returning normally from the vm86 syscall and the
exception handlers allows things like ptrace and auditing to work properly.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1438148483-11932-5-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31 13:31:09 +02:00
Brian Gerst
90c6085a24 x86/vm86: Eliminate 'struct kernel_vm86_struct'
Now there is no vm86-specific data left on the kernel stack
while in userspace, except for the 32-bit regs.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1438148483-11932-4-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31 13:31:08 +02:00
Brian Gerst
d4ce0f26c7 x86/vm86: Move fields from 'struct kernel_vm86_struct' to 'struct vm86'
Move the non-regs fields to the off-stack data.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1438148483-11932-3-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31 13:31:08 +02:00
Brian Gerst
9fda6a0681 x86/vm86: Move vm86 fields out of 'thread_struct'
Allocate a separate structure for the vm86 fields.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1438148483-11932-2-git-send-email-brgerst@gmail.com
[ Build fixes. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31 13:31:07 +02:00
Andy Lutomirski
a5b9e5a2f1 x86/ldt: Make modify_ldt() optional
The modify_ldt syscall exposes a large attack surface and is
unnecessary for modern userspace.  Make it optional.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: security@kernel.org <security@kernel.org>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/a605166a771c343fd64802dece77a903507333bd.1438291540.git.luto@kernel.org
[ Made MATH_EMULATION dependent on MODIFY_LDT_SYSCALL. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31 13:30:45 +02:00
Oleg Nesterov
db087ef69a uprobes/x86: Make arch_uretprobe_is_alive(RP_CHECK_CALL) more clever
The previous change documents that cleanup_return_instances()
can't always detect the dead frames, the stack can grow. But
there is one special case which imho worth fixing:
arch_uretprobe_is_alive() can return true when the stack didn't
actually grow, but the next "call" insn uses the already
invalidated frame.

Test-case:

	#include <stdio.h>
	#include <setjmp.h>

	jmp_buf jmp;
	int nr = 1024;

	void func_2(void)
	{
		if (--nr == 0)
			return;
		longjmp(jmp, 1);
	}

	void func_1(void)
	{
		setjmp(jmp);
		func_2();
	}

	int main(void)
	{
		func_1();
		return 0;
	}

If you ret-probe func_1() and func_2() prepare_uretprobe() hits
the MAX_URETPROBE_DEPTH limit and "return" from func_2() is not
reported.

When we know that the new call is not chained, we can do the
more strict check. In this case "sp" points to the new ret-addr,
so every frame which uses the same "sp" must be dead. The only
complication is that arch_uretprobe_is_alive() needs to know was
it chained or not, so we add the new RP_CHECK_CHAIN_CALL enum
and change prepare_uretprobe() to pass RP_CHECK_CALL only if
!chained.

Note: arch_uretprobe_is_alive() could also re-read *sp and check
if this word is still trampoline_vaddr. This could obviously
improve the logic, but I would like to avoid another
copy_from_user() especially in the case when we can't avoid the
false "alive == T" positives.

Tested-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Anton Arapov <arapov@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150721134028.GA4786@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31 10:38:06 +02:00
Oleg Nesterov
86dcb702e7 uprobes: Add the "enum rp_check ctx" arg to arch_uretprobe_is_alive()
arch/x86 doesn't care (so far), but as Pratyush Anand pointed
out other architectures might want why arch_uretprobe_is_alive()
was called and use different checks depending on the context.
Add the new argument to distinguish 2 callers.

Tested-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Anton Arapov <arapov@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150721134026.GA4779@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31 10:38:06 +02:00
Oleg Nesterov
7b868e4802 uprobes/x86: Reimplement arch_uretprobe_is_alive()
Add the x86 specific version of arch_uretprobe_is_alive()
helper. It returns true if the stack frame mangled by
prepare_uretprobe() is still on stack. So if it returns false,
we know that the probed function has already returned.

We add the new return_instance->stack member and change the
generic code to initialize it in prepare_uretprobe, but it
should be equally useful for other architectures.

TODO: this assumes that the probed application can't use
      multiple stacks (say sigaltstack). We will try to improve
      this logic later.

Tested-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Anton Arapov <arapov@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150721134018.GA4766@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31 10:38:05 +02:00
Ingo Molnar
5b929bd11d Merge branch 'x86/urgent' into x86/asm, before applying dependent patches
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31 10:23:35 +02:00
Andy Lutomirski
37868fe113 x86/ldt: Make modify_ldt synchronous
modify_ldt() has questionable locking and does not synchronize
threads.  Improve it: redesign the locking and synchronize all
threads' LDTs using an IPI on all modifications.

This will dramatically slow down modify_ldt in multithreaded
programs, but there shouldn't be any multithreaded programs that
care about modify_ldt's performance in the first place.

This fixes some fallout from the CVE-2015-5157 fixes.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: security@kernel.org <security@kernel.org>
Cc: <stable@vger.kernel.org>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/4c6978476782160600471bd865b318db34c7b628.1438291540.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31 10:23:23 +02:00
Viresh Kumar
c8b5db7de6 x86/hpet: Migrate to new set_state interface
Migrate hpet driver to the new 'set-state' interface provided by
clockevents core, the earlier 'set-mode' interface is marked obsolete
now.

This also enables us to implement callbacks for new states of clockevent
devices, for example: ONESHOT_STOPPED.

Forward definition of 'hpet_clockevent' wasn't required and so it is
placed after all the callback are defined, to avoid forward declaring
all the callbacks.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Link: http://lkml.kernel.org/r/8cc9864b6d6342dfac28f270cf69f4cba46fffae.1437042675.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-30 21:28:25 +02:00
Jiang Liu
646c4b7549 x86/irq: Use the caller provided polarity setting in mp_check_pin_attr()
Commit d32932d02e ("x86/irq: Convert IOAPIC to use hierarchical
irqdomain interfaces") introduced a regression which causes
malfunction of interrupt lines.

The reason is that the conversion of mp_check_pin_attr() missed to
update the polarity selection of the interrupt pin with the caller
provided setting and instead uses a stale attribute value. That in
turn results in chosing the wrong interrupt flow handler.

Use the caller supplied setting to configure the pin correctly which
also choses the correct interrupt flow handler.

This restores the original behaviour and on the affected
machine/driver (Surface Pro 3, i2c controller) all IOAPIC IRQ
configuration are identical to v4.1.

Fixes: d32932d02e ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces")
Reported-and-tested-by: Matt Fleming <matt@codeblueprint.co.uk>
Reported-and-tested-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Chen Yu <yu.c.chen@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1438242695-23531-1-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-30 21:15:29 +02:00
Thomas Gleixner
c948c26048 x86/apic: Drop local_irq_save/restore in timer callbacks
These callbacks are called with interrupts disabled from the core
code. Fixup the local caller to disable interrupts.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
2015-07-30 00:51:47 +02:00
Viresh Kumar
b23d8e5278 x86/apic: Migrate apic timer to new set_state interface
Migrate apic driver to the new 'set-state' interface provided by
clockevents core, the earlier 'set-mode' interface is marked obsolete
now.

This also enables us to implement callbacks for new states of clockevent
devices, for example: ONESHOT_STOPPED.

We weren't doing anything while switching to resume mode and so that
callback isn't implemented.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Bandan Das <bsd@redhat.com>
Link: http://lkml.kernel.org/r/1896ac5989d27f2ac37f4786af9bd537e1921b83.1437042675.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-30 00:51:47 +02:00
Frederic Weisbecker
031a7f456a apm32: Fix cputime == jiffies assumption
That code wrongly assumes that cputime_t wraps jiffies_t. Lets use
the correct accessors/mutators.

No real harm now as that code can't be used with full dynticks.

Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc; John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2015-07-29 15:44:58 +02:00
Linus Torvalds
2579d019ad Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fix from Thomas Gleixner:
 "A single fix for the intel cqm perf facility to prevent IPIs from
  interrupt context"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/cqm: Return cached counter value from IRQ context
2015-07-26 11:46:32 -07:00
Matt Fleming
2c534c0da0 perf/x86/intel/cqm: Return cached counter value from IRQ context
Peter reported the following potential crash which I was able to
reproduce with his test program,

[  148.765788] ------------[ cut here ]------------
[  148.765796] WARNING: CPU: 34 PID: 2840 at kernel/smp.c:417 smp_call_function_many+0xb6/0x260()
[  148.765797] Modules linked in:
[  148.765800] CPU: 34 PID: 2840 Comm: perf Not tainted 4.2.0-rc1+ #4
[  148.765803]  ffffffff81cdc398 ffff88085f105950 ffffffff818bdfd5 0000000000000007
[  148.765805]  0000000000000000 ffff88085f105990 ffffffff810e413a 0000000000000000
[  148.765807]  ffffffff82301080 0000000000000022 ffffffff8107f640 ffffffff8107f640
[  148.765809] Call Trace:
[  148.765810]  <NMI>  [<ffffffff818bdfd5>] dump_stack+0x45/0x57
[  148.765818]  [<ffffffff810e413a>] warn_slowpath_common+0x8a/0xc0
[  148.765822]  [<ffffffff8107f640>] ? intel_cqm_stable+0x60/0x60
[  148.765824]  [<ffffffff8107f640>] ? intel_cqm_stable+0x60/0x60
[  148.765825]  [<ffffffff810e422a>] warn_slowpath_null+0x1a/0x20
[  148.765827]  [<ffffffff811613f6>] smp_call_function_many+0xb6/0x260
[  148.765829]  [<ffffffff8107f640>] ? intel_cqm_stable+0x60/0x60
[  148.765831]  [<ffffffff81161748>] on_each_cpu_mask+0x28/0x60
[  148.765832]  [<ffffffff8107f6ef>] intel_cqm_event_count+0x7f/0xe0
[  148.765836]  [<ffffffff811cdd35>] perf_output_read+0x2a5/0x400
[  148.765839]  [<ffffffff811d2e5a>] perf_output_sample+0x31a/0x590
[  148.765840]  [<ffffffff811d333d>] ? perf_prepare_sample+0x26d/0x380
[  148.765841]  [<ffffffff811d3497>] perf_event_output+0x47/0x60
[  148.765843]  [<ffffffff811d36c5>] __perf_event_overflow+0x215/0x240
[  148.765844]  [<ffffffff811d4124>] perf_event_overflow+0x14/0x20
[  148.765847]  [<ffffffff8107e7f4>] intel_pmu_handle_irq+0x1d4/0x440
[  148.765849]  [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0
[  148.765853]  [<ffffffff81219bad>] ? vunmap_page_range+0x19d/0x2f0
[  148.765854]  [<ffffffff81219d11>] ? unmap_kernel_range_noflush+0x11/0x20
[  148.765859]  [<ffffffff814ce6fe>] ? ghes_copy_tofrom_phys+0x11e/0x2a0
[  148.765863]  [<ffffffff8109e5db>] ? native_apic_msr_write+0x2b/0x30
[  148.765865]  [<ffffffff8109e44d>] ? x2apic_send_IPI_self+0x1d/0x20
[  148.765869]  [<ffffffff81065135>] ? arch_irq_work_raise+0x35/0x40
[  148.765872]  [<ffffffff811c8d86>] ? irq_work_queue+0x66/0x80
[  148.765875]  [<ffffffff81075306>] perf_event_nmi_handler+0x26/0x40
[  148.765877]  [<ffffffff81063ed9>] nmi_handle+0x79/0x100
[  148.765879]  [<ffffffff81064422>] default_do_nmi+0x42/0x100
[  148.765880]  [<ffffffff81064563>] do_nmi+0x83/0xb0
[  148.765884]  [<ffffffff818c7c0f>] end_repeat_nmi+0x1e/0x2e
[  148.765886]  [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0
[  148.765888]  [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0
[  148.765890]  [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0
[  148.765891]  <<EOE>>  [<ffffffff8110ab66>] finish_task_switch+0x156/0x210
[  148.765898]  [<ffffffff818c1671>] __schedule+0x341/0x920
[  148.765899]  [<ffffffff818c1c87>] schedule+0x37/0x80
[  148.765903]  [<ffffffff810ae1af>] ? do_page_fault+0x2f/0x80
[  148.765905]  [<ffffffff818c1f4a>] schedule_user+0x1a/0x50
[  148.765907]  [<ffffffff818c666c>] retint_careful+0x14/0x32
[  148.765908] ---[ end trace e33ff2be78e14901 ]---

The CQM task events are not safe to be called from within interrupt
context because they require performing an IPI to read the counter value
on all sockets. And performing IPIs from within IRQ context is a
"no-no".

Make do with the last read counter value currently event in
event->count when we're invoked in this context.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vikas Shivappa <vikas.shivappa@intel.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Will Auld <will.auld@intel.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1437490509-15373-1-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-26 10:22:29 +02:00
Paul E. McKenney
f78f5b90c4 rcu: Rename rcu_lockdep_assert() to RCU_LOCKDEP_WARN()
This commit renames rcu_lockdep_assert() to RCU_LOCKDEP_WARN() for
consistency with the WARN() series of macros.  This also requires
inverting the sense of the conditional, which this commit also does.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
2015-07-22 15:27:32 -07:00
Paolo Pisati
949163015c x86/boot: Obsolete the MCA sys_desc_table
The kernel does not support the MCA bus anymroe, so mark sys_desc_table
as obsolete: remove any reference from the code together with the remaining
of MCA logic.

bloat-o-meter output:

  add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-55 (-55)
  function                                     old     new   delta
  i386_start_kernel                            128     119      -9
  setup_arch                                  1421    1375     -46

Signed-off-by: Paolo Pisati <p.pisati@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437409430-8491-1-git-send-email-p.pisati@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-21 10:55:11 +02:00
Brian Gerst
ed0b2edb61 x86/entry/vm86: Move userspace accesses to do_sys_vm86()
Move the userspace accesses down into the common function in
preparation for the next set of patches.  Also change to copying
the fields explicitly instead of assuming a fixed order in
pt_regs and the kernel data structures.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437354550-25858-4-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-21 09:12:24 +02:00
Brian Gerst
df1ae9a5dc x86/entry/vm86: Preserve 'orig_ax'
There is no legitimate reason for usermode to modify the 'orig_ax'
field on entry to vm86 mode, so copy it from the 32-bit regs.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437354550-25858-3-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-21 09:12:23 +02:00
Brian Gerst
0233606ce5 x86/entry/vm86: Clean up saved_fs/gs
There is no need to save FS and non-lazy GS outside the 32-bit
regs.  Lazy GS still needs to be saved because it wasn't saved
on syscall entry.  Save it in the gs slot of regs32, which is
present but unused.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437354550-25858-2-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-21 09:12:23 +02:00
Jan Beulich
5bc016f1ab x86/fpu: Disable dependent CPU features on "noxsave"
Complete the set of dependent features that need disabling at
once: XSAVEC, AVX-512 and all currently known to the kernel
extensions to it, as well as MPX need to be disabled too.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/55ACC40D0200007800092E6C@mail.emea.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-21 08:20:42 +02:00
Andy Lutomirski
bf9f2ee28d x86/nmi: Remove the 'b2b' parameter from nmi_handle()
It has never had any effect. Remove it for comprehensibility.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/c91fa38507760d9e54a4b8737fa6409bde896b33.1437418322.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-21 08:02:32 +02:00
Laura Abbott
b51ef52df7 x86/cpu: Restore MSR_IA32_ENERGY_PERF_BIAS after resume
MSR_IA32_ENERGY_PERF_BIAS is lost after suspend/resume:

	x86_energy_perf_policy -r before

	cpu0: 0x0000000000000006
	cpu1: 0x0000000000000006
	cpu2: 0x0000000000000006
	cpu3: 0x0000000000000006
	cpu4: 0x0000000000000006
	cpu5: 0x0000000000000006
	cpu6: 0x0000000000000006
	cpu7: 0x0000000000000006

	after

	cpu0: 0x0000000000000000
	cpu1: 0x0000000000000006
	cpu2: 0x0000000000000006
	cpu3: 0x0000000000000006
	cpu4: 0x0000000000000006
	cpu5: 0x0000000000000006
	cpu6: 0x0000000000000006
	cpu7: 0x0000000000000006

Resulting in inconsistent energy policy settings across CPUs.

This register is set via init_intel() at bootup. During resume,
the secondary CPUs are brought online again and init_intel() is
callled which re-initializes the register. The boot CPU however
never reinitializes the register.

Add a syscore callback to reinitialize the register for the boot CPU.

Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437428878-4105-1-git-send-email-labbott@fedoraproject.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-21 07:51:38 +02:00
Mathias Krause
4daa832d99 x86: Drop bogus __ref / __refdata annotations
The __ref / __refdata annotations used to be needed because of
referencing functions / variables annotated __cpuinit /
__cpuinitdata.

But those annotations vanished during the development of v3.11.

Therefore most of the __ref / __refdata annotations are not needed
anymore. As they may hide legitimate sections mismatches, we
better get rid of them.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437409973-8927-1-git-send-email-minipli@googlemail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-20 18:57:20 +02:00
Linus Torvalds
0e1dbccd8f Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Two families of fixes:

   - Fix an FPU context related boot crash on newer x86 hardware with
     larger context sizes than what most people test.  To fix this
     without ugly kludges or extensive reverts we had to touch core task
     allocator, to allow x86 to determine the task size dynamically, at
     boot time.

     I've tested it on a number of x86 platforms, and I cross-built it
     to a handful of architectures:

                                        (warns)               (warns)
       testing     x86-64:  -git:  pass (    0),  -tip:  pass (    0)
       testing     x86-32:  -git:  pass (    0),  -tip:  pass (    0)
       testing        arm:  -git:  pass ( 1359),  -tip:  pass ( 1359)
       testing       cris:  -git:  pass ( 1031),  -tip:  pass ( 1031)
       testing       m32r:  -git:  pass ( 1135),  -tip:  pass ( 1135)
       testing       m68k:  -git:  pass ( 1471),  -tip:  pass ( 1471)
       testing       mips:  -git:  pass ( 1162),  -tip:  pass ( 1162)
       testing    mn10300:  -git:  pass ( 1058),  -tip:  pass ( 1058)
       testing     parisc:  -git:  pass ( 1846),  -tip:  pass ( 1846)
       testing      sparc:  -git:  pass ( 1185),  -tip:  pass ( 1185)

     ... so I hope the cross-arch impact 'none', as intended.

     (by Dave Hansen)

   - Fix various NMI handling related bugs unearthed by the big asm code
     rewrite and generally make the NMI code more robust and more
     maintainable while at it.  These changes are a bit late in the
     cycle, I hope they are still acceptable.

     (by Andy Lutomirski)"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86
  x86/fpu, sched: Dynamically allocate 'struct fpu'
  x86/entry/64, x86/nmi/64: Add CONFIG_DEBUG_ENTRY NMI testing code
  x86/nmi/64: Make the "NMI executing" variable more consistent
  x86/nmi/64: Minor asm simplification
  x86/nmi/64: Use DF to avoid userspace RSP confusing nested NMI detection
  x86/nmi/64: Reorder nested NMI checks
  x86/nmi/64: Improve nested NMI comments
  x86/nmi/64: Switch stacks on userspace NMI entry
  x86/nmi/64: Remove asm code that saves CR2
  x86/nmi: Enable nested do_nmi() handling for 64-bit kernels
2015-07-18 10:49:57 -07:00
Ingo Molnar
5aaeb5c01c x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86
Don't burden architectures without dynamic task_struct sizing
with the overhead of dynamic sizing.

Also optimize the x86 code a bit by caching task_struct_size.

Acked-and-Tested-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437128892-9831-3-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-18 03:42:51 +02:00
Dave Hansen
0c8c0f03e3 x86/fpu, sched: Dynamically allocate 'struct fpu'
The FPU rewrite removed the dynamic allocations of 'struct fpu'.
But, this potentially wastes massive amounts of memory (2k per
task on systems that do not have AVX-512 for instance).

Instead of having a separate slab, this patch just appends the
space that we need to the 'task_struct' which we dynamically
allocate already.  This saves from doing an extra slab
allocation at fork().

The only real downside here is that we have to stick everything
and the end of the task_struct.  But, I think the
BUILD_BUG_ON()s I stuck in there should keep that from being too
fragile.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437128892-9831-2-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-18 03:42:35 +02:00
Russell King
4d7489ffba nmi: x86: convert to generic nmi handler
Convert x86 to use the generic nmi handler code which can be shared
between architectures.

Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-07-17 12:23:30 +01:00
Andy Lutomirski
0b22930eba x86/nmi/64: Improve nested NMI comments
I found the nested NMI documentation to be difficult to follow.
Improve the comments.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-17 12:50:11 +02:00
Andy Lutomirski
9d05041679 x86/nmi: Enable nested do_nmi() handling for 64-bit kernels
32-bit kernels handle nested NMIs in C.  Enable the exact same
handling on 64-bit kernels as well.  This isn't currently
necessary, but it will become necessary once the asm code starts
allowing limited nesting.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-17 12:50:10 +02:00
Andy Shevchenko
23ae2a16bb x86/platform/iosf_mbi: Move to dedicated folder
Move the driver to arch/x86/platform/intel since it is not a core
kernel code and it is related to many Intel SoCs from different
groups: Atom, MID, etc.

There is no functional change.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: David E . Box <david.e.box@linux.intel.com>
Link: http://lkml.kernel.org/r/1436366709-17683-2-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-16 17:48:47 +02:00
Thomas Gleixner
ce0d3c0a6f genirq: Revert sparse irq locking around __cpu_up() and move it to x86 for now
Boris reported that the sparse_irq protection around __cpu_up() in the
generic code causes a regression on Xen. Xen allocates interrupts and
some more in the xen_cpu_up() function, so it deadlocks on the
sparse_irq_lock.

There is no simple fix for this and we really should have the
protection for all architectures, but for now the only solution is to
move it to x86 where actual wreckage due to the lack of protection has
been observed.

Reported-and-tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Fixes: a899418167 'hotplug: Prevent alloc/free of irq descriptors during cpu up/down'
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: xiao jin <jin.xiao@intel.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Cc: xen-devel <xen-devel@lists.xenproject.org>
2015-07-15 10:39:17 +02:00
Jiang Liu
c149e4cd08 x86/irq: Use access helper irq_data_get_affinity_mask()
This is a preparatory patch for moving irq_data struct members.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-13 21:22:47 +02:00
Jiang Liu
ff96b4d033 x86/irq: Use accessor irq_data_get_irq_handler_data()
Use accessor function irq_data_get_irq_handler_data() to hide irq_desc
implementation details.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-13 21:22:46 +02:00
Jiang Liu
5f2dbbc517 x86/irq: Use accessor irq_data_get_node()
Use accessor irq_data_get_node() to hide struct irq_data
implementation detail, so we can move node to irq_data_common later.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-13 21:22:46 +02:00
Vitaly Kuznetsov
9d87cd61a6 x86/irq: Hide 'HYP:' line in /proc/interrupts when not on Xen/Hyper-V
Hypervisor callback interrupts are only accounted on
Xen/Hyper-V. There is no point in having always-zero HYP: line
on other hypervisors or bare metal. Print the line only if
HYPERVISOR_CALLBACK_VECTOR was allocated.

Reported-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Andrew Jones <drjones@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1436286373-11908-1-git-send-email-vkuznets@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-08 11:18:34 +02:00
Thomas Gleixner
09cf92b784 x86/irq: Retrieve irq data after locking irq_desc
irq_data is protected by irq_desc->lock, so retrieving the irq chip
from irq_data outside the lock is racy vs. an concurrent update. Move
it into the lock held region.

While at it add a comment why the vector walk does not require
vector_lock.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: xiao jin <jin.xiao@intel.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Link: http://lkml.kernel.org/r/20150705171102.331320612@linutronix.de
2015-07-07 11:54:04 +02:00
Thomas Gleixner
cbb24dc761 x86/irq: Use proper locking in check_irq_vectors_for_cpu_disable()
It's unsafe to examine fields in the irq descriptor w/o holding the
descriptor lock. Add proper locking.

While at it add a comment why the vector check can run lock less

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: xiao jin <jin.xiao@intel.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Link: http://lkml.kernel.org/r/20150705171102.236544164@linutronix.de
2015-07-07 11:54:04 +02:00
Thomas Gleixner
5a3f75e3f0 x86/irq: Plug irq vector hotplug race
Jin debugged a nasty cpu hotplug race which results in leaking a irq
vector on the newly hotplugged cpu.

cpu N				cpu M
native_cpu_up                   device_shutdown
  do_boot_cpu			  free_msi_irqs
  start_secondary                   arch_teardown_msi_irqs
    smp_callin                        default_teardown_msi_irqs
       setup_vector_irq                  arch_teardown_msi_irq
        __setup_vector_irq		   native_teardown_msi_irq
          lock(vector_lock)		     destroy_irq 
          install vectors
          unlock(vector_lock)
					       lock(vector_lock)
--->                                  	       __clear_irq_vector
                                    	       unlock(vector_lock)
    lock(vector_lock)
    set_cpu_online
    unlock(vector_lock)

This leaves the irq vector(s) which are torn down on CPU M stale in
the vector array of CPU N, because CPU M does not see CPU N online
yet. There is a similar issue with concurrent newly setup interrupts.

The alloc/free protection of irq descriptors does not prevent the
above race, because it merily prevents interrupt descriptors from
going away or changing concurrently.

Prevent this by moving the call to setup_vector_irq() into the
vector_lock held region which protects set_cpu_online():

cpu N				cpu M
native_cpu_up                   device_shutdown
  do_boot_cpu			  free_msi_irqs
  start_secondary                   arch_teardown_msi_irqs
    smp_callin                        default_teardown_msi_irqs
       lock(vector_lock)                arch_teardown_msi_irq
       setup_vector_irq()
        __setup_vector_irq		   native_teardown_msi_irq
          install vectors		     destroy_irq 
       set_cpu_online
       unlock(vector_lock)
					       lock(vector_lock)
                                  	       __clear_irq_vector
                                    	       unlock(vector_lock)

So cpu M either sees the cpu N online before clearing the vector or
cpu N installs the vectors after cpu M has cleared it.

Reported-by: xiao jin <jin.xiao@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Link: http://lkml.kernel.org/r/20150705171102.141898931@linutronix.de
2015-07-07 11:54:04 +02:00
Andy Lutomirski
0333a209cb x86/irq, context_tracking: Document how IRQ context tracking works and add an RCU assertion
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/e8bdc4ed0193fb2fd130f3d6b7b8023e2ec1ab62.1435952415.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-07 10:59:10 +02:00
Andy Lutomirski
8c84014f3b x86/entry: Remove exception_enter() from most trap handlers
On 64-bit kernels, we don't need it any more: we handle context
tracking directly on entry from user mode and exit to user mode.

On 32-bit kernels, we don't support context tracking at all, so
these callbacks had no effect.

Note: this doesn't change do_page_fault().  Before we do that,
we need to make sure that there is no code that can page fault
from kernel mode with CONTEXT_USER.  The 32-bit fast system call
stack argument code is the only offender I'm aware of right now.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/ae22f4dfebd799c916574089964592be218151f9.1435952415.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-07 10:59:09 +02:00
Andy Lutomirski
02fdcd5eac x86/traps, context_tracking: Assert that we're in CONTEXT_KERNEL in exception entries
Other than the super-atomic exception entries, all exception
entries are supposed to switch our context tracking state to
CONTEXT_KERNEL. Assert that they do.  These assertions appear
trivial at this point, as exception_enter() is the function
responsible for switching context, but I'm planning on reworking
x86's exception context tracking, and these assertions will help
make sure that all of this code keeps working.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/20fa1ee2d943233a184aaf96ff75394d3b34dfba.1435952415.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-07 10:59:05 +02:00
Andy Lutomirski
1f484aa690 x86/entry: Move C entry and exit code to arch/x86/entry/common.c
The entry and exit C helpers were confusingly scattered between
ptrace.c and signal.c, even though they aren't specific to
ptrace or signal handling.  Move them together in a new file.

This change just moves code around.  It doesn't change anything.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/324d686821266544d8572423cc281f961da445f4.1435952415.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-07 10:59:05 +02:00
Andy Shevchenko
91780c41a9 x86/platform/intel/pmc_atom: Move the PMC-Atom code to arch/x86/platform/atom
This is specific driver for Intel Atom SoCs like BayTrail and
Braswell. Let's move it to dedicated folder and alleviate a
arch/x86/kernel burden.

There is no functional change.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Aubrey Li <aubrey.li@linux.intel.com>
Cc: Kumar P Mahesh <mahesh.kumar.p@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1436192944-56496-6-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 18:39:38 +02:00
Andy Shevchenko
2b8f8eddaf x86/platform/intel/pmc_atom: Add Cherrytrail PMC interface
The patch adds CHT PMC interface. This exposes all the South IP
device power states and S0ix states for CHT. The bit map of
FUNC_DIS and D3_STS_0 registers for SoCs are consistent. The
D3_STS_1 and FUNC_DIS_2 registers, however, are not aligned.
This is fixed by splitting a common mapping on per register basis.

(Originally based on code from Kumar P Mahesh.)

Originally-from: Kumar P Mahesh <mahesh.kumar.p@intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Aubrey Li <aubrey.li@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1436192944-56496-5-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 18:39:38 +02:00
Andy Shevchenko
940406d1cf x86/platform/intel/pmc_atom: Supply register mappings via PMC object
The patch converts the functions to use the register mappings
provided by PMC object. It would help in case of mappings on
different platforms.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Aubrey Li <aubrey.li@linux.intel.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Kumar P Mahesh <mahesh.kumar.p@intel.com>
Link: http://lkml.kernel.org/r/1436192944-56496-4-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 17:42:46 +02:00
Andy Shevchenko
c3c65aa6d4 x86/platform/intel/pmc_atom: Print index of device in loop
The register mapping may change from one platform to another.
Thus, indices might be not the same on different platforms. The
patch makes the code to print the device index dynamically at
run time.

The patch also changes the for loop to iterate over the map
until a terminator is found.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Aubrey Li <aubrey.li@linux.intel.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Kumar P Mahesh <mahesh.kumar.p@intel.com>
Link: http://lkml.kernel.org/r/1436192944-56496-3-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 17:42:46 +02:00
Andy Shevchenko
68872eb9b1 x86/platform/intel/pmc_atom: Export accessors to PMC registers
Export the pmc_atom_read() and pmc_atom_write() accessors to the PMC
registers. On early initcall stages the functions will return
-ENODEV, and caller has to wait when it will be available.

Additionally make absence of debugfs a non-fatal error.

The patch will be useful for the upcoming fixes regarding to the
LPSS block found on Intel BayTrail-T and Braswell.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Aubrey Li <aubrey.li@linux.intel.com>
Cc: Kumar P Mahesh <mahesh.kumar.p@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1436192944-56496-2-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 17:42:45 +02:00
Steven Rostedt
827a82ff39 x86/earlyprintk: Allow early_printk() to use console style parameters like '115200n8'
When I enable early_printk on a kernel, I cut and paste the
console= input and add to earlyprintk parameter. But I notice
recently that ktest has not been detecting triple faults. The
way it detects it, is by seeing the kernel banner "Linux version
.." with a different kernel version pop up. Then I noticed that
early printk was no longer working on my console, which was why
ktest was not seeing it.

I bisected it down and it was added to 4.0 with this commit:

  ea9e9d8029 ("Specify PCI based UART for earlyprintk")

because it converted the simple_strtoul() that converts the baud
number into a kstrtoul(). The problem with this is, I had as my
baud rate, 115200n8 (acceptable for console=ttyS0), but because
of the "n8", the kstrtoul() doesn't parse the baud rate and
returns an error, which sets the baud rate to the default 9600.
This explains the garbage on my screen.

Now, earlyprintk= kernel parameter does not say it accepts that
format. Thus, one answer would simply be me changing my kernel
parameters to remove the "n8" since it isn't parsed anyway. But
I wonder if other people run into this, and it seems strange
that the two consoles for serial accepts different input.

I could also extend this to have earlyprintk do something with
that "n8" or whatever it has and have it match the console
parsing (which, BTW, still uses simple_strtoul(), as I guess it
has to).

This patch just makes my old kernel parameter parsing work like
it use to.

Although, simple_strtoull() is considered obsolete, it is the
only standard string parsing function that parses a number that
is attached to text. Ironically, commit ea9e9d8029 also added
several calls to simple_strtoul()!

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stuart R. Anderson <stuart.r.anderson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150706101434.5f6a351b@gandalf.local.home
[ Cleaned it up a bit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 17:33:47 +02:00
Brian Gerst
5e2aad2460 x86/compat: Remove unneeded #include
Including sys_ia32.h is not needed in signal.c.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1434974121-32575-10-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 15:28:57 +02:00
Brian Gerst
10ed34935e x86/compat, x86/perf: Don't build perf_callchain_user32() on x32
perf_callchain_user32() is not needed for x32.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1434974121-32575-9-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 15:28:57 +02:00
Brian Gerst
601275c3e0 x86/compat: Factor out ia32 compat code from compat_arch_ptrace()
Move the ia32-specific code in compat_arch_ptrace() into its
own function.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1434974121-32575-6-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 15:28:56 +02:00
Brian Gerst
7da770785f x86/compat: Rename 'start_thread_ia32' to 'compat_start_thread'
This function is shared between the 32-bit compat and x32 ABIs.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1434974121-32575-5-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 15:28:56 +02:00
Brian Gerst
c0bfd26e13 x86/compat: Move copy_siginfo_*_user32() to signal_compat.c
copy_siginfo_to_user32() and copy_siginfo_from_user32() are used
by both the 32-bit compat and x32 ABIs.  Move them to
signal_compat.c.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1434974121-32575-2-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 15:28:55 +02:00
Andy Lutomirski
27c634054a x86/asm/tsc: Use rdtsc_ordered() in read_tsc() instead of get_cycles()
There are two logical changes here.  First, this removes a check
for cpu_has_tsc.  That check is unnecessary, as we don't
register the TSC as a clocksource on systems that have no TSC.

Second, it adds a barrier, thus preventing observable
non-monotonicity.

I suspect that the missing barrier was never a problem in
practice because system calls themselves were heavy enough
barriers to prevent user code from observing time warps due to
speculation. (Without the corresponding barrier in the vDSO,
however, non-monotonicity is easy to detect.)

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm ML <kvm@vger.kernel.org>
Link: http://lkml.kernel.org/r/c6ff621a053127a65b70f175443578db7a0711be.1434501121.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 15:23:29 +02:00
Andy Lutomirski
eee6946e44 x86/asm/tsc/sync: Use rdtsc_ordered() in check_tsc_warp() and drop extra barriers
Using get_cycles was unnecessary: check_tsc_warp() is not called
on TSC-less systems. Replace rdtsc_barrier(); get_cycles() with
rdtsc_ordered().

While we're at it, make the somewhat more dangerous change of
removing barrier_before_rdtsc after RDTSC in the TSC warp check
code. This should be okay, though -- the vDSO TSC code doesn't
have that barrier, so, if removing the barrier from the warp
check would cause us to detect a warp that we otherwise wouldn't
detect, then we have a genuine bug.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm ML <kvm@vger.kernel.org>
Link: http://lkml.kernel.org/r/387c4c3a75f875bcde6cd68cee013273a744f364.1434501121.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 15:23:29 +02:00
Andy Lutomirski
03b9730b76 x86/asm/tsc: Add rdtsc_ordered() and use it in trivial call sites
rdtsc_barrier(); rdtsc() is an unnecessary mouthful and requires
more thought than should be necessary. Add an rdtsc_ordered()
helper and replace the trivial call sites with it.

This should not change generated code. The duplication of the
fence asm is temporary.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm ML <kvm@vger.kernel.org>
Link: http://lkml.kernel.org/r/dddbf98a2af53312e9aa73a5a2b1622fe5d6f52b.1434501121.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 15:23:29 +02:00
Andy Lutomirski
4ea1636b04 x86/asm/tsc: Rename native_read_tsc() to rdtsc()
Now that there is no paravirt TSC, the "native" is
inappropriate. The function does RDTSC, so give it the obvious
name: rdtsc().

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm ML <kvm@vger.kernel.org>
Link: http://lkml.kernel.org/r/fd43e16281991f096c1e4d21574d9e1402c62d39.1434501121.git.luto@kernel.org
[ Ported it to v4.2-rc1. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 15:23:28 +02:00
Andy Lutomirski
3796366614 x86/asm/tsc, x86/cpu/amd: Use the full 64-bit TSC to detect the 2.6.2 bug
This code is timing 100k indirect calls, so the added overhead
of counting the number of cycles elapsed as a 64-bit number
should be insignificant.  Drop the optimization of using a
32-bit count.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm ML <kvm@vger.kernel.org>
Link: http://lkml.kernel.org/r/d58f339a9c0dd8352b50d2f7a216f67ec2844f20.1434501121.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 15:23:27 +02:00
Andy Lutomirski
87be28aaf1 x86/asm/tsc: Replace rdtscll() with native_read_tsc()
Now that the ->read_tsc() paravirt hook is gone, rdtscll() is
just a wrapper around native_read_tsc(). Unwrap it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm ML <kvm@vger.kernel.org>
Link: http://lkml.kernel.org/r/d2449ae62c1b1fb90195bcfb19ef4a35883a04dc.1434501121.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 15:23:26 +02:00
Andy Lutomirski
9261e050b6 x86/asm/tsc, x86/paravirt: Remove read_tsc() and read_tscp() paravirt hooks
We've had ->read_tsc() and ->read_tscp() paravirt hooks since
the very beginning of paravirt, i.e.,

  d3561b7fa0 ("[PATCH] paravirt: header and stubs for paravirtualisation").

AFAICT, the only paravirt guest implementation that ever
replaced these calls was vmware, and it's gone. Arguably even
vmware shouldn't have hooked RDTSC -- we fully support systems
that don't have a TSC at all, so there's no point for a paravirt
implementation to pretend that we have a TSC but to replace it.

I also doubt that these hooks actually worked. Calls to rdtscl()
and rdtscll(), which respected the hooks, were used seemingly
interchangeably with native_read_tsc(), which did not.

Just remove them. If anyone ever needs them again, they can try
to make a case for why they need them.

Before, on a paravirt config:
  text    	data     bss     dec     hex filename
  12618257      1816384 1093632 15528273 ecf151 vmlinux

After:
  text		data     bss     dec     hex filename
  12617207      1816384 1093632 15527223 eced37 vmlinux

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm ML <kvm@vger.kernel.org>
Cc: virtualization@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/d08a2600fb298af163681e5efd8e599d889a5b97.1434501121.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 15:23:26 +02:00
Andy Lutomirski
c6e5ca35c4 x86/asm/tsc: Inline native_read_tsc() and remove __native_read_tsc()
In the following commit:

  cdc7957d19 ("x86: move native_read_tsc() offline")

... native_read_tsc() was moved out of line, presumably for some
now-obsolete vDSO-related reason. Undo it.

The entire rdtsc, shl, or sequence is only 11 bytes, and calls
via rdtscl() and similar helpers were already inlined.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm ML <kvm@vger.kernel.org>
Link: http://lkml.kernel.org/r/d05ffe2aaf8468ca475ebc00efad7b2fa174af19.1434501121.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 15:23:25 +02:00
Zhu Guihua
20d5e4a9cd x86/espfix: Init espfix on the boot CPU side
As we alloc pages with GFP_KERNEL in init_espfix_ap() which is
called before we enable local irqs, so the lockdep sub-system
would (correctly) trigger a warning about the potentially
blocking API.

So we allocate them on the boot CPU side when the secondary CPU is
brought up by the boot CPU, and hand them over to the secondary
CPU.

And we use alloc_pages_node() with the secondary CPU's node, to
make sure the espfix stack is NUMA-local to the CPU that is
going to use it.

Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Cc: <bp@alien8.de>
Cc: <luto@amacapital.net>
Cc: <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/c97add2670e9abebb90095369f0cfc172373ac94.1435824469.git.zhugh.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 15:00:34 +02:00
Zhu Guihua
1db875631f x86/espfix: Add 'cpu' parameter to init_espfix_ap()
Add a CPU index parameter to init_espfix_ap(), so that the
parameter could be propagated to the function for espfix
page allocation.

Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Cc: <bp@alien8.de>
Cc: <luto@amacapital.net>
Cc: <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/cde3fcf1b3211f3f03feb1a995bce3fee850f0fc.1435824469.git.zhugh.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 15:00:33 +02:00
Alexander Popov
5d5aa3cfca x86/kasan: Fix KASAN shadow region page tables
Currently KASAN shadow region page tables created without
respect of physical offset (phys_base). This causes kernel halt
when phys_base is not zero.

So let's initialize KASAN shadow region page tables in
kasan_early_init() using __pa_nodebug() which considers
phys_base.

This patch also separates x86_64_start_kernel() from KASAN low
level details by moving kasan_map_early_shadow(init_level4_pgt)
into kasan_early_init().

Remove the comment before clear_bss() which stopped bringing
much profit to the code readability. Otherwise describing all
the new order dependencies would be too verbose.

Signed-off-by: Alexander Popov <alpopov@ptsecurity.com>
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: <stable@vger.kernel.org> # 4.0+
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1435828178-10975-3-git-send-email-a.ryabinin@samsung.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 14:53:13 +02:00
Andrey Ryabinin
d0f77d4d04 x86/init: Clear 'init_level4_pgt' earlier
Currently x86_64_start_kernel() has two KASAN related
function calls. The first call maps shadow to early_level4_pgt,
the second maps shadow to init_level4_pgt.

If we move clear_page(init_level4_pgt) earlier, we could hide
KASAN low level detail from generic x86_64 initialization code.
The next patch will do it.

Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: <stable@vger.kernel.org> # 4.0+
Cc: Alexander Popov <alpopov@ptsecurity.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1435828178-10975-2-git-send-email-a.ryabinin@samsung.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 14:53:13 +02:00
Adrian Hunter
5aac644a99 x86/tsc: Let high latency PIT fail fast in quick_pit_calibrate()
If it takes longer than 12us to read the PIT counter lsb/msb,
then the error margin will never fall below 500ppm within 50ms,
and Fast TSC calibration will always fail.

This patch detects when that will happen and fails fast. Note
the failure message is not printed in that case because:
1. it will always happen on that class of hardware
2. the absence of the message is more informative than its
presence

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/556EB717.9070607@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-06 09:41:00 +02:00
Linus Torvalds
b1be9ead13 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Two FPU rewrite related fixes.  This addresses all known x86
  regressions at this stage.  Also some other misc fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu: Fix boot crash in the early FPU code
  x86/asm/entry/64: Update path names
  x86/fpu: Fix FPU related boot regression when CPUID masking BIOS feature is enabled
  x86/boot/setup: Clean up the e820_reserve_setup_data() code
  x86/kaslr: Fix typo in the KASLR_FLAG documentation
2015-07-04 08:58:50 -07:00
Linus Torvalds
c1776a18e3 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "This tree includes an x86 PMU scheduling fix, but most changes are
  late breaking tooling fixes and updates:

  User visible fixes:

   - Create config.detected into OUTPUT directory, fixing parallel
     builds sharing the same source directory (Aaro Kiskinen)

   - Allow to specify custom linker command, fixing some MIPS64 builds.
     (Aaro Kiskinen)

   - Fix to show proper convergence stats in 'perf bench numa' (Srikar
     Dronamraju)

  User visible changes:

   - Validate syscall list passed via -e argument to 'perf trace'.
     (Arnaldo Carvalho de Melo)

   - Introduce 'perf stat --per-thread' (Jiri Olsa)

   - Check access permission for --kallsyms and --vmlinux (Li Zhang)

   - Move toggling event logic from 'perf top' and into hists browser,
     allowing freeze/unfreeze with event lists with more than one entry
     (Namhyung Kim)

   - Add missing newlines when dumping PERF_RECORD_FINISHED_ROUND and
     showing the Aggregated stats in 'perf report -D' (Adrian Hunter)

  Infrastructure fixes:

   - Add missing break for PERF_RECORD_ITRACE_START, which caused those
     events samples to be parsed as well as PERF_RECORD_LOST_SAMPLES.
     ITRACE_START only appears when Intel PT or BTS are present, so..
     (Jiri Olsa)

   - Call the perf_session destructor when bailing out in the inject,
     kmem, report, kvm and mem tools (Taeung Song)

  Infrastructure changes:

   - Move stuff out of 'perf stat' and into the lib for further use
     (Jiri Olsa)

   - Reference count the cpu_map and thread_map classes (Jiri Olsa)

   - Set evsel->{cpus,threads} from the evlist, if not set, allowing the
     generalization of some 'perf stat' functions that previously were
     accessing private static evlist variable (Jiri Olsa)

   - Delete an unnecessary check before the calling free_event_desc()
     (Markus Elfring)

   - Allow auxtrace data alignment (Adrian Hunter)

   - Allow events with dot (Andi Kleen)

   - Fix failure to 'perf probe' events on arm (He Kuang)

   - Add testing for Makefile.perf (Jiri Olsa)

   - Add test for make install with prefix (Jiri Olsa)

   - Fix single target build dependency check (Jiri Olsa)

   - Access thread_map entries via accessors, prep patch to hold more
     info per entry, for ongoing 'perf stat --per-thread' work (Jiri
     Olsa)

   - Use __weak definition from compiler.h (Sukadev Bhattiprolu)

   - Split perf_pmu__new_alias() (Sukadev Bhattiprolu)"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
  perf tools: Allow to specify custom linker command
  perf tools: Create config.detected into OUTPUT directory
  perf mem: Fill in the missing session freeing after an error occurs
  perf kvm: Fill in the missing session freeing after an error occurs
  perf report: Fill in the missing session freeing after an error occurs
  perf kmem: Fill in the missing session freeing after an error occurs
  perf inject: Fill in the missing session freeing after an error occurs
  perf tools: Add missing break for PERF_RECORD_ITRACE_START
  perf/x86: Fix 'active_events' imbalance
  perf symbols: Check access permission when reading symbol files
  perf stat: Introduce --per-thread option
  perf stat: Introduce print_counters function
  perf stat: Using init_stats instead of memset
  perf stat: Rename print_interval to process_interval
  perf stat: Remove perf_evsel__read_cb function
  perf stat: Move perf_stat initialization counter process code
  perf stat: Move zero_per_pkg into counter process code
  perf stat: Separate counters reading and processing
  perf stat: Introduce read_counters function
  perf stat: Introduce perf_evsel__read function
  ...
2015-07-04 08:17:29 -07:00
Ingo Molnar
b96fecbfa8 x86/fpu: Fix boot crash in the early FPU code
Jan Kara and Thomas Gleixner reported boot crashes in the FPU
code:

  general protection fault: 0000 [#1] SMP
  RIP: 0010:[<ffffffff81048a6c>]  [<ffffffff81048a6c>] mxcsr_feature_mask_init+0x1c/0x40

  2b:*  0f ae 85 00 fe ff ff    fxsave -0x200(%rbp)

and bisected it down to the following FPU commit:

   91a8c2a5b4 ("x86/fpu: Clean up and fix MXCSR handling")

The reason is that the on-stack FPU registers state variable,
used by the FXSAVE instruction, did not have the required
minimum alignment of 16 bytes, causing the general protection
fault.

This is most likely a GCC bug in older GCC versions, but the
offending commit also added a bogus extra 32-byte alignment
(which GCC ignored too).

So fix this bug by making the variable static again, but also
mark it __initdata this time, because fpu__init_system_mxcsr()
is now an __init function.

Reported-and-bisected-by: Jan Kara <jack@suse.cz>
Reported-bisected-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150704075819.GA9201@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-04 10:05:56 +02:00
Linus Torvalds
a611fb75d0 Fixup various init.h misuses that are fragile wrt code moving to module.h
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVkPNDAAoJEOvOhAQsB9HWTNwP/1xtv8s2f7dY1JOV9T3oad7K
 FJYOnFRu1CbXqtOGgJQlsY5eUc3liC+UEkqMFmvX008GIoIGi/aq1alzM4ySlu45
 c8QttAS9aFFHwsNQUFA8rNN2Lz1xmhKi3ovc/+BBN9stgX0W0fJHX8A7TYtBsVFa
 YqfkNP/4XGH+Taz4B7Id6Mv3RJfB+9TWMlHJ4oKl1NhT+fU+Ce2888K7y5llHGIz
 Y9yDt7hMUv/7ysOpiHbvSKy3XnitTNx9JbN8CDQV22krpgsU1k0nYloxOVj5K0h0
 vxcjpQ1Wmjlc7RO826tciMi3ZD880GK5n8NHuI87d/N/egXRP0Tsy1iy9eGK0R7i
 udXR2y4RP5gD7SPuMJCUCrBTxkfp+rxQ775Keo/R9r4v/KzpKX6e0LcEDjiLsk88
 5UHUZNdPgXxw85O354QwX05jAucPIs6Eq8PR324F+R+FU8x5EI6GWtFts0K4YI7j
 ebsgaQR/aqvRlr859iJBFGBwEu0YWcbkVb6kKdMSjE4x0a3YxhFe6aXXll0g+iIZ
 wGR54nRpBUUvh+qqlrSFTc3VA4f1KPdhylcfEmfSH2iNjARvDR61vzkLW1Nt6u0I
 aM6ZYcfbGhGHt+pycqe6LAydS3qRyWDA6QTu6+TFZid/Ay6NBEI+Ubbx+eLNf8vr
 +trFtqFvEfIMuT1BvOXo
 =TR34
 -----END PGP SIGNATURE-----

Merge tag 'module-misc-v4.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux

Pull init.h/module.h fragility fixes from Paul Gortmaker:
 "Fixup various init.h misuses that are fragile wrt code moving to
  module.h

  What started as a removal of no longer required include <linux/init.h>
  due to the earlier __cpuinit and __devinit removal led to the
  observation that some module specfic support was living in init.h
  itself, thus preventing the full removal from introducing compile
  regressions.

  This series includes a few final fixups needed prior to the relocation
  of the modular init code from <init.h> to <module.h>.  These are
  things that weren't easily categorized into any of the other previous
  series categories already requested for pull.

  That said, each fixup branch (including this one) is independent and
  there are no ordering constraints.  Only the final code relocation
  (which is NOT in this pull) requires that all my cleanup branches be
  merged first"

* tag 'module-misc-v4.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
  tile: add init.h to usb.c to avoid compile failure
  arm: fix implicit #include <linux/init.h> in entry asm.
  x86: replace __init_or_module with __init in non-modular vsmp_64.c
2015-07-02 11:07:27 -07:00
Linus Torvalds
9d90f03531 Replace module_init with appropriate alternate initcall in non modules.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVkO6nAAoJEOvOhAQsB9HWpHMP/Aknc+lmX2dZeIn96gdkP+UK
 1qL24C5oq2sm/9yTZLdoXbyApLaaTbAJHS9O4kolaOU6uOs3JrgtXqL1697PVp1R
 qV4f4DOzXmmEHaE2oO21afAri3tXIVQNqA2NQl2TmKfwz0Atu01Vj5RJPu/ZOBPl
 dONXcFnE6nO2p7AEFRP/GfDZwkng4xALyZPhwL7tJDAeGaBpqG/n2hCuq+Szn9g8
 wjTFACBdad/mRrYsL6YsWZ1e+LKI8vsArQbdPTam+jPaEUlK7yjFReFKCJVzL2JP
 xfQoTcCgFztzTUV0JTGR9sqeYA3WH9AkJOFDxNE/eIili4xiTh789WbEpHLVECSX
 1LsW025I3DkRWBPT4L+9ZP805ha71kNXDFc5N3XJkzrCYaFvD2BgsUzxi6FXj7aC
 9lEVKt6xO04FFG5SwTKnO0f8PEhPemZH3BDnVvjBDWQYLjUcPSNz7bfyHUhif0G5
 ulOGVB0ncJJF9iP8PyZs1RA/F8kKxXWnhYMIHzvl0f0vLUA7rAKsACnhBgq8s9ZQ
 uM5YjzU91Z/4pe5C2E5MmQIZ84b79ZPsee1lF0GJdjK5W3PDvnCjIdXfQ5M/f3S8
 76cssXWNhS78/P+19YqirLeb0u7Zw0jf73m9t9ywRgcByWfY5ZUDm0DFpQnWKkoR
 QY/aFO/yHKTO3VHj8Ril
 =KDJO
 -----END PGP SIGNATURE-----

Merge tag 'module_init-alternate_initcall-v4.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux

Pull module_init replacement part two from Paul Gortmaker:
 "Replace module_init with appropriate alternate initcall in non
  modules.

  This series converts non-modular code that is using the module_init()
  call to hook itself into the system to instead use one of our
  alternate priority initcalls.

  Unlike the previous series that used device_initcall and hence was a
  runtime no-op, these commits change to one of the alternate initcalls,
  because (a) we have them and (b) it seems like the right thing to do.

  For example, it would seem logical to use arch_initcall for arch
  specific setup code and fs_initcall for filesystem setup code.

  This does mean however, that changes in the init ordering will be
  taking place, and so there is a small risk that some kind of implicit
  init ordering issue may lie uncovered.  But I think it is still better
  to give these ones sensible priorities than to just assign them all to
  device_initcall in order to exactly preserve the old ordering.

  Thad said, we have already made similar changes in core kernel code in
  commit c96d6660dc ("kernel: audit/fix non-modular users of
  module_init in core code") without any regressions reported, so this
  type of change isn't without precedent.  It has also got the same
  local testing and linux-next coverage as all the other pull requests
  that I'm sending for this merge window have got.

  Once again, there is an unused module_exit function removal that shows
  up as an outlier upon casual inspection of the diffstat"

* tag 'module_init-alternate_initcall-v4.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
  x86: perf_event_intel_pt.c: use arch_initcall to hook in enabling
  x86: perf_event_intel_bts.c: use arch_initcall to hook in enabling
  mm/page_owner.c: use late_initcall to hook in enabling
  lib/list_sort: use late_initcall to hook in self tests
  arm: use subsys_initcall in non-modular pl320 IPC code
  powerpc: don't use module_init for non-modular core hugetlb code
  powerpc: use subsys_initcall for Freescale Local Bus
  x86: don't use module_init for non-modular core bootflag code
  netfilter: don't use module_init/exit in core IPV4 code
  fs/notify: don't use module_init for non-modular inotify_user code
  mm: replace module_init usages with subsys_initcall in nommu.c
2015-07-02 10:36:29 -07:00
Linus Torvalds
2d4407079c Replace module_init with equivalent device_initcall in non modules.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVkO5XAAoJEOvOhAQsB9HWe4cQAJcsmSXIDN2O6oxvgH8Wilof
 EIEMvT13uwBdsjQdYUY6A6B3iUV9wzEEgoosg/JRgpz5/b1FTDMIO4arUPD3Lcak
 5bmyVO2qAT+yaLAWSgn6I8DMplXrKiEuK+TkH/mW3p9TdvElLdG3Vg6UI407hSWv
 W0QbVwkNtv8XmzshV9F2YdmflT8j1PgYxIu/tEkVOWn37DNW+Fp2OVBrdTIYp3AJ
 X6bYZPEcQDCrWWW/s2GmIDrNgryiebasns+CAgGY21262jAYaRcFOJmR47AsTqW7
 DSZXIlLc/gJca++hfxqV15RZ4NRHxrebCypTsPtZUV7ZiYHI726eeUZzxsp/9itu
 mvhmi+aQUTTUP3dDhiv05f4syAKEb4zslT6SLwcna6oi09M97HfCeQsHqhcFq/MG
 KnS2JJoJQToQtJvMUXMQzp5hyHjNlOclIvCxEiL32EZU54PeJOKasy/mptNGEctk
 TxACWvoXBQglRaVN+1wIjjr0BaHJSuJa9CUnIfM4WZdSHiMQMx00XLTkZcTnSM6R
 12pE54vVolrXswGPJhy4W/Sf1yPSW1tkWSVBbkKLyCIrlAWJtu68rXhvwhG/nz6E
 3g6QrDEQGlk6bzUH4CJCEqXLPRN1bNS0XjdkEFh60Lury3Ns5yHKZXPW5vCQ5csr
 FQNUyBs595CWbJNfbn1n
 =0BDx
 -----END PGP SIGNATURE-----

Merge tag 'module_init-device_initcall-v4.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux

Pull module_init replacement part one from Paul Gortmaker:
 "Replace module_init with equivalent device_initcall in non modules.

  This series of commits converts non-modular code that is using the
  module_init() call to hook itself into the system to instead use
  device_initcall().

  The conversion is a runtime no-op, since module_init actually becomes
  __initcall in the non-modular case, and that in turn gets mapped onto
  device_initcall.  A couple files show a larger negative diffstat,
  representing ones that had a module_exit function that we remove here
  vs previously relying on the linker to dispose of it.

  We make this conversion now, so that we can relocate module_init from
  init.h into module.h in the future.

  The files changed here are just limited to those that would otherwise
  have to add module.h to obviously non-modular code, in order to avoid
  a compile fail, as testing has shown"

* tag 'module_init-device_initcall-v4.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
  MIPS: don't use module_init in non-modular cobalt/mtd.c file
  drivers/leds: don't use module_init in non-modular leds-cobalt-raq.c
  cris: don't use module_init for non-modular core eeprom.c code
  tty/metag_da: Avoid module_init/module_exit in non-modular code
  drivers/clk: don't use module_init in clk-nomadik.c which is non-modular
  xtensa: don't use module_init for non-modular core network.c code
  sh: don't use module_init in non-modular psw.c code
  mn10300: don't use module_init in non-modular flash.c code
  parisc64: don't use module_init for non-modular core perf code
  parisc: don't use module_init for non-modular core pdc_cons code
  cris: don't use module_init for non-modular core intmem.c code
  ia64: don't use module_init in non-modular sim/simscsi.c code
  ia64: don't use module_init for non-modular core kernel/mca.c code
  arm: don't use module_init in non-modular mach-vexpress/spc.c code
  powerpc: don't use module_init in non-modular 83xx suspend code
  powerpc: use device_initcall for registering rtc devices
  x86: don't use module_init in non-modular devicetree.c code
  x86: don't use module_init in non-modular intel_mid_vrtc.c
2015-07-02 10:30:48 -07:00
Josh Triplett
c1bd55f922 x86: opt into HAVE_COPY_THREAD_TLS, for both 32-bit and 64-bit
For 32-bit userspace on a 64-bit kernel, this requires modifying
stub32_clone to actually swap the appropriate arguments to match
CONFIG_CLONE_BACKWARDS, rather than just leaving the C argument for tls
broken.

Patch co-authored by Josh Triplett and Thiago Macieira.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thiago Macieira <thiago.macieira@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-30 19:45:01 -07:00
KarimAllah Ahmed
a846f47962 x86/kexec: prepend elfcorehdr instead of appending it to the crash-kernel command-line.
Any parameter passed after '--' in the kernel command-line will not be
parsed by the kernel at all, instead it will be passed directly to init
process.

Currently the kernel appends elfcorehdr=<paddr> to the cmdline passed from
kexec load, and if this command-line is used to pass parameters to init
process this means that 'elfcorehdr' will not be parsed as a kernel
parameter at all which will be a problem for vmcore subsystem since it
will know nothing about the location of the ELF structure!

Prepending 'elfcorehdr' instead of appending it fixes this problem since
it ensures that it always comes before '--' and so it's always parsed as a
kernel command-line parameter.

Even with this patch things can still go wrong if 'CONFIG_CMDLINE' was
also used to embedd a command-line to the crash dump kernel and this
command-line contains '--' since the current behavior of the kernel is to
actually append the boot loader command-line to the embedded command-line.

Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Haren Myneni <hbabu@us.ibm.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-30 19:44:57 -07:00
Peter Zijlstra
93472aff80 perf/x86: Fix 'active_events' imbalance
Commit 1b7b938f18 ("perf/x86/intel: Fix PMI handling for Intel PT") conditionally
increments active_events in x86_add_exclusive() but unconditionally decrements in
x86_del_exclusive().

These extra decrements can lead to the situation where
active_events is zero and thus the PMI handler is 'disabled'
while we have active events on the PMU generating PMIs.

This leads to a truckload of:

  Uhhuh. NMI received for unknown reason 21 on CPU 28.
  Do you have a strange power saving mode enabled?
  Dazed and confused, but trying to continue

messages and generally messes up perf.

Remove the condition on the increment, double increment balanced
by a double decrement is perfectly fine.

Restructure the code a little bit to make the unconditional inc
a bit more natural.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: alexander.shishkin@linux.intel.com
Cc: brgerst@gmail.com
Cc: dvlasenk@redhat.com
Cc: luto@amacapital.net
Cc: oleg@redhat.com
Fixes: 1b7b938f18 ("perf/x86/intel: Fix PMI handling for Intel PT")
Link: http://lkml.kernel.org/r/20150624144750.GJ18673@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-30 13:08:46 +02:00
Ingo Molnar
dc5fb575df Merge branch 'x86/boot' into x86/urgent
Merge branch that got ready.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-30 07:57:04 +02:00
Ingo Molnar
db52ef74b3 x86/fpu: Fix FPU related boot regression when CPUID masking BIOS feature is enabled
Mike Galbraith reported:

  " My i7-4790 box is having one hell of a time with this merge
    window, dead in the water.

    BIOS setting "Limit CPUID Maximum" upsets new fpu code
    mightily. "

It turns out that Linux does a double workaround here, as per:

  066941bd4e ("x86: unmask CPUID levels on Intel CPUs")

it undoes the BIOS workaround - but as a side effect the CPUID
state is not completely constant during early init anymore,
and the new FPU init code did not take this into account.

So what happened is that the xstate init code did not have full
CPUID available, which broke subsequent attempts to use xstate
features.

Fix this by ordering the early FPU init code to after we've
stabilized the CPUID state.

Reported-bisected-and-tested-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150627082514.GA10894@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-30 07:22:10 +02:00
Linus Torvalds
88793e5c77 The libnvdimm sub-system introduces, in addition to the libnvdimm-core,
4 drivers / enabling modules:
 
 NFIT:
 Instantiates an "nvdimm bus" with the core and registers memory devices
 (NVDIMMs) enumerated by the ACPI 6.0 NFIT (NVDIMM Firmware Interface
 table).  After registering NVDIMMs the NFIT driver then registers
 "region" devices.  A libnvdimm-region defines an access mode and the
 boundaries of persistent memory media.  A region may span multiple
 NVDIMMs that are interleaved by the hardware memory controller.  In
 turn, a libnvdimm-region can be carved into a "namespace" device and
 bound to the PMEM or BLK driver which will attach a Linux block device
 (disk) interface to the memory.
 
 PMEM:
 Initially merged in v4.1 this driver for contiguous spans of persistent
 memory address ranges is re-worked to drive PMEM-namespaces emitted by
 the libnvdimm-core.  In this update the PMEM driver, on x86, gains the
 ability to assert that writes to persistent memory have been flushed all
 the way through the caches and buffers in the platform to persistent
 media.  See memcpy_to_pmem() and wmb_pmem().
 
 BLK:
 This new driver enables access to persistent memory media through "Block
 Data Windows" as defined by the NFIT.  The primary difference of this
 driver to PMEM is that only a small window of persistent memory is
 mapped into system address space at any given point in time.  Per-NVDIMM
 windows are reprogrammed at run time, per-I/O, to access different
 portions of the media.  BLK-mode, by definition, does not support DAX.
 
 BTT:
 This is a library, optionally consumed by either PMEM or BLK, that
 converts a byte-accessible namespace into a disk with atomic sector
 update semantics (prevents sector tearing on crash or power loss).  The
 sinister aspect of sector tearing is that most applications do not know
 they have a atomic sector dependency.  At least today's disk's rarely
 ever tear sectors and if they do one almost certainly gets a CRC error
 on access.  NVDIMMs will always tear and always silently.  Until an
 application is audited to be robust in the presence of sector-tearing
 the usage of BTT is recommended.
 
 Thanks to: Ross Zwisler, Jeff Moyer, Vishal Verma, Christoph Hellwig,
 Ingo Molnar, Neil Brown, Boaz Harrosh, Robert Elliott, Matthew Wilcox,
 Andy Rudoff, Linda Knippers, Toshi Kani, Nicholas Moulin, Rafael
 Wysocki, and Bob Moore.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVjZGBAAoJEB7SkWpmfYgC4fkP/j+k6HmSRNU/yRYPyo7CAWvj
 3P5P1i6R6nMZZbjQrQArAXaIyLlFk4sEQDYsciR6dmslhhFZAkR2eFwVO5rBOyx3
 QN0yxEpyjJbroRFUrV/BLaFK4cq2oyJAFFHs0u7/pLHBJ4MDMqfRKAMtlnBxEkTE
 LFcqXapSlvWitSbjMdIBWKFEvncaiJ2mdsFqT4aZqclBBTj00eWQvEG9WxleJLdv
 +tj7qR/vGcwOb12X5UrbQXgwtMYos7A6IzhHbqwQL8IrOcJ6YB8NopJUpLDd7ZVq
 KAzX6ZYMzNueN4uvv6aDfqDRLyVL7qoxM9XIjGF5R8SV9sF2LMspm1FBpfowo1GT
 h2QMr0ky1nHVT32yspBCpE9zW/mubRIDtXxEmZZ53DIc4N6Dy9jFaNVmhoWtTAqG
 b9pndFnjUzzieCjX5pCvo2M5U6N0AQwsnq76/CasiWyhSa9DNKOg8MVDRg0rbxb0
 UvK0v8JwOCIRcfO3qiKcx+02nKPtjCtHSPqGkFKPySRvAdb+3g6YR26CxTb3VmnF
 etowLiKU7HHalLvqGFOlDoQG6viWes9Zl+ZeANBOCVa6rL2O7ZnXJtYgXf1wDQee
 fzgKB78BcDjXH4jHobbp/WBANQGN/GF34lse8yHa7Ym+28uEihDvSD1wyNLnefmo
 7PJBbN5M5qP5tD0aO7SZ
 =VtWG
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm

Pull libnvdimm subsystem from Dan Williams:
 "The libnvdimm sub-system introduces, in addition to the
  libnvdimm-core, 4 drivers / enabling modules:

  NFIT:
    Instantiates an "nvdimm bus" with the core and registers memory
    devices (NVDIMMs) enumerated by the ACPI 6.0 NFIT (NVDIMM Firmware
    Interface table).

    After registering NVDIMMs the NFIT driver then registers "region"
    devices.  A libnvdimm-region defines an access mode and the
    boundaries of persistent memory media.  A region may span multiple
    NVDIMMs that are interleaved by the hardware memory controller.  In
    turn, a libnvdimm-region can be carved into a "namespace" device and
    bound to the PMEM or BLK driver which will attach a Linux block
    device (disk) interface to the memory.

  PMEM:
    Initially merged in v4.1 this driver for contiguous spans of
    persistent memory address ranges is re-worked to drive
    PMEM-namespaces emitted by the libnvdimm-core.

    In this update the PMEM driver, on x86, gains the ability to assert
    that writes to persistent memory have been flushed all the way
    through the caches and buffers in the platform to persistent media.
    See memcpy_to_pmem() and wmb_pmem().

  BLK:
    This new driver enables access to persistent memory media through
    "Block Data Windows" as defined by the NFIT.  The primary difference
    of this driver to PMEM is that only a small window of persistent
    memory is mapped into system address space at any given point in
    time.

    Per-NVDIMM windows are reprogrammed at run time, per-I/O, to access
    different portions of the media.  BLK-mode, by definition, does not
    support DAX.

  BTT:
    This is a library, optionally consumed by either PMEM or BLK, that
    converts a byte-accessible namespace into a disk with atomic sector
    update semantics (prevents sector tearing on crash or power loss).

    The sinister aspect of sector tearing is that most applications do
    not know they have a atomic sector dependency.  At least today's
    disk's rarely ever tear sectors and if they do one almost certainly
    gets a CRC error on access.  NVDIMMs will always tear and always
    silently.  Until an application is audited to be robust in the
    presence of sector-tearing the usage of BTT is recommended.

  Thanks to: Ross Zwisler, Jeff Moyer, Vishal Verma, Christoph Hellwig,
  Ingo Molnar, Neil Brown, Boaz Harrosh, Robert Elliott, Matthew Wilcox,
  Andy Rudoff, Linda Knippers, Toshi Kani, Nicholas Moulin, Rafael
  Wysocki, and Bob Moore"

* tag 'libnvdimm-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm: (33 commits)
  arch, x86: pmem api for ensuring durability of persistent memory updates
  libnvdimm: Add sysfs numa_node to NVDIMM devices
  libnvdimm: Set numa_node to NVDIMM devices
  acpi: Add acpi_map_pxm_to_online_node()
  libnvdimm, nfit: handle unarmed dimms, mark namespaces read-only
  pmem: flag pmem block devices as non-rotational
  libnvdimm: enable iostat
  pmem: make_request cleanups
  libnvdimm, pmem: fix up max_hw_sectors
  libnvdimm, blk: add support for blk integrity
  libnvdimm, btt: add support for blk integrity
  fs/block_dev.c: skip rw_page if bdev has integrity
  libnvdimm: Non-Volatile Devices
  tools/testing/nvdimm: libnvdimm unit test infrastructure
  libnvdimm, nfit, nd_blk: driver for BLK-mode access persistent memory
  nd_btt: atomic sector updates
  libnvdimm: infrastructure for btt devices
  libnvdimm: write blk label set
  libnvdimm: write pmem label set
  libnvdimm: blk labels and namespace instantiation
  ...
2015-06-29 10:34:42 -07:00
Linus Torvalds
099bfbfc7f Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie:
 "This is the main drm pull request for v4.2.

  I've one other new driver from freescale on my radar, it's been posted
  and reviewed, I'd just like to get someone to give it a last look, so
  maybe I'll send it or maybe I'll leave it.

  There is no major nouveau changes in here, Ben was working on
  something big, and we agreed it was a bit late, there wasn't anything
  else he considered urgent to merge.

  There might be another msm pull for some bits that are waiting on
  arm-soc, I'll see how we time it.

  This touches some "of" stuff, acks are in place except for the fixes
  to the build in various configs,t hat I just applied.

  Summary:

  New drivers:
      - virtio-gpu:
                KMS only pieces of driver for virtio-gpu in qemu.
                This is just the first part of this driver, enough to run
                unaccelerated userspace on. As qemu merges more we'll start
                adding the 3D features for the virgl 3d work.
      - amdgpu:
                a new driver from AMD to driver their newer GPUs. (VI+)
                It contains a new cleaner userspace API, and is a clean
                break from radeon moving forward, that AMD are going to
                concentrate on. It also contains a set of register headers
                auto generated from AMD internal database.

  core:
      - atomic modesetting API completed, enabled by default now.
      - Add support for mode_id blob to atomic ioctl to complete interface.
      - bunch of Displayport MST fixes
      - lots of misc fixes.

  panel:
      - new simple panels
      - fix some long-standing build issues with bridge drivers

  radeon:
      - VCE1 support
      - add a GPU reset counter for userspace
      - lots of fixes.

  amdkfd:
      - H/W debugger support module
      - static user-mode queues
      - support killing all the waves when a process terminates
      - use standard DECLARE_BITMAP

  i915:
      - Add Broxton support
      - S3, rotation support for Skylake
      - RPS booting tuning
      - CPT modeset sequence fixes
      - ns2501 dither support
      - enable cmd parser on haswell
      - cdclk handling fixes
      - gen8 dynamic pte allocation
      - lots of atomic conversion work

  exynos:
      - Add atomic modesetting support
      - Add iommu support
      - Consolidate drm driver initialization
      - and MIC, DECON and MIPI-DSI support for exynos5433

  omapdrm:
      - atomic modesetting support (fixes lots of things in rewrite)

  tegra:
      - DP aux transaction fixes
      - iommu support fix

  msm:
      - adreno a306 support
      - various dsi bits
      - various 64-bit fixes
      - NV12MT support

  rcar-du:
      - atomic and misc fixes

  sti:
      - fix HDMI timing complaince

  tilcdc:
      - use drm component API to access tda998x driver
      - fix module unloading

  qxl:
      - stability fixes"

* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (872 commits)
  drm/nouveau: Pause between setting gpu to D3hot and cutting the power
  drm/dp/mst: close deadlock in connector destruction.
  drm: Always enable atomic API
  drm/vgem: Set unique to "vgem"
  of: fix a build error to of_graph_get_endpoint_by_regs function
  drm/dp/mst: take lock around looking up the branch device on hpd irq
  drm/dp/mst: make sure mst_primary mstb is valid in work function
  of: add EXPORT_SYMBOL for of_graph_get_endpoint_by_regs
  ARM: dts: rename the clock of MIPI DSI 'pll_clk' to 'sclk_mipi'
  drm/atomic: Don't set crtc_state->enable manually
  drm/exynos: dsi: do not set TE GPIO direction by input
  drm/exynos: dsi: add support for MIC driver as a bridge
  drm/exynos: dsi: add support for Exynos5433
  drm/exynos: dsi: make use of array for clock access
  drm/exynos: dsi: make use of driver data for static values
  drm/exynos: dsi: add macros for register access
  drm/exynos: dsi: rename pll_clk to sclk_clk
  drm/exynos: mic: add MIC driver
  of: add helper for getting endpoint node of specific identifiers
  drm/exynos: add Exynos5433 decon driver
  ...
2015-06-26 13:18:51 -07:00
Toshi Kani
41d7a6d637 libnvdimm: Set numa_node to NVDIMM devices
ACPI NFIT table has System Physical Address Range Structure entries that
describe a proximity ID of each range when ACPI_NFIT_PROXIMITY_VALID is
set in the flags.

Change acpi_nfit_register_region() to map a proximity ID to its node ID,
and set it to a new numa_node field of nd_region_desc, which is then
conveyed to the nd_region device.

The device core arranges for btt and namespace devices to inherit their
node from their parent region.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
[djbw: move set_dev_node() from region.c to bus.c]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-26 11:23:38 -04:00
Dan Williams
9f53f9fa4a libnvdimm, pmem: add libnvdimm support to the pmem driver
nd_pmem attaches to persistent memory regions and namespaces emitted by
the libnvdimm subsystem, and, same as the original pmem driver, presents
the system-physical-address range as a block device.

The existing e820-type-12 to pmem setup is converted to an nvdimm_bus
that emits an nd_namespace_io device.

Note that the X in 'pmemX' is now derived from the parent region.  This
provides some stability to the pmem devices names from boot-to-boot.
The minor numbers are also more predictable by passing 0 to
alloc_disk().

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24 21:24:10 -04:00
Tony Luck
b05b9f5f9d x86, mirror: x86 enabling - find mirrored memory ranges
UEFI GetMemoryMap() uses a new attribute bit to mark mirrored memory
address ranges.  See UEFI 2.5 spec pages 157-158:

  http://www.uefi.org/sites/default/files/resources/UEFI%202_5.pdf

On EFI enabled systems scan the memory map and tell memblock about any
mirrored ranges.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Xiexiuqi <xiexiuqi@huawei.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:45 -07:00
Tony Luck
fc6daaf931 mm/memblock: add extra "flags" to memblock to allow selection of memory based on attribute
Some high end Intel Xeon systems report uncorrectable memory errors as a
recoverable machine check.  Linux has included code for some time to
process these and just signal the affected processes (or even recover
completely if the error was in a read only page that can be replaced by
reading from disk).

But we have no recovery path for errors encountered during kernel code
execution.  Except for some very specific cases were are unlikely to ever
be able to recover.

Enter memory mirroring. Actually 3rd generation of memory mirroing.

Gen1: All memory is mirrored
	Pro: No s/w enabling - h/w just gets good data from other side of the
	     mirror
	Con: Halves effective memory capacity available to OS/applications

Gen2: Partial memory mirror - just mirror memory begind some memory controllers
	Pro: Keep more of the capacity
	Con: Nightmare to enable. Have to choose between allocating from
	     mirrored memory for safety vs. NUMA local memory for performance

Gen3: Address range partial memory mirror - some mirror on each memory
      controller
	Pro: Can tune the amount of mirror and keep NUMA performance
	Con: I have to write memory management code to implement

The current plan is just to use mirrored memory for kernel allocations.
This has been broken into two phases:

1) This patch series - find the mirrored memory, use it for boot time
   allocations

2) Wade into mm/page_alloc.c and define a ZONE_MIRROR to pick up the
   unused mirrored memory from mm/memblock.c and only give it out to
   select kernel allocations (this is still being scoped because
   page_alloc.c is scary).

This patch (of 3):

Add extra "flags" to memblock to allow selection of memory based on
attribute.  No functional changes

Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Xiexiuqi <xiexiuqi@huawei.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:44 -07:00
Linus Torvalds
4e241557fc The bulk of the changes here is for x86. And for once it's not
for silicon that no one owns: these are really new features for
 everyone.
 
 * ARM: several features are in progress but missed the 4.2 deadline.
 So here is just a smattering of bug fixes, plus enabling the VFIO
 integration.
 
 * s390: Some fixes/refactorings/optimizations, plus support for
 2GB pages.
 
 * x86: 1) host and guest support for marking kvmclock as a stable
 scheduler clock. 2) support for write combining. 3) support for
 system management mode, needed for secure boot in guests. 4) a bunch
 of cleanups required for 2+3.  5) support for virtualized performance
 counters on AMD; 6) legacy PCI device assignment is deprecated and
 defaults to "n" in Kconfig; VFIO replaces it.  On top of this there are
 also bug fixes and eager FPU context loading for FPU-heavy guests.
 
 * Common code: Support for multiple address spaces; for now it is
 used only for x86 SMM but the s390 folks also have plans.
 
 There are some x86 conflicts, one with the rc8 pull request and
 the rest with Ingo's FPU rework.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJViYzhAAoJEL/70l94x66Dda0H/1IepMbfEy+o849d5G71fNTs
 F8Y8qUP2GZuL7T53FyFUGSBw+AX7kimu9ia4gR/PmDK+QYsdosYeEjwlsolZfTBf
 sHuzNtPoJhi5o1o/ur4NGameo0WjGK8f1xyzr+U8z74QDQyQv/QYCdK/4isp4BJL
 ugHNHkuROX6Zng4i7jc9rfaSRg29I3GBxQUYpMkEnD3eMYMUBWGm6Rs8pHgGAMvL
 vqzntgW00WNxehTqcAkmD/Wv+txxhkvIadZnjgaxH49e9JeXeBKTIR5vtb7Hns3s
 SuapZUyw+c95DIipXq4EznxxaOrjbebOeFgLCJo8+XMXZum8RZf/ob24KroYad0=
 =YsAR
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull first batch of KVM updates from Paolo Bonzini:
 "The bulk of the changes here is for x86.  And for once it's not for
  silicon that no one owns: these are really new features for everyone.

  Details:

   - ARM:
        several features are in progress but missed the 4.2 deadline.
        So here is just a smattering of bug fixes, plus enabling the
        VFIO integration.

   - s390:
        Some fixes/refactorings/optimizations, plus support for 2GB
        pages.

   - x86:
        * host and guest support for marking kvmclock as a stable
          scheduler clock.
        * support for write combining.
        * support for system management mode, needed for secure boot in
          guests.
        * a bunch of cleanups required for the above
        * support for virtualized performance counters on AMD
        * legacy PCI device assignment is deprecated and defaults to "n"
          in Kconfig; VFIO replaces it

        On top of this there are also bug fixes and eager FPU context
        loading for FPU-heavy guests.

   - Common code:
        Support for multiple address spaces; for now it is used only for
        x86 SMM but the s390 folks also have plans"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (124 commits)
  KVM: s390: clear floating interrupt bitmap and parameters
  KVM: x86/vPMU: Enable PMU handling for AMD PERFCTRn and EVNTSELn MSRs
  KVM: x86/vPMU: Implement AMD vPMU code for KVM
  KVM: x86/vPMU: Define kvm_pmu_ops to support vPMU function dispatch
  KVM: x86/vPMU: introduce kvm_pmu_msr_idx_to_pmc
  KVM: x86/vPMU: reorder PMU functions
  KVM: x86/vPMU: whitespace and stylistic adjustments in PMU code
  KVM: x86/vPMU: use the new macros to go between PMC, PMU and VCPU
  KVM: x86/vPMU: introduce pmu.h header
  KVM: x86/vPMU: rename a few PMU functions
  KVM: MTRR: do not map huge page for non-consistent range
  KVM: MTRR: simplify kvm_mtrr_get_guest_memory_type
  KVM: MTRR: introduce mtrr_for_each_mem_type
  KVM: MTRR: introduce fixed_mtrr_addr_* functions
  KVM: MTRR: sort variable MTRRs
  KVM: MTRR: introduce var_mtrr_range
  KVM: MTRR: introduce fixed_mtrr_segment table
  KVM: MTRR: improve kvm_mtrr_get_guest_memory_type
  KVM: MTRR: do not split 64 bits MSR content
  KVM: MTRR: clean up mtrr default type
  ...
2015-06-24 09:36:49 -07:00
Linus Torvalds
43c9fad942 Power management and ACPI material for v4.2-rc1
- ACPICA update to upstream revision 20150515 including basic
    support for ACPI 6 features: new ACPI tables introduced by
    ACPI 6 (STAO, XENV, WPBT, NFIT, IORT), changes related to the
    other tables (DTRM, FADT, LPIT, MADT), new predefined names
    (_BTH, _CR3, _DSD, _LPI, _MTL, _PRR, _RDI, _RST, _TFP, _TSN),
    fixes and cleanups (Bob Moore, Lv Zheng).
 
  - ACPI device power management core code update to follow ACPI 6
    which reflects the ACPI device power management implementation
    in Windows (Rafael J Wysocki).
 
  - Rework of the backlight interface selection logic to reduce the
    number of kernel command line options and improve the handling
    of DMI quirks that may be involved in that and to make the
    code generally more straightforward (Hans de Goede).
 
  - Fixes for the ACPI Embedded Controller (EC) driver related to
    the handling of EC transactions (Lv Zheng).
 
  - Fix for a regression related to the ACPI resources management
    and resulting from a recent change of ACPI initialization code
    ordering (Rafael J Wysocki).
 
  - Fix for a system initialization regression related to ACPI
    introduced during the 3.14 cycle and caused by running the
    code that switches the platform over to the ACPI mode too
    early in the initialization sequence (Rafael J Wysocki).
 
  - Support for the ACPI _CCA device configuration object related
    to DMA cache coherence (Suravee Suthikulpanit).
 
  - ACPI/APEI fixes and cleanups (Jiri Kosina, Borislav Petkov).
 
  - ACPI battery driver cleanups (Luis Henriques, Mathias Krause).
 
  - ACPI processor driver cleanups (Hanjun Guo).
 
  - Cleanups and documentation update related to the ACPI device
    properties interface based on _DSD (Rafael J Wysocki).
 
  - ACPI device power management fixes (Rafael J Wysocki).
 
  - Assorted cleanups related to ACPI (Dominik Brodowski. Fabian
    Frederick, Lorenzo Pieralisi, Mathias Krause, Rafael J Wysocki).
 
  - Fix for a long-standing issue causing General Protection Faults
    to be generated occasionally on return to user space after resume
    from ACPI-based suspend-to-RAM on 32-bit x86 (Ingo Molnar).
 
  - Fix to make the suspend core code return -EBUSY consistently in
    all cases when system suspend is aborted due to wakeup detection
    (Ruchi Kandoi).
 
  - Support for automated device wakeup IRQ handling allowing drivers
    to make their PM support more starightforward (Tony Lindgren).
 
  - New tracepoints for suspend-to-idle tracing and rework of the
    prepare/complete callbacks tracing in the PM core (Todd E Brandt,
    Rafael J Wysocki).
 
  - Wakeup sources framework enhancements (Jin Qian).
 
  - New macro for noirq system PM callbacks (Grygorii Strashko).
 
  - Assorted cleanups related to system suspend (Rafael J Wysocki).
 
  - cpuidle core cleanups to make the code more efficient (Rafael J
    Wysocki).
 
  - powernv/pseries cpuidle driver update (Shilpasri G Bhat).
 
  - cpufreq core fixes related to CPU online/offline that should
    reduce the overhead of these operations quite a bit, unless the
    CPU in question is physically going away (Viresh Kumar, Saravana
    Kannan).
 
  - Serialization of cpufreq governor callbacks to avoid race
    conditions in some cases (Viresh Kumar).
 
  - intel_pstate driver fixes and cleanups (Doug Smythies, Prarit
    Bhargava, Joe Konno).
 
  - cpufreq driver (arm_big_little, cpufreq-dt, qoriq) updates (Sudeep
    Holla, Felipe Balbi, Tang Yuantian).
 
  - Assorted cleanups in cpufreq drivers and core (Shailendra Verma,
    Fabian Frederick, Wang Long).
 
  - New Device Tree bindings for representing Operating Performance
    Points (Viresh Kumar).
 
  - Updates for the common clock operations support code in the PM
    core (Rajendra Nayak, Geert Uytterhoeven).
 
  - PM domains core code update (Geert Uytterhoeven).
 
  - Intel Knights Landing support for the RAPL (Running Average Power
    Limit) power capping driver (Dasaratharaman Chandramouli).
 
  - Fixes related to the floor frequency setting on Atom SoCs in the
    RAPL power capping driver (Ajay Thomas).
 
  - Runtime PM framework documentation update (Ben Dooks).
 
  - cpupower tool fix (Herton R Krzesinski).
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJViJdWAAoJEILEb/54YlRx/9gP/3gHoFevNRycvn0VpKqdufCI
 Mxy2LBBLlfyW2uD3+NvqvA2WWSo0Cs/LgXa04eAVxPdU7k48s8w+54U23wSouzjW
 gfwAmuHxzDR8v0h8X3h6BxNzmkIQHtmDcQlA/cZdHejY/UUw01yxRGNUUZDNbxlm
 WXn2nmlBLmGqXTYq0fpBV+3jicUghJqHHsBCqa3VR2yQioHMJG01F4UZMqYTZunN
 OIvDUghxByKz6alzdCqlLl1Y0exV6vwWUAzBsl1qHqmHu/bWFSZn3ujNNVrjqHhw
 Kl7/8dC2pQkv3Zo3gEVvfQ0onotwWZxGHzPQRdvmxvRnBunQVCi/wynx90yABX/r
 PPb/iBNV0mZskbF0zb0GZT3ZZWGA8Z0p3o5JQv2jV4m62qTzx8w50Y5kbn9N1WT+
 5bre7AVbVAlGonWszcS9iE+6TOboRz9OD1CCwPFXHItFutlBkau+1hHfFoLM0o9n
 LhpGuyszT/EUa1BHkLzuCckFqO2DpbF3N2CKmuTekw0CdgdsvRL2pRByuerk3j7R
 WQhlcvBq5YH6j43AuoEZKp8r1iN8oG/iqlrMYQaYWrW9hJaoQOoU8dGJxp/e7gKN
 r/qeYjETI+tIsjCbtH5WQzzxDI3gPISAYAtfqs7G34EEo+Lwp6kyRUAF4kDot2V3
 ZIyuKMmTu4cdwDETr/O+
 =7jTj
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management and ACPI updates from Rafael Wysocki:
 "The rework of backlight interface selection API from Hans de Goede
  stands out from the number of commits and the number of affected
  places perspective.  The cpufreq core fixes from Viresh Kumar are
  quite significant too as far as the number of commits goes and because
  they should reduce CPU online/offline overhead quite a bit in the
  majority of cases.

  From the new featues point of view, the ACPICA update (to upstream
  revision 20150515) adding support for new ACPI 6 material to ACPICA is
  the one that matters the most as some new significant features will be
  based on it going forward.  Also included is an update of the ACPI
  device power management core to follow ACPI 6 (which in turn reflects
  the Windows' device PM implementation), a PM core extension to support
  wakeup interrupts in a more generic way and support for the ACPI _CCA
  device configuration object.

  The rest is mostly fixes and cleanups all over and some documentation
  updates, including new DT bindings for Operating Performance Points.

  There is one fix for a regression introduced in the 4.1 cycle, but it
  adds quite a number of lines of code, it wasn't really ready before
  Thursday and you were on vacation, so I refrained from pushing it on
  the last minute for 4.1.

  Specifics:

   - ACPICA update to upstream revision 20150515 including basic support
     for ACPI 6 features: new ACPI tables introduced by ACPI 6 (STAO,
     XENV, WPBT, NFIT, IORT), changes related to the other tables (DTRM,
     FADT, LPIT, MADT), new predefined names (_BTH, _CR3, _DSD, _LPI,
     _MTL, _PRR, _RDI, _RST, _TFP, _TSN), fixes and cleanups (Bob Moore,
     Lv Zheng).

   - ACPI device power management core code update to follow ACPI 6
     which reflects the ACPI device power management implementation in
     Windows (Rafael J Wysocki).

   - rework of the backlight interface selection logic to reduce the
     number of kernel command line options and improve the handling of
     DMI quirks that may be involved in that and to make the code
     generally more straightforward (Hans de Goede).

   - fixes for the ACPI Embedded Controller (EC) driver related to the
     handling of EC transactions (Lv Zheng).

   - fix for a regression related to the ACPI resources management and
     resulting from a recent change of ACPI initialization code ordering
     (Rafael J Wysocki).

   - fix for a system initialization regression related to ACPI
     introduced during the 3.14 cycle and caused by running the code
     that switches the platform over to the ACPI mode too early in the
     initialization sequence (Rafael J Wysocki).

   - support for the ACPI _CCA device configuration object related to
     DMA cache coherence (Suravee Suthikulpanit).

   - ACPI/APEI fixes and cleanups (Jiri Kosina, Borislav Petkov).

   - ACPI battery driver cleanups (Luis Henriques, Mathias Krause).

   - ACPI processor driver cleanups (Hanjun Guo).

   - cleanups and documentation update related to the ACPI device
     properties interface based on _DSD (Rafael J Wysocki).

   - ACPI device power management fixes (Rafael J Wysocki).

   - assorted cleanups related to ACPI (Dominik Brodowski, Fabian
     Frederick, Lorenzo Pieralisi, Mathias Krause, Rafael J Wysocki).

   - fix for a long-standing issue causing General Protection Faults to
     be generated occasionally on return to user space after resume from
     ACPI-based suspend-to-RAM on 32-bit x86 (Ingo Molnar).

   - fix to make the suspend core code return -EBUSY consistently in all
     cases when system suspend is aborted due to wakeup detection (Ruchi
     Kandoi).

   - support for automated device wakeup IRQ handling allowing drivers
     to make their PM support more starightforward (Tony Lindgren).

   - new tracepoints for suspend-to-idle tracing and rework of the
     prepare/complete callbacks tracing in the PM core (Todd E Brandt,
     Rafael J Wysocki).

   - wakeup sources framework enhancements (Jin Qian).

   - new macro for noirq system PM callbacks (Grygorii Strashko).

   - assorted cleanups related to system suspend (Rafael J Wysocki).

   - cpuidle core cleanups to make the code more efficient (Rafael J
     Wysocki).

   - powernv/pseries cpuidle driver update (Shilpasri G Bhat).

   - cpufreq core fixes related to CPU online/offline that should reduce
     the overhead of these operations quite a bit, unless the CPU in
     question is physically going away (Viresh Kumar, Saravana Kannan).

   - serialization of cpufreq governor callbacks to avoid race
     conditions in some cases (Viresh Kumar).

   - intel_pstate driver fixes and cleanups (Doug Smythies, Prarit
     Bhargava, Joe Konno).

   - cpufreq driver (arm_big_little, cpufreq-dt, qoriq) updates (Sudeep
     Holla, Felipe Balbi, Tang Yuantian).

   - assorted cleanups in cpufreq drivers and core (Shailendra Verma,
     Fabian Frederick, Wang Long).

   - new Device Tree bindings for representing Operating Performance
     Points (Viresh Kumar).

   - updates for the common clock operations support code in the PM core
     (Rajendra Nayak, Geert Uytterhoeven).

   - PM domains core code update (Geert Uytterhoeven).

   - Intel Knights Landing support for the RAPL (Running Average Power
     Limit) power capping driver (Dasaratharaman Chandramouli).

   - fixes related to the floor frequency setting on Atom SoCs in the
     RAPL power capping driver (Ajay Thomas).

   - runtime PM framework documentation update (Ben Dooks).

   - cpupower tool fix (Herton R Krzesinski)"

* tag 'pm+acpi-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (194 commits)
  cpuidle: powernv/pseries: Auto-promotion of snooze to deeper idle state
  x86: Load __USER_DS into DS/ES after resume
  PM / OPP: Add binding for 'opp-suspend'
  PM / OPP: Allow multiple OPP tables to be passed via DT
  PM / OPP: Add new bindings to address shortcomings of existing bindings
  ACPI: Constify ACPI device IDs in documentation
  ACPI / enumeration: Document the rules regarding the PRP0001 device ID
  ACPI / video: Make acpi_video_unregister_backlight() private
  acpi-video-detect: Remove old API
  toshiba-acpi: Port to new backlight interface selection API
  thinkpad-acpi: Port to new backlight interface selection API
  sony-laptop: Port to new backlight interface selection API
  samsung-laptop: Port to new backlight interface selection API
  msi-wmi: Port to new backlight interface selection API
  msi-laptop: Port to new backlight interface selection API
  intel-oaktrail: Port to new backlight interface selection API
  ideapad-laptop: Port to new backlight interface selection API
  fujitsu-laptop: Port to new backlight interface selection API
  eeepc-laptop: Port to new backlight interface selection API
  dell-wmi: Port to new backlight interface selection API
  ...
2015-06-23 14:18:07 -07:00
Linus Torvalds
0faef837e4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching
Pull livepatching fixes from Jiri Kosina:

 - symbol lookup locking fix, from Miroslav Benes

 - error handling improvements in case of failure of the module coming
   notifier, from Minfei Huang

 - we were too pessimistic when kASLR has been enabled on x86 and were
   dropping address hints on the floor unnecessarily in such case.  Fix
   from Jiri Kosina

 - a few other small fixes and cleanups

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
  livepatch: add module locking around kallsyms calls
  livepatch: annotate klp_init() with __init
  livepatch: introduce patch/func-walking helpers
  livepatch: make kobject in klp_object statically allocated
  livepatch: Prevent patch inconsistencies if the coming module notifier fails
  livepatch: match return value to function signature
  x86: kaslr: fix build due to missing ALIGN definition
  livepatch: x86: make kASLR logic more accurate
  x86: introduce kaslr_offset()
2015-06-23 14:07:26 -07:00
Linus Torvalds
d8133356e9 PCI changes for the v4.2 merge window:
Enumeration
     - Move pci_ari_enabled() to global header (Alex Williamson)
     - Account for ARI in _PRT lookups (Alex Williamson)
     - Remove unused pci_scan_bus_parented() (Yijing Wang)
 
   Resource management
     - Use host bridge _CRS info on systems with >32 bit addressing (Bjorn Helgaas)
     - Use host bridge _CRS info on Foxconn K8M890-8237A (Bjorn Helgaas)
     - Fix pci_address_to_pio() conversion of CPU address to I/O port (Zhichang Yuan)
     - Add pci_bus_addr_t (Yinghai Lu)
 
   PCI device hotplug
     - Wait for pciehp command completion where necessary (Alex Williamson)
     - Drop pointless ACPI-based "slot detection" check (Rafael J. Wysocki)
     - Check ignore_hotplug for all downstream devices (Rafael J. Wysocki)
     - Propagate the "ignore hotplug" setting to parent (Rafael J. Wysocki)
     - Inline pciehp "handle event" functions into the ISR (Bjorn Helgaas)
     - Clean up pciehp debug logging (Bjorn Helgaas)
 
   Power management
     - Remove redundant PCIe port type checking (Yijing Wang)
     - Add dev->has_secondary_link to track downstream PCIe links (Yijing Wang)
     - Use dev->has_secondary_link to find downstream links for ASPM (Yijing Wang)
     - Drop __pci_disable_link_state() useless "force" parameter (Bjorn Helgaas)
     - Simplify Clock Power Management setting (Bjorn Helgaas)
 
   Virtualization
     - Add ACS quirks for Intel 9-series PCH root ports (Alex Williamson)
     - Add function 1 DMA alias quirk for Marvell 9120 (Sakari Ailus)
 
   MSI
     - Disable MSI at enumeration even if kernel doesn't support MSI (Michael S. Tsirkin)
     - Remove unused pci_msi_off() (Bjorn Helgaas)
     - Rename msi_set_enable(), msix_clear_and_set_ctrl() (Michael S.  Tsirkin)
     - Export pci_msi_set_enable(), pci_msix_clear_and_set_ctrl() (Michael S. Tsirkin)
     - Drop pci_msi_off() calls during probe (Michael S. Tsirkin)
 
   APM X-Gene host bridge driver
     - Add APM X-Gene v1 PCIe MSI/MSIX termination driver (Duc Dang)
     - Add APM X-Gene PCIe MSI DTS nodes (Duc Dang)
     - Disable Configuration Request Retry Status for v1 silicon (Duc Dang)
     - Allow config access to Root Port even when link is down (Duc Dang)
 
   Broadcom iProc host bridge driver
     - Allow override of device tree IRQ mapping function (Hauke Mehrtens)
     - Add BCMA PCIe driver (Hauke Mehrtens)
     - Directly add PCI resources (Hauke Mehrtens)
     - Free resource list after registration (Hauke Mehrtens)
 
   Freescale i.MX6 host bridge driver
     - Add speed change timeout message (Troy Kisky)
     - Rename imx6_pcie_start_link() to imx6_pcie_establish_link() (Bjorn Helgaas)
 
   Freescale Layerscape host bridge driver
     - Use dw_pcie_link_up() consistently (Bjorn Helgaas)
     - Factor out ls_pcie_establish_link() (Bjorn Helgaas)
 
   Marvell MVEBU host bridge driver
     - Remove mvebu_pcie_scan_bus() (Yijing Wang)
 
   NVIDIA Tegra host bridge driver
     - Remove tegra_pcie_scan_bus() (Yijing Wang)
 
   Synopsys DesignWare host bridge driver
     - Consolidate outbound iATU programming functions (Jisheng Zhang)
     - Use iATU0 for cfg and IO, iATU1 for MEM (Jisheng Zhang)
     - Add support for x8 links (Zhou Wang)
     - Wait for link to come up with consistent style (Bjorn Helgaas)
     - Use pci_scan_root_bus() for simplicity (Yijing Wang)
 
   TI DRA7xx host bridge driver
     - Use dw_pcie_link_up() consistently (Bjorn Helgaas)
 
   Miscellaneous
     - Include <linux/pci.h>, not <asm/pci.h> (Bjorn Helgaas)
     - Remove unnecessary #includes of <asm/pci.h> (Bjorn Helgaas)
     - Remove unused pcibios_select_root() (again) (Bjorn Helgaas)
     - Remove unused pci_dma_burst_advice() (Bjorn Helgaas)
     - xen/pcifront: Don't use deprecated function pci_scan_bus_parented() (Arnd Bergmann)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJViCSWAAoJEFmIoMA60/r8zX8P/1DPNnk+8xSQe3dYjnG8VW3P
 GPxeCqLMkjiF3ffxcLDzsgrHMjZEb8Co67WePs0k5V0lbZevoIwUo48+oO9B5jhc
 H5DuPZHyTHeOvaZv4GUY5vq/1DBh4JXmJc2V/BkaJ6qhXckF+SCam9C+s0p4950o
 QX/ifOjg/VHzmhaiL7wymJOzuniZmIttl+y+nzkl3AUJ+T6ZtQbUhz+8GZ3lj7Ma
 F+7JHhvm9K8Ljajxb6BLWTw4xgHA6ZN5PtYEx+Sl9QBYSsGfL7LnqyYD3KhJ7KV5
 4AHNJGEVhzNwSuyh+VQx1tNm7OHOqkAaTsYdCVUZRow+6CPd8P75QOMtpl+SmPJB
 RV1BAO75OTGqKg0B9IDg855y4Nh+4/dKoZlBPzpp7+cKw3ylaRAsNnaZ9ik5D62v
 RR06CFgWGHwDXSObgbRm4v0HwfAIHWWJzrPqAZmElh2dzb1Lv1I3AbB1SClCN6sl
 fnAu6CAwA47A5GT8xW3L0oQXdcSmdNUdNzZrsfDnOBIQWMsF+zBFKr6sTABVgyxp
 /WEJaNlvx8Zlq0bZlhGDdsGSbFNFzhX4avWZtXhvdcqFzH0KaVghYSayYvJE9Haq
 oakWqS+GZ3x40j+rdrgLg98AWRVraE1MvV1A7N9TIGjuuKqqbZfSP8kvX3QRQQhO
 Z2+X5hMM0s/tdYtADYu/
 =Qw+j
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.2-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
 "PCI changes for the v4.2 merge window:

  Enumeration
    - Move pci_ari_enabled() to global header (Alex Williamson)
    - Account for ARI in _PRT lookups (Alex Williamson)
    - Remove unused pci_scan_bus_parented() (Yijing Wang)

  Resource management
    - Use host bridge _CRS info on systems with >32 bit addressing (Bjorn Helgaas)
    - Use host bridge _CRS info on Foxconn K8M890-8237A (Bjorn Helgaas)
    - Fix pci_address_to_pio() conversion of CPU address to I/O port (Zhichang Yuan)
    - Add pci_bus_addr_t (Yinghai Lu)

  PCI device hotplug
    - Wait for pciehp command completion where necessary (Alex Williamson)
    - Drop pointless ACPI-based "slot detection" check (Rafael J. Wysocki)
    - Check ignore_hotplug for all downstream devices (Rafael J. Wysocki)
    - Propagate the "ignore hotplug" setting to parent (Rafael J. Wysocki)
    - Inline pciehp "handle event" functions into the ISR (Bjorn Helgaas)
    - Clean up pciehp debug logging (Bjorn Helgaas)

  Power management
    - Remove redundant PCIe port type checking (Yijing Wang)
    - Add dev->has_secondary_link to track downstream PCIe links (Yijing Wang)
    - Use dev->has_secondary_link to find downstream links for ASPM (Yijing Wang)
    - Drop __pci_disable_link_state() useless "force" parameter (Bjorn Helgaas)
    - Simplify Clock Power Management setting (Bjorn Helgaas)

  Virtualization
    - Add ACS quirks for Intel 9-series PCH root ports (Alex Williamson)
    - Add function 1 DMA alias quirk for Marvell 9120 (Sakari Ailus)

  MSI
    - Disable MSI at enumeration even if kernel doesn't support MSI (Michael S. Tsirkin)
    - Remove unused pci_msi_off() (Bjorn Helgaas)
    - Rename msi_set_enable(), msix_clear_and_set_ctrl() (Michael S.  Tsirkin)
    - Export pci_msi_set_enable(), pci_msix_clear_and_set_ctrl() (Michael S. Tsirkin)
    - Drop pci_msi_off() calls during probe (Michael S. Tsirkin)

  APM X-Gene host bridge driver
    - Add APM X-Gene v1 PCIe MSI/MSIX termination driver (Duc Dang)
    - Add APM X-Gene PCIe MSI DTS nodes (Duc Dang)
    - Disable Configuration Request Retry Status for v1 silicon (Duc Dang)
    - Allow config access to Root Port even when link is down (Duc Dang)

  Broadcom iProc host bridge driver
    - Allow override of device tree IRQ mapping function (Hauke Mehrtens)
    - Add BCMA PCIe driver (Hauke Mehrtens)
    - Directly add PCI resources (Hauke Mehrtens)
    - Free resource list after registration (Hauke Mehrtens)

  Freescale i.MX6 host bridge driver
    - Add speed change timeout message (Troy Kisky)
    - Rename imx6_pcie_start_link() to imx6_pcie_establish_link() (Bjorn Helgaas)

  Freescale Layerscape host bridge driver
    - Use dw_pcie_link_up() consistently (Bjorn Helgaas)
    - Factor out ls_pcie_establish_link() (Bjorn Helgaas)

  Marvell MVEBU host bridge driver
    - Remove mvebu_pcie_scan_bus() (Yijing Wang)

  NVIDIA Tegra host bridge driver
    - Remove tegra_pcie_scan_bus() (Yijing Wang)

  Synopsys DesignWare host bridge driver
    - Consolidate outbound iATU programming functions (Jisheng Zhang)
    - Use iATU0 for cfg and IO, iATU1 for MEM (Jisheng Zhang)
    - Add support for x8 links (Zhou Wang)
    - Wait for link to come up with consistent style (Bjorn Helgaas)
    - Use pci_scan_root_bus() for simplicity (Yijing Wang)

  TI DRA7xx host bridge driver
    - Use dw_pcie_link_up() consistently (Bjorn Helgaas)

  Miscellaneous
    - Include <linux/pci.h>, not <asm/pci.h> (Bjorn Helgaas)
    - Remove unnecessary #includes of <asm/pci.h> (Bjorn Helgaas)
    - Remove unused pcibios_select_root() (again) (Bjorn Helgaas)
    - Remove unused pci_dma_burst_advice() (Bjorn Helgaas)
    - xen/pcifront: Don't use deprecated function pci_scan_bus_parented() (Arnd Bergmann)"

* tag 'pci-v4.2-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (58 commits)
  PCI: pciehp: Inline the "handle event" functions into the ISR
  PCI: pciehp: Rename queue_interrupt_event() to pciehp_queue_interrupt_event()
  PCI: pciehp: Make queue_interrupt_event() void
  PCI: xgene: Allow config access to Root Port even when link is down
  PCI: xgene: Disable Configuration Request Retry Status for v1 silicon
  PCI: pciehp: Clean up debug logging
  x86/PCI: Use host bridge _CRS info on systems with >32 bit addressing
  PCI: imx6: Add #define PCIE_RC_LCSR
  PCI: imx6: Use "u32", not "uint32_t"
  PCI: Remove unused pci_scan_bus_parented()
  xen/pcifront: Don't use deprecated function pci_scan_bus_parented()
  PCI: imx6: Add speed change timeout message
  PCI/ASPM: Simplify Clock Power Management setting
  PCI: designware: Wait for link to come up with consistent style
  PCI: layerscape: Factor out ls_pcie_establish_link()
  PCI: layerscape: Use dw_pcie_link_up() consistently
  PCI: dra7xx: Use dw_pcie_link_up() consistently
  x86/PCI: Use host bridge _CRS info on Foxconn K8M890-8237A
  PCI: pciehp: Wait for hotplug command completion where necessary
  PCI: Remove unused pci_dma_burst_advice()
  ...
2015-06-23 13:41:24 -07:00
Linus Torvalds
43224b96af Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "A rather largish update for everything time and timer related:

   - Cache footprint optimizations for both hrtimers and timer wheel

   - Lower the NOHZ impact on systems which have NOHZ or timer migration
     disabled at runtime.

   - Optimize run time overhead of hrtimer interrupt by making the clock
     offset updates smarter

   - hrtimer cleanups and removal of restrictions to tackle some
     problems in sched/perf

   - Some more leap second tweaks

   - Another round of changes addressing the 2038 problem

   - First step to change the internals of clock event devices by
     introducing the necessary infrastructure

   - Allow constant folding for usecs/msecs_to_jiffies()

   - The usual pile of clockevent/clocksource driver updates

  The hrtimer changes contain updates to sched, perf and x86 as they
  depend on them plus changes all over the tree to cleanup API changes
  and redundant code, which got copied all over the place.  The y2038
  changes touch s390 to remove the last non 2038 safe code related to
  boot/persistant clock"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
  clocksource: Increase dependencies of timer-stm32 to limit build wreckage
  timer: Minimize nohz off overhead
  timer: Reduce timer migration overhead if disabled
  timer: Stats: Simplify the flags handling
  timer: Replace timer base by a cpu index
  timer: Use hlist for the timer wheel hash buckets
  timer: Remove FIFO "guarantee"
  timers: Sanitize catchup_timer_jiffies() usage
  hrtimer: Allow hrtimer::function() to free the timer
  seqcount: Introduce raw_write_seqcount_barrier()
  seqcount: Rename write_seqcount_barrier()
  hrtimer: Fix hrtimer_is_queued() hole
  hrtimer: Remove HRTIMER_STATE_MIGRATE
  selftest: Timers: Avoid signal deadlock in leap-a-day
  timekeeping: Copy the shadow-timekeeper over the real timekeeper last
  clockevents: Check state instead of mode in suspend/resume path
  selftests: timers: Add leap-second timer edge testing to leap-a-day.c
  ntp: Do leapsecond adjustment in adjtimex read path
  time: Prevent early expiry of hrtimers[CLOCK_REALTIME] at the leap second edge
  ntp: Introduce and use SECS_PER_DAY macro instead of 86400
  ...
2015-06-22 18:57:44 -07:00
Linus Torvalds
d70b3ef54c Merge branch 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 core updates from Ingo Molnar:
 "There were so many changes in the x86/asm, x86/apic and x86/mm topics
  in this cycle that the topical separation of -tip broke down somewhat -
  so the result is a more traditional architecture pull request,
  collected into the 'x86/core' topic.

  The topics were still maintained separately as far as possible, so
  bisectability and conceptual separation should still be pretty good -
  but there were a handful of merge points to avoid excessive
  dependencies (and conflicts) that would have been poorly tested in the
  end.

  The next cycle will hopefully be much more quiet (or at least will
  have fewer dependencies).

  The main changes in this cycle were:

   * x86/apic changes, with related IRQ core changes: (Jiang Liu, Thomas
     Gleixner)

     - This is the second and most intrusive part of changes to the x86
       interrupt handling - full conversion to hierarchical interrupt
       domains:

          [IOAPIC domain]   -----
                                 |
          [MSI domain]      --------[Remapping domain] ----- [ Vector domain ]
                                 |   (optional)          |
          [HPET MSI domain] -----                        |
                                                         |
          [DMAR domain]     -----------------------------
                                                         |
          [Legacy domain]   -----------------------------

       This now reflects the actual hardware and allowed us to distangle
       the domain specific code from the underlying parent domain, which
       can be optional in the case of interrupt remapping.  It's a clear
       separation of functionality and removes quite some duct tape
       constructs which plugged the remap code between ioapic/msi/hpet
       and the vector management.

     - Intel IOMMU IRQ remapping enhancements, to allow direct interrupt
       injection into guests (Feng Wu)

   * x86/asm changes:

     - Tons of cleanups and small speedups, micro-optimizations.  This
       is in preparation to move a good chunk of the low level entry
       code from assembly to C code (Denys Vlasenko, Andy Lutomirski,
       Brian Gerst)

     - Moved all system entry related code to a new home under
       arch/x86/entry/ (Ingo Molnar)

     - Removal of the fragile and ugly CFI dwarf debuginfo annotations.
       Conversion to C will reintroduce many of them - but meanwhile
       they are only getting in the way, and the upstream kernel does
       not rely on them (Ingo Molnar)

     - NOP handling refinements. (Borislav Petkov)

   * x86/mm changes:

     - Big PAT and MTRR rework: making the code more robust and
       preparing to phase out exposing direct MTRR interfaces to drivers -
       in favor of using PAT driven interfaces (Toshi Kani, Luis R
       Rodriguez, Borislav Petkov)

     - New ioremap_wt()/set_memory_wt() interfaces to support
       Write-Through cached memory mappings.  This is especially
       important for good performance on NVDIMM hardware (Toshi Kani)

   * x86/ras changes:

     - Add support for deferred errors on AMD (Aravind Gopalakrishnan)

       This is an important RAS feature which adds hardware support for
       poisoned data.  That means roughly that the hardware marks data
       which it has detected as corrupted but wasn't able to correct, as
       poisoned data and raises an APIC interrupt to signal that in the
       form of a deferred error.  It is the OS's responsibility then to
       take proper recovery action and thus prolonge system lifetime as
       far as possible.

     - Add support for Intel "Local MCE"s: upcoming CPUs will support
       CPU-local MCE interrupts, as opposed to the traditional system-
       wide broadcasted MCE interrupts (Ashok Raj)

     - Misc cleanups (Borislav Petkov)

   * x86/platform changes:

     - Intel Atom SoC updates

  ... and lots of other cleanups, fixlets and other changes - see the
  shortlog and the Git log for details"

* 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (222 commits)
  x86/hpet: Use proper hpet device number for MSI allocation
  x86/hpet: Check for irq==0 when allocating hpet MSI interrupts
  x86/mm/pat, drivers/infiniband/ipath: Use arch_phys_wc_add() and require PAT disabled
  x86/mm/pat, drivers/media/ivtv: Use arch_phys_wc_add() and require PAT disabled
  x86/platform/intel/baytrail: Add comments about why we disabled HPET on Baytrail
  genirq: Prevent crash in irq_move_irq()
  genirq: Enhance irq_data_to_desc() to support hierarchy irqdomain
  iommu, x86: Properly handle posted interrupts for IOMMU hotplug
  iommu, x86: Provide irq_remapping_cap() interface
  iommu, x86: Setup Posted-Interrupts capability for Intel iommu
  iommu, x86: Add cap_pi_support() to detect VT-d PI capability
  iommu, x86: Avoid migrating VT-d posted interrupts
  iommu, x86: Save the mode (posted or remapped) of an IRTE
  iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip
  iommu: dmar: Provide helper to copy shared irte fields
  iommu: dmar: Extend struct irte for VT-d Posted-Interrupts
  iommu: Add new member capability to struct irq_remap_ops
  x86/asm/entry/64: Disentangle error_entry/exit gsbase/ebx/usermode code
  x86/asm/entry/32: Shorten __audit_syscall_entry() args preparation
  x86/asm/entry/32: Explain reloading of registers after __audit_syscall_entry()
  ...
2015-06-22 17:59:09 -07:00
Linus Torvalds
650ec5a6bd Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 warning fixlet from Ingo Molnar:
 "A build fix for certain (rare) variants of binutils that did not make
  it into v4.1"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Fix overflow warning with 32-bit binutils
2015-06-22 17:51:59 -07:00
Linus Torvalds
35ffccdb7e Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pul x86 microcode updates from Ingo Molnar:
 "x86 microcode loader updates from Borislav Petkov:

   - early parsing of the built-in microcode

   - cleanups

   - misc smaller fixes"

* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode: Correct CPU family related variable types
  x86/microcode: Disable builtin microcode loading on 32-bit for now
  x86/microcode/intel: Rename get_matching_sig()
  x86/microcode/intel: Simplify get_matching_sig()
  x86/microcode/intel: Simplify update_match_cpu()
  x86/microcode/intel: Rename get_matching_microcode
  x86/cpu/microcode: Zap changelog
  x86/microcode: Parse built-in microcode early
  x86/microcode/intel: Remove unused @rev arg of get_matching_sig()
  x86/microcode/intel: Get rid of revision_is_newer()
2015-06-22 17:46:14 -07:00
Linus Torvalds
e2172d8fd5 Merge branch 'x86-kdump-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 kdump updates from Ingo Molnar:
 "Three kdump robustness related improvements (Joerg Roedel)"

* 'x86-kdump-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/crash: Allocate enough low memory when crashkernel=high
  x86/swiotlb: Try coherent allocations with __GFP_NOWARN
  swiotlb: Warn on allocation failure in swiotlb_alloc_coherent()
2015-06-22 17:40:55 -07:00
Linus Torvalds
e75c73ad64 Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 FPU updates from Ingo Molnar:
 "This tree contains two main changes:

   - The big FPU code rewrite: wide reaching cleanups and reorganization
     that pulls all the FPU code together into a clean base in
     arch/x86/fpu/.

     The resulting code is leaner and faster, and much easier to
     understand.  This enables future work to further simplify the FPU
     code (such as removing lazy FPU restores).

     By its nature these changes have a substantial regression risk: FPU
     code related bugs are long lived, because races are often subtle
     and bugs mask as user-space failures that are difficult to track
     back to kernel side backs.  I'm aware of no unfixed (or even
     suspected) FPU related regression so far.

   - MPX support rework/fixes.  As this is still not a released CPU
     feature, there were some buglets in the code - should be much more
     robust now (Dave Hansen)"

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (250 commits)
  x86/fpu: Fix double-increment in setup_xstate_features()
  x86/mpx: Allow 32-bit binaries on 64-bit kernels again
  x86/mpx: Do not count MPX VMAs as neighbors when unmapping
  x86/mpx: Rewrite the unmap code
  x86/mpx: Support 32-bit binaries on 64-bit kernels
  x86/mpx: Use 32-bit-only cmpxchg() for 32-bit apps
  x86/mpx: Introduce new 'directory entry' to 'addr' helper function
  x86/mpx: Add temporary variable to reduce masking
  x86: Make is_64bit_mm() widely available
  x86/mpx: Trace allocation of new bounds tables
  x86/mpx: Trace the attempts to find bounds tables
  x86/mpx: Trace entry to bounds exception paths
  x86/mpx: Trace #BR exceptions
  x86/mpx: Introduce a boot-time disable flag
  x86/mpx: Restrict the mmap() size check to bounds tables
  x86/mpx: Remove redundant MPX_BNDCFG_ADDR_MASK
  x86/mpx: Clean up the code by not passing a task pointer around when unnecessary
  x86/mpx: Use the new get_xsave_field_ptr()API
  x86/fpu/xstate: Wrap get_xsave_addr() to make it safer
  x86/fpu/xstate: Fix up bad get_xsave_addr() assumptions
  ...
2015-06-22 17:16:11 -07:00
Linus Torvalds
b3ba283d83 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 CPU features from Ingo Molnar:
 "Various CPU feature support related changes: in particular the
  /proc/cpuinfo model name sanitization change should be monitored, it
  has a chance to break stuff.  (but really shouldn't and there are no
  regression reports)"

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu/amd: Give access to the number of nodes in a physical package
  x86/cpu: Trim model ID whitespace
  x86/cpu: Strip any /proc/cpuinfo model name field whitespace
  x86/cpu/amd: Set X86_FEATURE_EXTD_APICID for future processors
  x86/gart: Check for GART support before accessing GART registers
2015-06-22 16:43:01 -07:00
Linus Torvalds
d43e4f44ba Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
 "Misc cleanups"

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Clean up types in xlate_dev_mem_ptr() some more
  x86: Deinline dma_free_attrs()
  x86: Deinline dma_alloc_attrs()
  x86: Remove unused TI_cpu
  x86: Merge common 32-bit values in asm-offsets.c
2015-06-22 16:23:00 -07:00
Linus Torvalds
23b7776290 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main changes are:

   - lockless wakeup support for futexes and IPC message queues
     (Davidlohr Bueso, Peter Zijlstra)

   - Replace spinlocks with atomics in thread_group_cputimer(), to
     improve scalability (Jason Low)

   - NUMA balancing improvements (Rik van Riel)

   - SCHED_DEADLINE improvements (Wanpeng Li)

   - clean up and reorganize preemption helpers (Frederic Weisbecker)

   - decouple page fault disabling machinery from the preemption
     counter, to improve debuggability and robustness (David
     Hildenbrand)

   - SCHED_DEADLINE documentation updates (Luca Abeni)

   - topology CPU masks cleanups (Bartosz Golaszewski)

   - /proc/sched_debug improvements (Srikar Dronamraju)"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (79 commits)
  sched/deadline: Remove needless parameter in dl_runtime_exceeded()
  sched: Remove superfluous resetting of the p->dl_throttled flag
  sched/deadline: Drop duplicate init_sched_dl_class() declaration
  sched/deadline: Reduce rq lock contention by eliminating locking of non-feasible target
  sched/deadline: Make init_sched_dl_class() __init
  sched/deadline: Optimize pull_dl_task()
  sched/preempt: Add static_key() to preempt_notifiers
  sched/preempt: Fix preempt notifiers documentation about hlist_del() within unsafe iteration
  sched/stop_machine: Fix deadlock between multiple stop_two_cpus()
  sched/debug: Add sum_sleep_runtime to /proc/<pid>/sched
  sched/debug: Replace vruntime with wait_sum in /proc/sched_debug
  sched/debug: Properly format runnable tasks in /proc/sched_debug
  sched/numa: Only consider less busy nodes as numa balancing destinations
  Revert 095bebf61a ("sched/numa: Do not move past the balance point if unbalanced")
  sched/fair: Prevent throttling in early pick_next_task_fair()
  preempt: Reorganize the notrace definitions a bit
  preempt: Use preempt_schedule_context() as the official tracing preemption point
  sched: Make preempt_schedule_context() function-tracing safe
  x86: Remove cpu_sibling_mask() and cpu_core_mask()
  x86: Replace cpu_**_mask() with topology_**_cpumask()
  ...
2015-06-22 15:52:04 -07:00
Linus Torvalds
6bc4c3ad36 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "These are the left over fixes from the v4.1 cycle"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf tools: Fix build breakage if prefix= is specified
  perf/x86: Honor the architectural performance monitoring version
  perf/x86/intel: Fix PMI handling for Intel PT
  perf/x86/intel/bts: Fix DS area sharing with x86_pmu events
  perf/x86: Add more Broadwell model numbers
  perf: Fix ring_buffer_attach() RCU sync, again
2015-06-22 15:45:41 -07:00
Linus Torvalds
c58267e9fa Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Kernel side changes mostly consist of work on x86 PMU drivers:

   - x86 Intel PT (hardware CPU tracer) improvements (Alexander
     Shishkin)

   - x86 Intel CQM (cache quality monitoring) improvements (Thomas
     Gleixner)

   - x86 Intel PEBSv3 support (Peter Zijlstra)

   - x86 Intel PEBS interrupt batching support for lower overhead
     sampling (Zheng Yan, Kan Liang)

   - x86 PMU scheduler fixes and improvements (Peter Zijlstra)

  There's too many tooling improvements to list them all - here are a
  few select highlights:

  'perf bench':

      - Introduce new 'perf bench futex' benchmark: 'wake-parallel', to
        measure parallel waker threads generating contention for kernel
        locks (hb->lock). (Davidlohr Bueso)

  'perf top', 'perf report':

      - Allow disabling/enabling events dynamicaly in 'perf top':
        a 'perf top' session can instantly become a 'perf report'
        one, i.e. going from dynamic analysis to a static one,
        returning to a dynamic one is possible, to toogle the
        modes, just press 'f' to 'freeze/unfreeze' the sampling. (Arnaldo Carvalho de Melo)

      - Make Ctrl-C stop processing on TUI, allowing interrupting the load of big
        perf.data files (Namhyung Kim)

  'perf probe': (Masami Hiramatsu)

      - Support glob wildcards for function name
      - Support $params special probe argument: Collect all function arguments
      - Make --line checks validate C-style function name.
      - Add --no-inlines option to avoid searching inline functions
      - Greatly speed up 'perf probe --list' by caching debuginfo.
      - Improve --filter support for 'perf probe', allowing using its arguments
        on other commands, as --add, --del, etc.

  'perf sched':

      - Add option in 'perf sched' to merge like comms to lat output (Josef Bacik)

  Plus tons of infrastructure work - in particular preparation for
  upcoming threaded perf report support, but also lots of other work -
  and fixes and other improvements.  See (much) more details in the
  shortlog and in the git log"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (305 commits)
  perf tools: Configurable per thread proc map processing time out
  perf tools: Add time out to force stop proc map processing
  perf report: Fix sort__sym_cmp to also compare end of symbol
  perf hists browser: React to unassigned hotkey pressing
  perf top: Tell the user how to unfreeze events after pressing 'f'
  perf hists browser: Honour the help line provided by builtin-{top,report}.c
  perf hists browser: Do not exit when 'f' is pressed in 'report' mode
  perf top: Replace CTRL+z with 'f' as hotkey for enable/disable events
  perf annotate: Rename source_line_percent to source_line_samples
  perf annotate: Display total number of samples with --show-total-period
  perf tools: Ensure thread-stack is flushed
  perf top: Allow disabling/enabling events dynamicly
  perf evlist: Add toggle_enable() method
  perf trace: Fix race condition at the end of started workloads
  perf probe: Speed up perf probe --list by caching debuginfo
  perf probe: Show usage even if the last event is skipped
  perf tools: Move libtraceevent dynamic list to separated LDFLAGS variable
  perf tools: Fix a problem when opening old perf.data with different byte order
  perf tools: Ignore .config-detected in .gitignore
  perf probe: Fix to return error if no probe is added
  ...
2015-06-22 15:19:21 -07:00
Linus Torvalds
1bf7067c6e Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The main changes are:

   - 'qspinlock' support, enabled on x86: queued spinlocks - these are
     now the spinlock variant used by x86 as they outperform ticket
     spinlocks in every category.  (Waiman Long)

   - 'pvqspinlock' support on x86: paravirtualized variant of queued
     spinlocks.  (Waiman Long, Peter Zijlstra)

   - 'qrwlock' support, enabled on x86: queued rwlocks.  Similar to
     queued spinlocks, they are now the variant used by x86:

       CONFIG_ARCH_USE_QUEUED_SPINLOCKS=y
       CONFIG_QUEUED_SPINLOCKS=y
       CONFIG_ARCH_USE_QUEUED_RWLOCKS=y
       CONFIG_QUEUED_RWLOCKS=y

   - various lockdep fixlets

   - various locking primitives cleanups, further WRITE_ONCE()
     propagation"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  locking/lockdep: Remove hard coded array size dependency
  locking/qrwlock: Don't contend with readers when setting _QW_WAITING
  lockdep: Do not break user-visible string
  locking/arch: Rename set_mb() to smp_store_mb()
  locking/arch: Add WRITE_ONCE() to set_mb()
  rtmutex: Warn if trylock is called from hard/softirq context
  arch: Remove __ARCH_HAVE_CMPXCHG
  locking/rtmutex: Drop usage of __HAVE_ARCH_CMPXCHG
  locking/qrwlock: Rename QUEUE_RWLOCK to QUEUED_RWLOCKS
  locking/pvqspinlock: Rename QUEUED_SPINLOCK to QUEUED_SPINLOCKS
  locking/pvqspinlock: Replace xchg() by the more descriptive set_mb()
  locking/pvqspinlock, x86: Enable PV qspinlock for Xen
  locking/pvqspinlock, x86: Enable PV qspinlock for KVM
  locking/pvqspinlock, x86: Implement the paravirt qspinlock call patching
  locking/pvqspinlock: Implement simple paravirt support for the qspinlock
  locking/qspinlock: Revert to test-and-set on hypervisors
  locking/qspinlock: Use a simple write to grab the lock
  locking/qspinlock: Optimize for smaller NR_CPUS
  locking/qspinlock: Extract out code snippets for the next patch
  locking/qspinlock: Add pending bit
  ...
2015-06-22 14:54:22 -07:00
Linus Torvalds
fc934d4017 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:

 - Continued initialization/Kconfig updates: hide most Kconfig options
   from unsuspecting users.

   There's now a single high level configuration option:

        *
        * RCU Subsystem
        *
        Make expert-level adjustments to RCU configuration (RCU_EXPERT) [N/y/?] (NEW)

   Which if answered in the negative, leaves us with a single
   interactive configuration option:

        Offload RCU callback processing from boot-selected CPUs (RCU_NOCB_CPU) [N/y/?] (NEW)

   All the rest of the RCU options are configured automatically.  Later
   on we'll remove this single leftover configuration option as well.

 - Remove all uses of RCU-protected array indexes: replace the
   rcu_[access|dereference]_index_check() APIs with READ_ONCE() and
   rcu_lockdep_assert()

 - RCU CPU-hotplug cleanups

 - Updates to Tiny RCU: a race fix and further code shrinkage.

 - RCU torture-testing updates: fixes, speedups, cleanups and
   documentation updates.

 - Miscellaneous fixes

 - Documentation updates

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
  rcutorture: Allow repetition factors in Kconfig-fragment lists
  rcutorture: Display "make oldconfig" errors
  rcutorture: Update TREE_RCU-kconfig.txt
  rcutorture: Make rcutorture scripts force RCU_EXPERT
  rcutorture: Update configuration fragments for rcutree.rcu_fanout_exact
  rcutorture: TASKS_RCU set directly, so don't explicitly set it
  rcutorture: Test SRCU cleanup code path
  rcutorture: Replace barriers with smp_store_release() and smp_load_acquire()
  locktorture: Change longdelay_us to longdelay_ms
  rcutorture: Allow negative values of nreaders to oversubscribe
  rcutorture: Exchange TREE03 and TREE08 NR_CPUS, speed up CPU hotplug
  rcutorture: Exchange TREE03 and TREE04 geometries
  locktorture: fix deadlock in 'rw_lock_irq' type
  rcu: Correctly handle non-empty Tiny RCU callback list with none ready
  rcutorture: Test both RCU-sched and RCU-bh for Tiny RCU
  rcu: Further shrink Tiny RCU by making empty functions static inlines
  rcu: Conditionally compile RCU's eqs warnings
  rcu: Remove prompt for RCU implementation
  rcu: Make RCU able to tolerate undefined CONFIG_RCU_KTHREAD_PRIO
  rcu: Make RCU able to tolerate undefined CONFIG_RCU_FANOUT_LEAF
  ...
2015-06-22 14:01:01 -07:00
Rafael J. Wysocki
3bcda76d9d Merge branch 'pm-sleep'
* pm-sleep:
  x86: Load __USER_DS into DS/ES after resume
2015-06-22 14:40:28 +02:00
Ingo Molnar
ffa64eff95 x86: Load __USER_DS into DS/ES after resume
Srinivas Pandruvada reported a problem with system resume from
suspend-to-RAM on 32-bit x86 systems where the DS register of
the CPU is set to __KERNEL_DS instead of __USER_DS on return
to user space which cases a General Protection Fault to occur.

The issue is that DS is set to __KERNEL_DS by the ACPI resume code
path while the SYSEXIT path never reloads DS/ES.  It assumes they
are still __USER_DS set at the SYSENTER time (Brian Gerst), so if
the return to user space happens to be through SYSEXIT, it will lead
to the reported GPF.

Fix the problem by setting the DS and ES registers to __USER_DS
as expected by the SYSEXIT path.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=61781
Link: http://marc.info/?l=linux-pm&m=143406648920385&w=2
Acked-by: Pavel Machek <pavel@ucw.cz>
Tested-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Ingo Molnar <mingo@kernel.org>

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-06-22 14:40:03 +02:00
Ingo Molnar
7ef3d7d58d Merge branches 'x86/apic', 'x86/asm', 'x86/mm' and 'x86/platform' into x86/core, to merge last updates
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-22 09:15:03 +02:00
Thomas Gleixner
cb17b2a674 x86/hpet: Use proper hpet device number for MSI allocation
hpet_assign_irq() is called with hpet_device->num as "hardware
interrupt number", but hpet_device->num is initialized after the
interrupt has been assigned, so it's always 0. As a consequence only
the first MSI allocation succeeds, the following ones fail because the
"hardware interrupt number" already exists.

Move the initialization of dev->num and other fields before the call
to hpet_assign_irq(), which is the ordering before the offending
commit which introduced that regression.

Fixes: "3cb96f0c9733 x86/hpet: Enhance HPET IRQ to support hierarchical irqdomains"
Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1506211635010.4107@nanos
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
2015-06-21 16:38:40 +02:00
Jiang Liu
bafac298fb x86/hpet: Check for irq==0 when allocating hpet MSI interrupts
irq == 0 is not a valid irq for a irqdomain MSI allocation, but hpet
code checks only for negative return values.

Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/558447AF.30703@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-20 12:00:58 +02:00
Borislav Petkov
04c17341b4 x86/boot: Fix overflow warning with 32-bit binutils
When building the kernel with 32-bit binutils built with support
only for the i386 target, we get the following warning:

  arch/x86/kernel/head_32.S:66: Warning: shift count out of range (32 is not between 0 and 31)

The problem is that in that case, binutils' internal type
representation is 32-bit wide and the shift range overflows.

In order to fix this, manipulate the shift expression which
creates the 4GiB constant to not overflow the shift count.

Suggested-by: Michael Matz <matz@suse.de>
Reported-and-tested-by: Enrico Mioso <mrkiko.rs@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-19 16:03:26 +02:00
Palik, Imre
2c33645d36 perf/x86: Honor the architectural performance monitoring version
Architectural performance monitoring, version 1, doesn't support fixed counters.

Currently, even if a hypervisor advertises support for architectural
performance monitoring version 1, perf may still try to use the fixed
counters, as the constraints are set up based on the CPU model.

This patch ensures that perf honors the architectural performance monitoring
version returned by CPUID, and it only uses the fixed counters for version 2
and above.

(Some of the ideas in this patch came from Peter Zijlstra.)

Signed-off-by: Imre Palik <imrep@amazon.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Anthony Liguori <aliguori@amazon.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1433767609-1039-1-git-send-email-imrep.amz@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-19 09:38:48 +02:00
Alexander Shishkin
1b7b938f18 perf/x86/intel: Fix PMI handling for Intel PT
Intel PT is a separate PMU and it is not using any of the x86_pmu
code paths, which means in particular that the active_events counter
remains intact when new PT events are created.

However, PT uses the generic x86_pmu PMI handler for its PMI handling needs.

The problem here is that the latter checks active_events and in case of it
being zero, exits without calling the actual x86_pmu.handle_nmi(), which
results in unknown NMI errors and massive data loss for PT.

The effect is not visible if there are other perf events in the system
at the same time that keep active_events counter non-zero, for instance
if the NMI watchdog is running, so one needs to disable it to reproduce
the problem.

At the same time, the active_events counter besides doing what the name
suggests also implicitly serves as a PMC hardware and DS area reference
counter.

This patch adds a separate reference counter for the PMC hardware, leaving
active_events for actually counting the events and makes sure it also
counts PT and BTS events.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Link: http://lkml.kernel.org/r/87k2v92t0s.fsf@ashishki-desk.ger.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-19 09:38:47 +02:00
Alexander Shishkin
6b099d9b04 perf/x86/intel/bts: Fix DS area sharing with x86_pmu events
Currently, the intel_bts driver relies on the DS area allocated by the x86_pmu
code in its event_init() path, which is a bug: creating a BTS event while
no x86_pmu events are present results in a NULL pointer dereference.

The same DS area is also used by PEBS sampling, which makes it quite a bit
trickier to have a separate one for intel_bts' purposes.

This patch makes intel_bts driver use the same DS allocation and reference
counting code as x86_pmu to make sure it is always present when either
intel_bts or x86_pmu need it.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Link: http://lkml.kernel.org/r/1434024837-9916-2-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-19 09:38:47 +02:00
Andi Kleen
4b36f1a413 perf/x86: Add more Broadwell model numbers
This patch adds additional model numbers for Broadwell to perf.
Support for Broadwell with Iris Pro (Intel Core i7-57xxC)
and support for Broadwell Server Xeon.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1434055942-28253-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-19 09:38:46 +02:00
Aravind Gopalakrishnan
cc2749e409 x86/cpu/amd: Give access to the number of nodes in a physical package
Stash the number of nodes in a physical processor package
locally and add an accessor to be called by interested parties.
The first user is the MCE injection module which uses it to find
the node base core in a package for injecting a certain type of
errors.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
[ Rewrote the commit message, merged it with the accessor patch and unified naming. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jacob Shin <jacob.w.shin@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: mchehab@osg.samsung.com
Link: http://lkml.kernel.org/r/1433868317-18417-2-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-18 11:16:06 +02:00
Feng Tang
b58d930750 x86/platform/intel/baytrail: Add comments about why we disabled HPET on Baytrail
This question has been asked many times, and finally I found the
official document which explains the problem of HPET on Baytrail,
that it will halt in deep idle states.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john.stultz@linaro.org
Cc: len.brown@intel.com
Cc: matthew.lee@intel.com
Link: http://lkml.kernel.org/r/1434361201-31743-1-git-send-email-feng.tang@intel.com
[ Prettified things a bit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-18 10:57:38 +02:00
Paul Gortmaker
70c4f78b23 x86: replace __init_or_module with __init in non-modular vsmp_64.c
The __init_or_module is from commit 05e12e1c4c
("x86: fix 27-rc crash on vsmp due to paravirt during module load").

But as of commit 70511134f6
("Revert "x86: don't compile vsmp_64 for 32bit") this file became
obj-y and hence is now only for built-in.  That makes any
"_or_module" support no longer necessary.

We need to distinguish between the two in order to do some header
reorganization between init.h and module.h and we don't want to
be including module.h in non-modular code.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2015-06-16 14:12:41 -04:00
Paul Gortmaker
5b00c1eb94 x86: perf_event_intel_pt.c: use arch_initcall to hook in enabling
This was using module_init, but the current Kconfig situation is
as follows:

In arch/x86/kernel/cpu/Makefile:

  obj-$(CONFIG_CPU_SUP_INTEL)    += perf_event_intel_pt.o perf_event_intel_bts.o

and in arch/x86/Kconfig.cpu:

  config CPU_SUP_INTEL
        default y
        bool "Support Intel processors" if PROCESSOR_SELECT

So currently, the end user can not build this code into a module.
If in the future, there is desire for this to be modular, then
it can be changed to include <linux/module.h> and use module_init.

But currently, in the non-modular case, a module_init becomes a
device_initcall.  But this really isn't a device, so we should
choose a more appropriate initcall bucket to put it in.

The obvious choice here seems to be arch_initcall, but that does
make it earlier than it was currently through device_initcall.
As long as perf_pmu_register() is functional, we should be OK.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2015-06-16 14:12:35 -04:00
Paul Gortmaker
ca41d24cf5 x86: perf_event_intel_bts.c: use arch_initcall to hook in enabling
This was using module_init, but the current Kconfig situation is
as follows:

In arch/x86/kernel/cpu/Makefile:

  obj-$(CONFIG_CPU_SUP_INTEL)    += perf_event_intel_pt.o perf_event_intel_bts.o

and in arch/x86/Kconfig.cpu:

  config CPU_SUP_INTEL
        default y
        bool "Support Intel processors" if PROCESSOR_SELECT

So currently, the end user can not build this code into a module.
If in the future, there is desire for this to be modular, then
it can be changed to include <linux/module.h> and use module_init.

But currently, in the non-modular case, a module_init becomes a
device_initcall.  But this really isn't a device, so we should
choose a more appropriate initcall bucket to put it in.

The obvious choice here seems to be arch_initcall, but that does
make it earlier than it was currently through device_initcall.
As long as perf_pmu_register() is functional, we should be OK.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2015-06-16 14:12:35 -04:00
Paul Gortmaker
1206f53589 x86: don't use module_init for non-modular core bootflag code
The bootflag.o is obj-y (always built in).  It will never be
modular, so using module_init as an alias for __initcall is
somewhat misleading.

Fix this up now, so that we can relocate module_init from
init.h into module.h in the future.  If we don't do this, we'd
have to add module.h to obviously non-modular code, and that
would be a worse thing.

Note that direct use of __initcall is discouraged, vs. one
of the priority categorized subgroups.  As __initcall gets
mapped onto device_initcall, our use of arch_initcall (which
makes sense for arch code) will thus change this registration
from level 6-device to level 3-arch (i.e. slightly earlier).
However no observable impact of that small difference has
been observed during testing, or is expected.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2015-06-16 14:12:34 -04:00
Paul Gortmaker
d54b675a6b x86: don't use module_init in non-modular devicetree.c code
The devicetree.o is built for "OF" -- which is bool, and hence
this code is either present or absent.  It will never be modular,
so using module_init as an alias for __initcall can be somewhat
misleading.

Fix this up now, so that we can relocate module_init from
init.h into module.h in the future.  If we don't do this, we'd
have to add module.h to obviously non-modular code, and that
would be a worse thing.

Note that direct use of __initcall is discouraged, vs. one
of the priority categorized subgroups.  As __initcall gets
mapped onto device_initcall, our use of device_initcall
directly in this change means that the runtime impact is
zero -- it will remain at level 6 in initcall ordering.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2015-06-16 14:12:29 -04:00
Dave Hansen
a842400367 x86/fpu: Fix double-increment in setup_xstate_features()
I noticed that my MPX tracepoints were producing garbage for the
lower and upper bounds:

	mpx_bounds_register_exception: address referenced: 0x00007fffffffccb7 bounds: lower: 0x0 ~upper: 0xffffffffffffffff
	mpx_bounds_register_exception: address referenced: 0x00007fffffffccbf bounds: lower: 0x0 ~upper: 0xffffffffffffffff

This is, of course, bogus because 0x00007fffffffccbf is *within*
the bounds.  I assumed that my instruction decoder was bad and
went looking at it.  But I eventually realized that I was
getting a '0' offset back from xstate_offsets[BNDREGS].

It was being skipped in the initialization, which is obviously
bogus, so remove the extra leaf++.

This also goes an initializes xstate_offsets/sizes[] to -1 so
so that bugs like this will oops instead of silently failing
in interesting ways.

This was introduced by:

	39f1acd ("x86/fpu/xstate: Don't assume the first zero xfeatures zero bit means the end")

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave@sr71.net
Link: http://lkml.kernel.org/r/20150611193400.2E0B00DB@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-12 10:48:12 +02:00
Joerg Roedel
94fb933418 x86/crash: Allocate enough low memory when crashkernel=high
When the crash kernel is loaded above 4GiB in memory, the
first kernel allocates only 72MiB of low-memory for the DMA
requirements of the second kernel. On systems with many
devices this is not enough and causes device driver
initialization errors and failed crash dumps. Testing by
SUSE and Redhat has shown that 256MiB is a good default
value for now and the discussion has lead to this value as
well. So set this default value to 256MiB to make sure there
is enough memory available for DMA.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
[ Reflow comment. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jörg Rödel <joro@8bytes.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: kexec@lists.infradead.org
Link: http://lkml.kernel.org/r/1433500202-25531-4-git-send-email-joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-11 08:28:39 +02:00
Joerg Roedel
186dfc9d69 x86/swiotlb: Try coherent allocations with __GFP_NOWARN
When we boot a kdump kernel in high memory, there is by
default only 72MB of low memory available. The swiotlb code
takes 64MB of it (by default) so that there are only 8MB
left to allocate from. On systems with many devices this
causes page allocator warnings from
dma_generic_alloc_coherent():

  systemd-udevd: page allocation failure: order:0, mode:0x280d4
  CPU: 0 PID: 197 Comm: systemd-udevd Tainted: G        W
  3.12.28-4-default #1 Hardware name: HP ProLiant DL980 G7, BIOS
  P66 07/30/2012  ffff8800781335e0 ffffffff8150b1db 00000000000280d4 ffffffff8113af90
   0000000000000000 0000000000000000 ffff88007efdbb00 0000000100000000
   0000000000000000 0000000000000000 0000000000000000 0000000000000001
  Call Trace:
    dump_trace+0x7d/0x2d0
    show_stack_log_lvl+0x94/0x170
    show_stack+0x21/0x50
    dump_stack+0x41/0x51
    warn_alloc_failed+0xf0/0x160
    __alloc_pages_slowpath+0x72f/0x796
    __alloc_pages_nodemask+0x1ea/0x210
    dma_generic_alloc_coherent+0x96/0x140
    x86_swiotlb_alloc_coherent+0x1c/0x50
    ttm_dma_pool_alloc_new_pages+0xab/0x320 [ttm]
    ttm_dma_populate+0x3ce/0x640 [ttm]
    ttm_tt_bind+0x36/0x60 [ttm]
    ttm_bo_handle_move_mem+0x55f/0x5c0 [ttm]
    ttm_bo_move_buffer+0x105/0x130 [ttm]
    ttm_bo_validate+0xc1/0x130 [ttm]
    ttm_bo_init+0x24b/0x400 [ttm]
    radeon_bo_create+0x16c/0x200 [radeon]
    radeon_ring_init+0x11e/0x2b0 [radeon]
    r100_cp_init+0x123/0x5b0 [radeon]
    r100_startup+0x194/0x230 [radeon]
    r100_init+0x223/0x410 [radeon]
    radeon_device_init+0x6af/0x830 [radeon]
    radeon_driver_load_kms+0x89/0x180 [radeon]
    drm_get_pci_dev+0x121/0x2f0 [drm]
    local_pci_probe+0x39/0x60
    pci_device_probe+0xa9/0x120
    driver_probe_device+0x9d/0x3d0
    __driver_attach+0x8b/0x90
    bus_for_each_dev+0x5b/0x90
    bus_add_driver+0x1f8/0x2c0
    driver_register+0x5b/0xe0
    do_one_initcall+0xf2/0x1a0
    load_module+0x1207/0x1c70
    SYSC_finit_module+0x75/0xa0
    system_call_fastpath+0x16/0x1b
    0x7fac533d2788

After these warnings the code enters a fall-back path and
allocated directly from the swiotlb aperture in the end.
So remove these warnings as this is not a fatal error.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
[ Simplify, reflow comment. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jörg Rödel <joro@8bytes.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: kexec@lists.infradead.org
Link: http://lkml.kernel.org/r/1433500202-25531-3-git-send-email-joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-11 08:28:38 +02:00
Dave Hansen
b0e9b09b3b x86: Make is_64bit_mm() widely available
The uprobes code has a nice helper, is_64bit_mm(), that consults
both the runtime and compile-time flags for 32-bit support.
Instead of reinventing the wheel, pull it in to an x86 header so
we can use it for MPX.

I prefer passing the 'mm' around to test_thread_flag(TIF_IA32)
because it makes it explicit where the context is coming from.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20150607183704.F0209999@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-09 12:24:32 +02:00
Dave Hansen
e7126cf5f1 x86/mpx: Trace #BR exceptions
This is the first in a series of MPX tracing patches.
I've found these extremely useful in the process of
debugging applications and the kernel code itself.

This exception hooks in to the bounds (#BR) exception
very early and allows capturing the key registers which
would influence how the exception is handled.

Note that bndcfgu/bndstatus are technically still
64-bit registers even in 32-bit mode.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20150607183703.5FE2619A@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-09 12:24:31 +02:00
Dave Hansen
8c3641e957 x86/mpx: Introduce a boot-time disable flag
MPX has the _potential_ to cause some issues.  Say part of your
init system tried to protect one of its components from buffer
overflows with MPX.  If there were a false positive, it's
possible that MPX could keep a system from booting.

MPX could also potentially cause performance issues since it is
present in hot paths like the unmap path.

Allow it to be disabled at boot time.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150607183702.2E8B77AB@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-09 12:24:31 +02:00
Dave Hansen
46a6e0cf1c x86/mpx: Clean up the code by not passing a task pointer around when unnecessary
The MPX code can only work on the current task.  You can not,
for instance, enable MPX management in another process or
thread. You can also not handle a fault for another process or
thread.

Despite this, we pass a task_struct around prolifically.  This
patch removes all of the task struct passing for code paths
where the code can not deal with another task (which turns out
to be all of them).

This has no functional changes.  It's just a cleanup.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: bp@alien8.de
Link: http://lkml.kernel.org/r/20150607183702.6A81DA2C@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-09 12:24:30 +02:00
Dave Hansen
a84eeaa96b x86/mpx: Use the new get_xsave_field_ptr()API
The MPX registers (bndcsr/bndcfgu/bndstatus) are not directly
accessible via normal instructions.  They essentially act as
if they were floating point registers and are saved/restored
along with those registers.

There are two main paths in the MPX code where we care about
the contents of these registers:

	1. #BR (bounds) faults
	2. the prctl() code where we are setting MPX up

Both of those paths _might_ be called without the FPU having
been used.  That means that 'tsk->thread.fpu.state' might
never be allocated.

Also, fpu_save_init() is not preempt-safe.  It was a bug to
call it without disabling preemption.  The new
get_xsave_addr() calls unlazy_fpu() instead and properly
disables preemption.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave@sr71.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: bp@alien8.de
Link: http://lkml.kernel.org/r/20150607183701.BC0D37CF@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-09 12:24:30 +02:00
Dave Hansen
04cd027bcb x86/fpu/xstate: Wrap get_xsave_addr() to make it safer
The MPX code appears is calling a low-level FPU function
(copy_fpregs_to_fpstate()).  This function is not able to
be called in all contexts, although it is safe to call
directly in some cases.

Although probably correct, the current code is ugly and
potentially error-prone.  So, add a wrapper that calls
the (slightly) higher-level fpu__save() (which is preempt-
safe) and also ensures that we even *have* an FPU context
(in the case that this was called when in lazy FPU mode).

Ingo had this to say about the details about when we need
preemption disabled:

> it's indeed generally unsafe to access/copy FPU registers with preemption enabled,
> for two reasons:
>
>   - on older systems that use FSAVE the instruction destroys FPU register
>     contents, which has to be handled carefully
>
>   - even on newer systems if we copy to FPU registers (which this code doesn't)
>     then we don't want a context switch to occur in the middle of it, because a
>     context switch will write to the fpstate, potentially overwriting our new data
>     with old FPU state.
>
> But it's safe to access FPU registers with preemption enabled in a couple of
> special cases:
>
>   - potentially destructively saving FPU registers: the signal handling code does
>     this in copy_fpstate_to_sigframe(), because it can rely on the signal restore
>     side to restore the original FPU state.
>
>   - reading FPU registers on modern systems: we don't do this anywhere at the
>     moment, mostly to keep symmetry with older systems where FSAVE is
>     destructive.
>
>   - initializing FPU registers on modern systems: fpu__clear() does this. Here
>     it's safe because we don't copy from the fpstate.
>
>   - directly writing FPU registers from user-space memory (!). We do this in
>     fpu__restore_sig(), and it's safe because neither context switches nor
>     irq-handler FPU use can corrupt the source context of the copy (which is
>     user-space memory).
>
> Note that the MPX code's current use of copy_fpregs_to_fpstate() was safe I think,
> because:
>
>  - MPX is predicated on eagerfpu, so the destructive F[N]SAVE instruction won't be
>    used.
>
>  - the code was only reading FPU registers, and was doing it only in places that
>    guaranteed that an FPU state was already active (i.e. didn't do it in
>    kthreads)

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave@sr71.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: bp@alien8.de
Link: http://lkml.kernel.org/r/20150607183700.AA881696@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-09 12:24:29 +02:00
Dave Hansen
0c4109bec0 x86/fpu/xstate: Fix up bad get_xsave_addr() assumptions
get_xsave_addr() assumes that if an xsave bit is present in the
hardware (pcntxt_mask) that it is present in a given xsave
buffer.  Due to an bug in the xsave code on all of the systems
that have MPX (and thus all the users of this code), that has
been a true assumption.

But, the bug is getting fixed, so our assumption is not going
to hold any more.

It's quite possible (and normal) for an enabled state to be
present on 'pcntxt_mask', but *not* in 'xstate_bv'.  We need
to consult 'xstate_bv'.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20150607183700.1E739B34@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-09 12:24:29 +02:00
Ingo Molnar
15c1247953 Revert "perf/x86/intel/uncore: Move uncore_box_init() out of driver initialization"
This reverts commit c05199e5a5.

Vince Weaver reported the following crash while perf fuzzing:

[   79.473121] kernel BUG at mm/vmalloc.c:1335!
[   79.694391] Call Trace:
[   79.696997]  <IRQ>
[   79.699090]  [<ffffffff811b2130>] get_vm_area_caller+0x40/0x50
[   79.705505]  [<ffffffff81039f4d>] ? snb_uncore_imc_init_box+0x6d/0x90
[   79.712414]  [<ffffffff810635e5>] __ioremap_caller+0x195/0x350
[   79.718610]  [<ffffffff81039f4d>] ? snb_uncore_imc_init_box+0x6d/0x90
[   79.725462]  [<ffffffff81427f6b>] ? debug_object_activate+0x14b/0x1e0
[   79.732346]  [<ffffffff810637b7>] ioremap_nocache+0x17/0x20
[   79.738283]  [<ffffffff81039f4d>] snb_uncore_imc_init_box+0x6d/0x90
[   79.744945]  [<ffffffff81039cf7>] snb_uncore_imc_event_start+0xb7/0x110
[   79.752020]  [<ffffffff81039d97>] snb_uncore_imc_event_add+0x47/0x60
[   79.758832]  [<ffffffff81162cbb>] event_sched_in.isra.85+0xfb/0x330
[   79.765519]  [<ffffffff81162f5f>] group_sched_in+0x6f/0x1e0
[   79.771481]  [<ffffffff8101df1a>] ? native_sched_clock+0x2a/0x90
[   79.777858]  [<ffffffff811637bc>] __perf_event_enable+0x25c/0x2a0
[   79.784418]  [<ffffffff810f3e69>] ? tick_nohz_irq_exit+0x29/0x30
[   79.790820]  [<ffffffff8115ef30>] ? cpu_clock_event_start+0x40/0x40
[   79.797546]  [<ffffffff8115ef80>] remote_function+0x50/0x60
[   79.803535]  [<ffffffff810f8cd1>] flush_smp_call_function_queue+0x81/0x180
[   79.810840]  [<ffffffff810f9763>] generic_smp_call_function_single_interrupt+0x13/0x60
[   79.819328]  [<ffffffff8104b5e8>] smp_trace_call_function_single_interrupt+0x38/0xc0
[   79.827614]  [<ffffffff816de9be>] trace_call_function_single_interrupt+0x6e/0x80
[   79.835465]  <EOI>
[   79.837543]  [<ffffffff8156e8b5>] ? cpuidle_enter_state+0x65/0x160
[   79.844377]  [<ffffffff8156e8a1>] ? cpuidle_enter_state+0x51/0x160
[   79.851015]  [<ffffffff8156e9e7>] cpuidle_enter+0x17/0x20
[   79.856791]  [<ffffffff810b6e39>] cpu_startup_entry+0x399/0x440
[   79.863165]  [<ffffffff816c9ddb>] rest_init+0xbb/0xd0

The offending commit is clearly confused as it moves heavy initialization
work into IPI context.

Revert it.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Yan, Zheng <zheng.z.yan@intel.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-09 11:44:37 +02:00
Ingo Molnar
bace7117d3 x86/asm/entry: (Re-)rename __NR_entry_INT80_compat_max to __NR_syscall_compat_max
Brian Gerst noticed that I did a weird rename in the following commit:

   b2502b418e ("x86/asm/entry: Untangle 'system_call' into two entry points: entry_SYSCALL_64 and entry_INT80_32")

which renamed __NR_ia32_syscall_max to __NR_entry_INT80_compat_max.

Now the original name was a misnomer, but the new one is a misnomer as well,
as all the 32-bit compat syscall entry points (sysenter, syscall) share the
system call table, not just the INT80 based one.

Rename it to __NR_syscall_compat_max.

Reported-by: Brian Gerst <brgerst@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-08 23:43:38 +02:00
Ingo Molnar
9dda1658a9 Merge branch 'x86/asm' into x86/core, to prepare for new patch
Collect all changes to arch/x86/entry/entry_64.S, before applying
patch that changes most of the file.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-08 20:48:20 +02:00
Bjorn Helgaas
633adc711d PCI: Remove unnecessary #includes of <asm/pci.h>
In include/linux/pci.h, we already #include <asm/pci.h>, so we don't need
to include <asm/pci.h> directly.

Remove the unnecessary includes.  All the files here already include
<linux/pci.h>.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Simon Horman <horms+renesas@verge.net.au>	# sh
Acked-by: Ralf Baechle <ralf@linux-mips.org>
2015-06-08 07:56:09 -05:00
Ingo Molnar
b2502b418e x86/asm/entry: Untangle 'system_call' into two entry points: entry_SYSCALL_64 and entry_INT80_32
The 'system_call' entry points differ starkly between native 32-bit and 64-bit
kernels: on 32-bit kernels it defines the INT 0x80 entry point, while on
64-bit it's the SYSCALL entry point.

This is pretty confusing when looking at generic code, and it also obscures
the nature of the entry point at the assembly level.

So unangle this by splitting the name into its two uses:

	system_call (32) -> entry_INT80_32
	system_call (64) -> entry_SYSCALL_64

As per the generic naming scheme for x86 system call entry points:

	entry_MNEMONIC_qualifier

where 'qualifier' is one of _32, _64 or _compat.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-08 09:14:21 +02:00
Ingo Molnar
4c8cd0c50d x86/asm/entry: Untangle 'ia32_sysenter_target' into two entry points: entry_SYSENTER_32 and entry_SYSENTER_compat
So the SYSENTER instruction is pretty quirky and it has different behavior
depending on bitness and CPU maker.

Yet we create a false sense of coherency by naming it 'ia32_sysenter_target'
in both of the cases.

Split the name into its two uses:

	ia32_sysenter_target (32)    -> entry_SYSENTER_32
	ia32_sysenter_target (64)    -> entry_SYSENTER_compat

As per the generic naming scheme for x86 system call entry points:

	entry_MNEMONIC_qualifier

where 'qualifier' is one of _32, _64 or _compat.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-08 08:47:46 +02:00
Ingo Molnar
2cd23553b4 x86/asm/entry: Rename compat syscall entry points
Rename the following system call entry points:

	ia32_cstar_target       -> entry_SYSCALL_compat
	ia32_syscall            -> entry_INT80_compat

The generic naming scheme for x86 system call entry points is:

	entry_MNEMONIC_qualifier

where 'qualifier' is one of _32, _64 or _compat.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-08 08:47:36 +02:00
Peter Zijlstra
a3d86542de perf/x86/intel/pebs: Add PEBSv3 decoding
PEBSv3 as present on Skylake fixed the long standing issue of the
status bits. They now really reflect the events that generated the
record.

Tested-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 16:09:16 +02:00
Kan Liang
f38b0dbb49 perf/x86/intel: Introduce PERF_RECORD_LOST_SAMPLES
After enlarging the PEBS interrupt threshold, there may be some mixed up
PEBS samples which are discarded by the kernel.

This patch makes the kernel emit a PERF_RECORD_LOST_SAMPLES record with
the number of possible discarded records when it is impossible to demux
the samples.

It makes sure the user is not left in the dark about such discards.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1431285195-14269-8-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 16:09:02 +02:00
Yan, Zheng
156174999d perf/intel/x86: Enlarge the PEBS buffer
Currently the PEBS buffer size is 4k, it can only hold about 21
PEBS records. This patch enlarges the PEBS buffer size to 64k
(the same as the BTS buffer).

64k memory can hold about 330 PEBS records. This will significantly
reduce the number of PMIs when batched PEBS interrupts are enabled.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1430940834-8964-7-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 16:08:57 +02:00
Yan, Zheng
9c964efa43 perf/x86/intel: Drain the PEBS buffer during context switches
Flush the PEBS buffer during context switches if PEBS interrupt threshold
is larger than one. This allows perf to supply TID for sample outputs.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1430940834-8964-6-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 16:08:54 +02:00
Yan, Zheng
3569c0d7c5 perf/x86/intel: Implement batched PEBS interrupt handling (large PEBS interrupt threshold)
PEBS always had the capability to log samples to its buffers without
an interrupt. Traditionally perf has not used this but always set the
PEBS threshold to one.

For frequently occurring events (like cycles or branches or load/store)
this in term requires using a relatively high sampling period to avoid
overloading the system, by only processing PMIs. This in term increases
sampling error.

For the common cases we still need to use the PMI because the PEBS
hardware has various limitations. The biggest one is that it can not
supply a callgraph. It also requires setting a fixed period, as the
hardware does not support adaptive period. Another issue is that it
cannot supply a time stamp and some other options. To supply a TID it
requires flushing on context switch. It can however supply the IP, the
load/store address, TSX information, registers, and some other things.

So we can make PEBS work for some specific cases, basically as long as
you can do without a callgraph and can set the period you can use this
new PEBS mode.

The main benefit is the ability to support much lower sampling period
(down to -c 1000) without extensive overhead.

One use cases is for example to increase the resolution of the c2c tool.
Another is double checking when you suspect the standard sampling has
too much sampling error.

Some numbers on the overhead, using cycle soak, comparing the elapsed
time from "kernbench -M -H" between plain (threshold set to one) and
multi (large threshold).

The test command for plain:
  "perf record --time -e cycles:p -c $period -- kernbench -M -H"

The test command for multi:
  "perf record --no-time -e cycles:p -c $period -- kernbench -M -H"

( The only difference of test command between multi and plain is time
  stamp options. Since time stamp is not supported by large PEBS
  threshold, it can be used as a flag to indicate if large threshold is
  enabled during the test. )

	period    plain(Sec)  multi(Sec)  Delta
	10003     32.7        16.5        16.2
	20003     30.2        16.2        14.0
	40003     18.6        14.1        4.5
	80003     16.8        14.6        2.2
	100003    16.9        14.1        2.8
	800003    15.4        15.7        -0.3
	1000003   15.3        15.2        0.2
	2000003   15.3        15.1        0.1

With periods below 100003, plain (threshold one) cause much more
overhead. With 10003 sampling period, the Elapsed Time for multi is
even 2X faster than plain.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1430940834-8964-5-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 16:08:49 +02:00
Yan, Zheng
21509084f9 perf/x86/intel: Handle multiple records in the PEBS buffer
When the PEBS interrupt threshold is larger than one record and the
machine supports multiple PEBS events, the records of these events are
mixed up and we need to demultiplex them.

Demuxing the records is hard because the hardware is deficient. The
hardware has two issues that, when combined, create impossible
scenarios to demux.

The first issue is that the 'status' field of the PEBS record is a copy
of the GLOBAL_STATUS MSR at PEBS assist time. To see why this is a
problem let us first describe the regular PEBS cycle:

A) the CTRn value reaches 0:
  - the corresponding bit in GLOBAL_STATUS gets set
  - we start arming the hardware assist
  < some unspecified amount of time later -- this could cover multiple
    events of interest >

B) the hardware assist is armed, any next event will trigger it

C) a matching event happens:
  - the hardware assist triggers and generates a PEBS record
    this includes a copy of GLOBAL_STATUS at this moment
  - if we auto-reload we (re)set CTRn
  - we clear the relevant bit in GLOBAL_STATUS

Now consider the following chain of events:

  A0, B0, A1, C0

The event generated for counter 0 will include a status with counter 1
set, even though its not at all related to the record. A similar thing
can happen with a !PEBS event if it just happens to overflow at the
right moment.

The second issue is that the hardware will only emit one record for two
or more counters if the event that triggers the assist is 'close'. The
'close' can be several cycles. In some cases even the complete assist,
if the event is something that doesn't need retirement.

For instance, consider this chain of events:

  A0, B0, A1, B1, C01

Where C01 is an event that triggers both hardware assists, we will
generate but a single record, but again with both counters listed in the
status field.

This time the record pertains to both events.

Note that these two cases are different but undistinguishable with the
data as generated. Therefore demuxing records with multiple PEBS bits
(we can safely ignore status bits for !PEBS counters) is impossible.

Furthermore we cannot emit the record to both events because that might
cause a data leak -- the events might not have the same privileges -- so
what this patch does is discard such events.

The assumption/hope is that such discards will be rare.

Here lists some possible ways you may get high discard rate.

  - when you count the same thing multiple times. But it is not a useful
    configuration.
  - you can be unfortunate if you measure with a userspace only PEBS
    event along with either a kernel or unrestricted PEBS event. Imagine
    the event triggering and setting the overflow flag right before
    entering the kernel. Then all kernel side events will end up with
    multiple bits set.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
[ Changelog improvements. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1430940834-8964-4-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 16:08:45 +02:00
Yan, Zheng
43cf76312f perf/x86/intel: Introduce setup_pebs_sample_data()
Move code that sets up the PEBS sample data to a separate function.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1430940834-8964-3-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 16:08:40 +02:00
Yan, Zheng
851559e35f perf/x86/intel: Use the PEBS auto reload mechanism when possible
When a fixed period is specified, this patch makes perf use the PEBS
auto reload mechanism. This makes normal profiling faster, because
it avoids one costly MSR write in the PMI handler.

However, the reset value will be loaded by hardware assist. There is a
small delay compared to the previous non-auto-reload mechanism. The
delay time is arbitrary, but very small. The assist cost is 400-800
cycles, assuming common cases with everything cached. The minimum period
the patch currently uses is 10000. In that extreme case it can be ~10%
if cycles are used.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1430940834-8964-2-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 16:08:35 +02:00
Stephane Eranian
7b74cfb2ec perf/x86/intel: add support for PERF_SAMPLE_BRANCH_IND_JUMP
This patch enables support for branch sampling filter
for indirect jumps (IND_JUMP). It enables LBR IND_JMP
filtering where available. There is also software filtering
support.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@redhat.com
Cc: dsahern@gmail.com
Cc: jolsa@redhat.com
Cc: kan.liang@intel.com
Cc: namhyung@kernel.org
Link: http://lkml.kernel.org/r/1431637800-31061-3-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 16:08:27 +02:00
Frederic Weisbecker
4eaca0a887 preempt: Use preempt_schedule_context() as the official tracing preemption point
preempt_schedule_context() is a tracing safe preemption point but it's
only used when CONFIG_CONTEXT_TRACKING=y. Other configs have tracing
recursion issues since commit:

  b30f0e3ffe ("sched/preempt: Optimize preemption operations on __schedule() callers")

introduced function based preemp_count_*() ops.

Lets make it available on all configs and give it a more appropriate
name for its new position.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1433432349-1021-3-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 15:57:42 +02:00
Kan Liang
8cf1a3de97 perf/x86/intel/uncore: Fix CBOX bit wide and UBOX reg on Haswell-EP
CBOX counters are increased to 48b on HSX.

Correct the MSR address for HSWEP_U_MSR_PMON_CTR0 and
HSWEP_U_MSR_PMON_CTL0.

See specification in:
http://www.intel.com/content/www/us/en/processors/xeon/
xeon-e5-v3-uncore-performance-monitoring.html

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1432645835-7918-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 15:46:50 +02:00
Andy Shevchenko
7b179b8feb x86/microcode: Correct CPU family related variable types
Change the type of variables and function prototypes to be in
alignment with what the x86_*() / __x86_*() family/model
functions return.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1433436928-31903-21-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 15:38:15 +02:00
Borislav Petkov
ee38a90709 x86/microcode: Disable builtin microcode loading on 32-bit for now
Andy Shevchenko reported machine freezes when booting latest tip
on 32-bit setups. Problem is, the builtin microcode handling cannot
really work that early, when we haven't even enabled paging.

A proper fix would involve handling that case specially as every
other early 32-bit boot case in the microcode loader and would
require much more involved changes for which it is too late now,
more than a week before the upcoming merge window.

So, disable the builtin microcode loading on 32-bit for now.

Reported-and-tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1433436928-31903-20-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 15:38:14 +02:00
Ingo Molnar
c2f9b0af8b Merge branch 'x86/ras' into x86/core, to fix conflicts
Conflicts:
	arch/x86/include/asm/irq_vectors.h

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 15:35:27 +02:00
Borislav Petkov
c8e56d20f2 x86: Kill CONFIG_X86_HT
In talking to Aravind recently about making certain AMD topology
attributes available to the MCE injection module, it seemed like
that CONFIG_X86_HT thing is more or less superfluous. It is
def_bool y, depends on SMP and gets enabled in the majority of
.configs - distro and otherwise - out there.

So let's kill it and make code behind it depend directly on SMP.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Daniel Walter <dwalter@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jacob Shin <jacob.w.shin@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1433436928-31903-18-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 15:33:44 +02:00
Ashok Raj
243d657eaf x86/mce: Handle Local MCE events
Add the necessary changes to do_machine_check() to be able to
process MCEs signaled as local MCEs. Typically, only recoverable
errors (SRAR type) will be Signaled as LMCE. The architecture
does not restrict to only those errors, however.

When errors are signaled as LMCE, there is no need for the MCE
handler to perform rendezvous with other logical processors
unlike earlier processors that would broadcast machine check
errors.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1433436928-31903-17-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 15:33:15 +02:00
Ashok Raj
88d538672e x86/mce: Add infrastructure to support Local MCE
Initialize and prepare for handling LMCEs. Add a boot-time
option to disable LMCEs.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
[ Simplify stuff, align statements for better readability, reflow comments; kill
  unused lmce_clear(); save us an MSR write if LMCE is already enabled. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1433436928-31903-16-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 15:33:14 +02:00
Linus Torvalds
51d0f0cb3a Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes:

   - early_idt_handlers[] fix that fixes the build with bleeding edge
     tooling

   - build warning fix on GCC 5.1

   - vm86 fix plus self-test to make it harder to break it again"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asm/irq: Stop relying on magic JMP behavior for early_idt_handlers
  x86/asm/entry/32, selftests: Add a selftest for kernel entries from VM86 mode
  x86/boot: Add CONFIG_PARAVIRT_SPINLOCKS quirk to arch/x86/boot/compressed/misc.h
  x86/asm/entry/32: Really make user_mode() work correctly for VM86 mode
2015-06-05 10:03:48 -07:00
Linus Torvalds
a0e9c6efa5 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "The biggest chunk of the changes are two regression fixes: a HT
  workaround fix and an event-group scheduling fix.  It's been verified
  with 5 days of fuzzer testing.

  Other fixes:

   - eBPF fix
   - a BIOS breakage detection fix
   - PMU driver fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/pt: Fix a refactoring bug
  perf/x86: Tweak broken BIOS rules during check_hw_exists()
  perf/x86/intel/pt: Untangle pt_buffer_reset_markers()
  perf: Disallow sparse AUX allocations for non-SG PMUs in overwrite mode
  perf/x86: Improve HT workaround GP counter constraint
  perf/x86: Fix event/group validation
  perf: Fix race in BPF program unregister
2015-06-05 10:00:53 -07:00
Wei Yang
f2af7d25b4 x86/boot/setup: Clean up the e820_reserve_setup_data() code
Deobfuscate the 'found' logic, it can be replaced with a simple:

	if (!pa_data)
		return;

and 'found' can be eliminated.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1433398729-8314-1-git-send-email-weiyang@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-05 13:53:22 +02:00
Alexander Shishkin
b44a2b53be perf/x86/intel/pt: Fix a refactoring bug
Commit 066450be41 ("perf/x86/intel/pt: Clean up the control flow
in pt_pmu_hw_init()") changed attribute initialization so that
only the first attribute gets initialized using
sysfs_attr_init(), which upsets lockdep.

This patch fixes the glitch so that all allocated attributes are
properly initialized thus fixing the lockdep warning reported by
Tvrtko and Imre.

Reported-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reported-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: <linux-kernel@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-04 16:07:51 +02:00
Ingo Molnar
00398a0018 x86/asm/entry: Move the vsyscall code to arch/x86/entry/vsyscall/
The vsyscall code is entry code too, so move it to arch/x86/entry/vsyscall/.

Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-04 07:37:37 +02:00
Dave Airlie
a8a50fce60 Linux 4.1-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJVa7zvAAoJEHm+PkMAQRiGtfMIAILs3sxFtrC1hApgcfRLF/7z
 K34bwTRqErzqUO/orTwakEr9kSIpIL0zIPSryTCOTPZLfMGkQjhHXO3KR/DSbbTV
 MZ8y/BM/yelFA/Np+1LjbiYjTNRnTRvCoaQihkIH8Rn02g7ob9HyL4gIGKpuGFcZ
 04GacL2cgChqsRSACdNef948jCoJXKgcuDpe39DXphDWZnBKNZ3HFuJ6bryGJf9A
 1/eCI4is85BNwKPemQUYR0xx83UIzDfrghatZP2mOCDDSA2MNg8HNxLTd12LGoQD
 tfgX4B7aftzW9Y7GSEDfZ0IKm2NRzgPmCVj6PjVR/iI0lIK4Aq0Z/lDJxxEq3XQ=
 =AJM5
 -----END PGP SIGNATURE-----

Merge tag 'v4.1-rc6' into drm-next

Linux 4.1-rc6

backmerge 4.1-rc6 as some of the later pull reqs are based on newer bases
and I'd prefer to do the fixup myself.
2015-06-04 09:23:51 +10:00
Ingo Molnar
905a36a285 x86/asm/entry: Move entry_64.S and entry_32.S to arch/x86/entry/
Create a new directory hierarchy for the low level x86 entry code:

    arch/x86/entry/*

This will host all the low level glue that is currently scattered
all across arch/x86/.

Start with entry_64.S and entry_32.S.

Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-03 18:51:28 +02:00
Stephen Rothwell
d6472302f2 x86/mm: Decouple <linux/vmalloc.h> from <asm/io.h>
Nothing in <asm/io.h> uses anything from <linux/vmalloc.h>, so
remove it from there and fix up the resulting build problems
triggered on x86 {64|32}-bit {def|allmod|allno}configs.

The breakages were triggering in places where x86 builds relied
on vmalloc() facilities but did not include <linux/vmalloc.h>
explicitly and relied on the implicit inclusion via <asm/io.h>.

Also add:

  - <linux/init.h> to <linux/io.h>
  - <asm/pgtable_types> to <asm/io.h>

... which were two other implicit header file dependencies.

Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
[ Tidied up the changelog. ]
Acked-by: David Miller <davem@davemloft.net>
Acked-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Vinod Koul <vinod.koul@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Colin Cross <ccross@android.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: James E.J. Bottomley <JBottomley@odin.com>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Suma Ramars <sramars@cisco.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-03 12:02:00 +02:00
Ingo Molnar
71966f3a0b Merge branch 'locking/core' into x86/core, to prepare for dependent patch
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-03 10:07:35 +02:00
Ingo Molnar
34e7724c07 Merge branches 'x86/mm', 'x86/build', 'x86/apic' and 'x86/platform' into x86/core, to apply dependent patch
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-03 10:05:18 +02:00
Borislav Petkov
ee098e1aed x86/cpu: Trim model ID whitespace
We did try trimming whitespace surrounding the 'model name'
field in /proc/cpuinfo since reportedly some userspace uses it
in string comparisons and there were discrepancies:

  [thetango@prarit ~]# grep "^model name" /proc/cpuinfo | uniq -c | sed 's/\ /_/g'
  ______1_model_name      :_AMD_Opteron(TM)_Processor_6272
  _____63_model_name      :_AMD_Opteron(TM)_Processor_6272_________________

However, there were issues with overlapping buffers, string
sizes and non-byte-sized copies in the previous proposed
solutions; see Link tags below for the whole farce.

So, instead of diddling with this more, let's simply extend what
was there originally with trimming any present trailing
whitespace. Final result is really simple and obvious.

Testing with the most insane model IDs qemu can generate, looks
good:

  .model_id = "            My funny model ID CPU          ",
  ______4_model_name      :_My_funny_model_ID_CPU

  .model_id = "My funny model ID CPU          ",
  ______4_model_name      :_My_funny_model_ID_CPU

  .model_id = "            My funny model ID CPU",
  ______4_model_name      :_My_funny_model_ID_CPU

  .model_id = "            ",
  ______4_model_name      :__

  .model_id = "",
  ______4_model_name      :_15/02

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1432050210-32036-1-git-send-email-prarit@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-02 10:38:11 +02:00
Jan Beulich
2f63b9db72 x86/asm/entry/64: Fold identical code paths
retint_kernel doesn't require %rcx to be pointing to thread info
(anymore?), and the code on the two alternative paths is - not
really surprisingly - identical.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/556C664F020000780007FB64@mail.emea.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-02 10:10:09 +02:00
Andy Lutomirski
425be5679f x86/asm/irq: Stop relying on magic JMP behavior for early_idt_handlers
The early_idt_handlers asm code generates an array of entry
points spaced nine bytes apart.  It's not really clear from that
code or from the places that reference it what's going on, and
the code only works in the first place because GAS never
generates two-byte JMP instructions when jumping to global
labels.

Clean up the code to generate the correct array stride (member size)
explicitly. This should be considerably more robust against
screw-ups, as GAS will warn if a .fill directive has a negative
count.  Using '. =' to advance would have been even more robust
(it would generate an actual error if it tried to move
backwards), but it would pad with nulls, confusing anyone who
tries to disassemble the code.  The new scheme should be much
clearer to future readers.

While we're at it, improve the comments and rename the array and
common code.

Binutils may start relaxing jumps to non-weak labels.  If so,
this change will fix our build, and we may need to backport this
change.

Before, on x86_64:

  0000000000000000 <early_idt_handlers>:
     0:   6a 00                   pushq  $0x0
     2:   6a 00                   pushq  $0x0
     4:   e9 00 00 00 00          jmpq   9 <early_idt_handlers+0x9>
                          5: R_X86_64_PC32        early_idt_handler-0x4
  ...
    48:   66 90                   xchg   %ax,%ax
    4a:   6a 08                   pushq  $0x8
    4c:   e9 00 00 00 00          jmpq   51 <early_idt_handlers+0x51>
                          4d: R_X86_64_PC32       early_idt_handler-0x4
  ...
   117:   6a 00                   pushq  $0x0
   119:   6a 1f                   pushq  $0x1f
   11b:   e9 00 00 00 00          jmpq   120 <early_idt_handler>
                          11c: R_X86_64_PC32      early_idt_handler-0x4

After:

  0000000000000000 <early_idt_handler_array>:
     0:   6a 00                   pushq  $0x0
     2:   6a 00                   pushq  $0x0
     4:   e9 14 01 00 00          jmpq   11d <early_idt_handler_common>
  ...
    48:   6a 08                   pushq  $0x8
    4a:   e9 d1 00 00 00          jmpq   120 <early_idt_handler_common>
    4f:   cc                      int3
    50:   cc                      int3
  ...
   117:   6a 00                   pushq  $0x0
   119:   6a 1f                   pushq  $0x1f
   11b:   eb 03                   jmp    120 <early_idt_handler_common>
   11d:   cc                      int3
   11e:   cc                      int3
   11f:   cc                      int3

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Binutils <binutils@sourceware.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/ac027962af343b0c599cbfcf50b945ad2ef3d7a8.1432336324.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-02 09:39:40 +02:00
Ingo Molnar
085c789783 Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU changes from Paul E. McKenney:

  - Initialization/Kconfig updates: hide most Kconfig options from unsuspecting users.
    There's now a single high level configuration option:

      *
      * RCU Subsystem
      *
      Make expert-level adjustments to RCU configuration (RCU_EXPERT) [N/y/?] (NEW)

    Which if answered in the negative, leaves us with a single interactive
    configuration option:

      Offload RCU callback processing from boot-selected CPUs (RCU_NOCB_CPU) [N/y/?] (NEW)

    All the rest of the RCU options are configured automatically.

  - Remove all uses of RCU-protected array indexes: replace the
    rcu_[access|dereference]_index_check() APIs with READ_ONCE() and rcu_lockdep_assert().

  - RCU CPU-hotplug cleanups.

  - Updates to Tiny RCU: a race fix and further code shrinkage.

  - RCU torture-testing updates: fixes, speedups, cleanups and
    documentation updates.

  - Miscellaneous fixes.

  - Documentation updates.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-02 08:18:34 +02:00
Ingo Molnar
f407a82586 Merge branch 'linus' into sched/core, to resolve conflict
Conflicts:
	arch/sparc/include/asm/topology_64.h

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-02 08:05:42 +02:00
Ingo Molnar
131484c8da x86/debug: Remove perpetually broken, unmaintainable dwarf annotations
So the dwarf2 annotations in low level assembly code have
become an increasing hindrance: unreadable, messy macros
mixed into some of the most security sensitive code paths
of the Linux kernel.

These debug info annotations don't even buy the upstream
kernel anything: dwarf driven stack unwinding has caused
problems in the past so it's out of tree, and the upstream
kernel only uses the much more robust framepointers based
stack unwinding method.

In addition to that there's a steady, slow bitrot going
on with these annotations, requiring frequent fixups.
There's no tooling and no functionality upstream that
keeps it correct.

So burn down the sick forest, allowing new, healthier growth:

   27 files changed, 350 insertions(+), 1101 deletions(-)

Someone who has the willingness and time to do this
properly can attempt to reintroduce dwarf debuginfo in x86
assembly code plus dwarf unwinding from first principles,
with the following conditions:

 - it should be maximally readable, and maximally low-key to
   'ordinary' code reading and maintenance.

 - find a build time method to insert dwarf annotations
   automatically in the most common cases, for pop/push
   instructions that manipulate the stack pointer. This could
   be done for example via a preprocessing step that just
   looks for common patterns - plus special annotations for
   the few cases where we want to depart from the default.
   We have hundreds of CFI annotations, so automating most of
   that makes sense.

 - it should come with build tooling checks that ensure that
   CFI annotations are sensible. We've seen such efforts from
   the framepointer side, and there's no reason it couldn't be
   done on the dwarf side.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-02 07:57:48 +02:00
Luiz Capitulino
0ad83caa21 x86: kvmclock: set scheduler clock stable
If you try to enable NOHZ_FULL on a guest today, you'll get
the following error when the guest tries to deactivate the
scheduler tick:

 WARNING: CPU: 3 PID: 2182 at kernel/time/tick-sched.c:192 can_stop_full_tick+0xb9/0x290()
 NO_HZ FULL will not work with unstable sched clock
 CPU: 3 PID: 2182 Comm: kworker/3:1 Not tainted 4.0.0-10545-gb9bb6fb #204
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 Workqueue: events flush_to_ldisc
  ffffffff8162a0c7 ffff88011f583e88 ffffffff814e6ba0 0000000000000002
  ffff88011f583ed8 ffff88011f583ec8 ffffffff8104d095 ffff88011f583eb8
  0000000000000000 0000000000000003 0000000000000001 0000000000000001
 Call Trace:
  <IRQ>  [<ffffffff814e6ba0>] dump_stack+0x4f/0x7b
  [<ffffffff8104d095>] warn_slowpath_common+0x85/0xc0
  [<ffffffff8104d146>] warn_slowpath_fmt+0x46/0x50
  [<ffffffff810bd2a9>] can_stop_full_tick+0xb9/0x290
  [<ffffffff810bd9ed>] tick_nohz_irq_exit+0x8d/0xb0
  [<ffffffff810511c5>] irq_exit+0xc5/0x130
  [<ffffffff814f180a>] smp_apic_timer_interrupt+0x4a/0x60
  [<ffffffff814eff5e>] apic_timer_interrupt+0x6e/0x80
  <EOI>  [<ffffffff814ee5d1>] ? _raw_spin_unlock_irqrestore+0x31/0x60
  [<ffffffff8108bbc8>] __wake_up+0x48/0x60
  [<ffffffff8134836c>] n_tty_receive_buf_common+0x49c/0xba0
  [<ffffffff8134a6bf>] ? tty_ldisc_ref+0x1f/0x70
  [<ffffffff81348a84>] n_tty_receive_buf2+0x14/0x20
  [<ffffffff8134b390>] flush_to_ldisc+0xe0/0x120
  [<ffffffff81064d05>] process_one_work+0x1d5/0x540
  [<ffffffff81064c81>] ? process_one_work+0x151/0x540
  [<ffffffff81065191>] worker_thread+0x121/0x470
  [<ffffffff81065070>] ? process_one_work+0x540/0x540
  [<ffffffff8106b4df>] kthread+0xef/0x110
  [<ffffffff8106b3f0>] ? __kthread_parkme+0xa0/0xa0
  [<ffffffff814ef4f2>] ret_from_fork+0x42/0x70
  [<ffffffff8106b3f0>] ? __kthread_parkme+0xa0/0xa0
 ---[ end trace 06e3507544a38866 ]---

However, it turns out that kvmclock does provide a stable
sched_clock callback. So, let the scheduler know this which
in turn makes NOHZ_FULL work in the guest.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-29 14:01:52 +02:00
Dan Williams
ad5fb870c4 e820, efi: add ACPI 6.0 persistent memory types
ACPI 6.0 formalizes e820-type-7 and efi-type-14 as persistent memory.
Mark it "reserved" and allow it to be claimed by a persistent memory
device driver.

This definition is in addition to the Linux kernel's existing type-12
definition that was recently added in support of shipping platforms with
NVDIMM support that predate ACPI 6.0 (which now classifies type-12 as
OEM reserved).

Note, /proc/iomem can be consulted for differentiating legacy
"Persistent Memory (legacy)" E820_PRAM vs standard "Persistent Memory"
E820_PMEM.

Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-05-27 21:46:05 -04:00
Paul E. McKenney
29c6820f51 mce: mce_chrdev_write() can be static
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-05-27 12:56:17 -07:00
Paul E. McKenney
e90328b87e mce: Stop using array-index-based RCU primitives
Because mce is arch-specific x86 code, there is little or no
performance benefit of using rcu_dereference_index_check() over using
smp_load_acquire().  It also turns out that mce is the only place that
array-index-based RCU is used, and it would be convenient to drop
this portion of the RCU API.

This patch therefore changes rcu_dereference_index_check() uses to
smp_load_acquire(), but keeping the lockdep diagnostics, and also
changes rcu_access_index() uses to READ_ONCE().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: linux-edac@vger.kernel.org
Cc: Tony Luck <tony.luck@intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
2015-05-27 12:56:16 -07:00
Bartosz Golaszewski
7d79a7bd75 x86: Replace cpu_**_mask() with topology_**_cpumask()
The former duplicate the functionalities of the latter but are
neither documented nor arch-independent.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Benoit Cousson <bcousson@baylibre.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Jean Delvare <jdelvare@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: http://lkml.kernel.org/r/1432645896-12588-9-git-send-email-bgolaszewski@baylibre.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 15:22:17 +02:00
Bartosz Golaszewski
06931e6224 sched/topology: Rename topology_thread_cpumask() to topology_sibling_cpumask()
Rename topology_thread_cpumask() to topology_sibling_cpumask()
for more consistency with scheduler code.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Benoit Cousson <bcousson@baylibre.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Jean Delvare <jdelvare@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: http://lkml.kernel.org/r/1432645896-12588-2-git-send-email-bgolaszewski@baylibre.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 15:22:15 +02:00
Luis R. Rodriguez
cb32edf65b x86/mm/pat: Wrap pat_enabled into a function API
We use pat_enabled in x86-specific code to see if PAT is enabled
or not but we're granting full access to it even though readers
do not need to set it. If, for instance, we granted access to it
to modules later they then could override the variable
setting... no bueno.

This renames pat_enabled to a new static variable __pat_enabled.
Folks are redirected to use pat_enabled() now.

Code that sets this can only be internal to pat.c. Apart from
the early kernel parameter "nopat" to disable PAT, we also have
a few cases that disable it later and make use of a helper
pat_disable(). It is wrapped under an ifdef but since that code
cannot run unless PAT was enabled its not required to wrap it
with ifdefs, unwrap that. Likewise, since "nopat" doesn't really
change non-PAT systems just remove that ifdef as well.

Although we could add and use an early_param_off(), these
helpers don't use __read_mostly but we want to keep
__read_mostly for __pat_enabled as this is a hot path -- upon
boot, for instance, a simple guest may see ~4k accesses to
pat_enabled(). Since __read_mostly early boot params are not
that common we don't add a helper for them just yet.

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Walls <awalls@md.metrocast.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kyle McMartin <kyle@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1430425520-22275-3-git-send-email-mcgrof@do-not-panic.com
Link: http://lkml.kernel.org/r/1432628901-18044-13-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:41:01 +02:00
Luis R. Rodriguez
f9626104a5 x86/mm/mtrr: Generalize runtime disabling of MTRRs
It is possible to enable CONFIG_MTRR and CONFIG_X86_PAT and end
up with a system with MTRR functionality disabled but PAT
functionality enabled. This can happen, for instance, when the
Xen hypervisor is used where MTRRs are not supported but PAT is.
This can happen on Linux as of commit

  47591df505 ("xen: Support Xen pv-domains using PAT")

by Juergen, introduced in v3.19.

Technically, we should assume the proper CPU bits would be set
to disable MTRRs but we can't always rely on this. At least on
the Xen Hypervisor, for instance, only X86_FEATURE_MTRR was
disabled as of Xen 4.4 through Xen commit 586ab6a [0], but not
X86_FEATURE_K6_MTRR, X86_FEATURE_CENTAUR_MCR, or
X86_FEATURE_CYRIX_ARR for instance.

Roger Pau Monné has clarified though that although this is
technically true we will never support PVH on these CPU types so
Xen has no need to disable these bits on those systems. As per
Roger, AMD K6, Centaur and VIA chips don't have the necessary
hardware extensions to allow running PVH guests [1].

As per Toshi it is also possible for the BIOS to disable MTRR
support, in such cases get_mtrr_state() would update the MTRR
state as per the BIOS, we need to propagate this information as
well.

x86 MTRR code relies on quite a bit of checks for mtrr_if being
set to check to see if MTRRs did get set up. Instead, lets
provide a generic getter for that. This also adds a few checks
where they were not before which could potentially safeguard
ourselves against incorrect usage of MTRR where this was not
desirable.

Where possible match error codes as if MTRRs were disabled on
arch/x86/include/asm/mtrr.h.

Lastly, since disabling MTRRs can happen at run time and we
could end up with PAT enabled, best record now in our logs when
MTRRs are disabled.

[0] ~/devel/xen (git::stable-4.5)$ git describe --contains 586ab6a 4.4.0-rc1~18
[1] http://lists.xenproject.org/archives/html/xen-devel/2015-03/msg03460.html

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: Stefan Bader <stefan.bader@canonical.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: bhelgaas@google.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: konrad.wilk@oracle.com
Cc: venkatesh.pallipadi@intel.com
Cc: ville.syrjala@linux.intel.com
Cc: xen-devel@lists.xensource.com
Link: http://lkml.kernel.org/r/1426893517-2511-3-git-send-email-mcgrof@do-not-panic.com
Link: http://lkml.kernel.org/r/1432628901-18044-12-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:41:01 +02:00
Luis R. Rodriguez
7d010fdf29 x86/mm/mtrr: Avoid #ifdeffery with phys_wc_to_mtrr_index()
There is only one user but since we're going to bury MTRR next
out of access to drivers, expose this last piece of API to
drivers in a general fashion only needing io.h for access to
helpers.

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Abhilash Kesavan <a.kesavan@samsung.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Cristian Stoica <cristian.stoica@freescale.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thierry Reding <treding@nvidia.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will.deacon@arm.com>
Cc: dri-devel@lists.freedesktop.org
Link: http://lkml.kernel.org/r/1429722736-4473-1-git-send-email-mcgrof@do-not-panic.com
Link: http://lkml.kernel.org/r/1432628901-18044-11-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:41:00 +02:00
Luis R. Rodriguez
2f9e897353 x86/mm/mtrr, pat: Document Write Combining MTRR type effects on PAT / non-PAT pages
As part of the effort to phase out MTRR use document
write-combining MTRR effects on pages with different non-PAT
page attributes flags and different PAT entry values. Extend
arch_phys_wc_add() documentation to clarify power of two sizes /
boundary requirements as we phase out mtrr_add() use.

Lastly hint towards ioremap_uc() for corner cases on device
drivers working with devices with mixed regions where MTRR size
requirements would otherwise not enable write-combining
effective memory types.

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-fbdev@vger.kernel.org
Link: http://lkml.kernel.org/r/1430343851-967-3-git-send-email-mcgrof@do-not-panic.com
Link: http://lkml.kernel.org/r/1432628901-18044-10-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:40:59 +02:00
Toshi Kani
b73522e0c1 x86/mm/mtrr: Enhance MTRR checks in kernel mapping helpers
This patch adds the argument 'uniform' to mtrr_type_lookup(),
which gets set to 1 when a given range is covered uniformly by
MTRRs, i.e. the range is fully covered by a single MTRR entry or
the default type.

Change pud_set_huge() and pmd_set_huge() to honor the 'uniform'
flag to see if it is safe to create a huge page mapping in the
range.

This allows them to create a huge page mapping in a range
covered by a single MTRR entry of any memory type. It also
detects a non-optimal request properly. They continue to check
with the WB type since it does not effectively change the
uniform mapping even if a request spans multiple MTRR entries.

pmd_set_huge() logs a warning message to a non-optimal request
so that driver writers will be aware of such a case. Drivers
should make a mapping request aligned to a single MTRR entry
when the range is covered by MTRRs.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
[ Realign, flesh out comments, improve warning message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Elliott@hp.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Cc: linux-mm <linux-mm@kvack.org>
Cc: pebolle@tiscali.nl
Link: http://lkml.kernel.org/r/1431714237-880-7-git-send-email-toshi.kani@hp.com
Link: http://lkml.kernel.org/r/1432628901-18044-8-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:40:58 +02:00
Toshi Kani
0cc705f56e x86/mm/mtrr: Clean up mtrr_type_lookup()
MTRRs contain fixed and variable entries. mtrr_type_lookup() may
repeatedly call __mtrr_type_lookup() to handle a request that
overlaps with variable entries.

However, __mtrr_type_lookup() also handles the fixed entries,
which do not have to be repeated. Therefore, this patch creates
separate functions, mtrr_type_lookup_fixed() and
mtrr_type_lookup_variable(), to handle the fixed and variable
ranges respectively.

The patch also updates the function headers to clarify the
return values and output argument. It updates comments to
clarify that the repeating is necessary to handle overlaps with
the default type, since overlaps with multiple entries alone can
be handled without such repeating.

There is no functional change in this patch.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Elliott@hp.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Cc: linux-mm <linux-mm@kvack.org>
Cc: pebolle@tiscali.nl
Link: http://lkml.kernel.org/r/1431714237-880-6-git-send-email-toshi.kani@hp.com
Link: http://lkml.kernel.org/r/1432628901-18044-6-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:40:57 +02:00
Toshi Kani
3d3ca416d9 x86/mm/mtrr: Use symbolic define as a retval for disabled MTRRs
mtrr_type_lookup() returns verbatim 0xFF when MTRRs are
disabled. This patch defines MTRR_TYPE_INVALID to clarify the
meaning of this value, and documents its usage.

Document the return values of the kernel virtual address mapping
helpers pud_set_huge(), pmd_set_huge, pud_clear_huge() and
pmd_clear_huge().

There is no functional change in this patch.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Elliott@hp.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Cc: linux-mm <linux-mm@kvack.org>
Cc: pebolle@tiscali.nl
Link: http://lkml.kernel.org/r/1431714237-880-5-git-send-email-toshi.kani@hp.com
Link: http://lkml.kernel.org/r/1432628901-18044-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:40:57 +02:00
Toshi Kani
9b3aca6208 x86/mm/mtrr: Fix MTRR state checks in mtrr_type_lookup()
'mtrr_state.enabled' contains the FE (fixed MTRRs enabled)
and E (MTRRs enabled) flags in MSR_MTRRdefType.  Intel SDM,
section 11.11.2.1, defines these flags as follows:

 - All MTRRs are disabled when the E flag is clear.
   The FE flag has no affect when the E flag is clear.
 - The default type is enabled when the E flag is set.
 - MTRR variable ranges are enabled when the E flag is set.
 - MTRR fixed ranges are enabled when both E and FE flags
   are set.

MTRR state checks in __mtrr_type_lookup() do not match with SDM.

Hence, this patch makes the following changes:
 - The current code detects MTRRs disabled when both E and
   FE flags are clear in mtrr_state.enabled.  Fix to detect
   MTRRs disabled when the E flag is clear.
 - The current code does not check if the FE bit is set in
   mtrr_state.enabled when looking at the fixed entries.
   Fix to check the FE flag.
 - The current code returns the default type when the E flag
   is clear in mtrr_state.enabled. However, the default type
   is UC when the E flag is clear.  Remove the code as this
   case is handled as MTRR disabled with the 1st change.

In addition, this patch defines the E and FE flags in
mtrr_state.enabled as follows.
 - FE flag: MTRR_STATE_MTRR_FIXED_ENABLED
 - E  flag: MTRR_STATE_MTRR_ENABLED

print_mtrr_state() and x86_get_mtrr_mem_range() are also updated
accordingly.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Elliott@hp.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Cc: linux-mm <linux-mm@kvack.org>
Cc: pebolle@tiscali.nl
Link: http://lkml.kernel.org/r/1431714237-880-4-git-send-email-toshi.kani@hp.com
Link: http://lkml.kernel.org/r/1432628901-18044-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:40:56 +02:00
Toshi Kani
7f0431e3dc x86/mm/mtrr: Fix MTRR lookup to handle an inclusive entry
When an MTRR entry is inclusive to a requested range, i.e. the
start and end of the request are not within the MTRR entry range
but the range contains the MTRR entry entirely:

  range_start ... [mtrr_start ... mtrr_end] ... range_end

__mtrr_type_lookup() ignores such a case because both
start_state and end_state are set to zero.

This bug can cause the following issues:

1) reserve_memtype() tracks an effective memory type in case
   a request type is WB (ex. /dev/mem blindly uses WB). Missing
   to track with its effective type causes a subsequent request
   to map the same range with the effective type to fail.

2) pud_set_huge() and pmd_set_huge() check if a requested range
   has any overlap with MTRRs. Missing to detect an overlap may
   cause a performance penalty or undefined behavior.

This patch fixes the bug by adding a new flag, 'inclusive',
to detect the inclusive case.  This case is then handled in
the same way as end_state:1 since the first region is the same.
With this fix, __mtrr_type_lookup() handles the inclusive case
properly.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Elliott@hp.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Cc: linux-mm <linux-mm@kvack.org>
Cc: pebolle@tiscali.nl
Link: http://lkml.kernel.org/r/1431714237-880-3-git-send-email-toshi.kani@hp.com
Link: http://lkml.kernel.org/r/1432628901-18044-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:40:56 +02:00
Ingo Molnar
d563a6bb3d Linux 4.1-rc5
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJVYnloAAoJEHm+PkMAQRiGCgkH/j3r2djOOm4h83FXrShaHORY
 p8TBI3FNj4fzLk2PfzqbmiDw2T2CwygB+pxb2Ac9CE99epw8qPk2SRvPXBpdKR7t
 lolhhwfzApLJMZbhzNLVywUCDUhFoiEWRhmPqIfA3WXFcIW3t5VNXAoIFjV5HFr6
 sYUlaxSI1XiQ5tldVv8D6YSFHms41pisziBIZmzhIUg10P6Vv3D0FbE74fjAJwx0
 +08zj3EO7yQMv7Aeeq8F8AJ3628142rcZf0NWF5ohlKLRK3gt0cl9jO5U4Co2dDt
 29v03LP5EI6jDKkIbuWlqRMq9YxJz7N3wnkzV0EJiqXucoqPLFDqzbxB4gnS1pI=
 =7vbA
 -----END PGP SIGNATURE-----

Merge tag 'v4.1-rc5' into x86/mm, to refresh the tree before applying new changes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:40:10 +02:00
Xie XiuQi
5c31b2800d x86/mce: Fix monarch timeout setting through the mce= cmdline option
Using "mce=1,10000000" on the kernel cmdline to change the
monarch timeout does not work. The cause is that get_option()
does parse a subsequent comma in the option string and signals
that with a return value. So we don't need to check for a second
comma ourselves.

Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1432120943-25028-1-git-send-email-xiexiuqi@huawei.com
Link: http://lkml.kernel.org/r/1432628901-18044-19-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:39:14 +02:00
Prarit Bhargava
adafb98da6 x86/cpu: Strip any /proc/cpuinfo model name field whitespace
When comparing the 'model name' field of each core in
/proc/cpuinfo it was noticed that there is a whitespace
difference between the cores' model names.

After some quick investigation it was noticed that the model
name fields were actually different -- processor 0's model name
field had trailing whitespace removed, while the other
processors did not.

Another way of seeing this behaviour is to convert spaces into
underscores in the output of /proc/cpuinfo,

  [thetango@prarit ~]# grep "^model name" /proc/cpuinfo | uniq -c | sed 's/\ /_/g'
  ______1_model_name      :_AMD_Opteron(TM)_Processor_6272
  _____63_model_name      :_AMD_Opteron(TM)_Processor_6272_________________

which shows the discrepancy.

This occurs because the kernel calls strim() on cpu 0's
x86_model_id field to output a pretty message to the console in
print_cpu_info(), and as a result strips the whitespace at the
end of the ->x86_model_id field.

But, the ->x86_model_id field should be the same for the all
identical CPUs in the box. Thus, we need to remove both leading
and trailing whitespace.

As a result, the print_cpu_info() output looks like

  smpboot: CPU0: AMD Opteron(TM) Processor 6272 (fam: 15, model: 01, stepping: 02)

and the x86_model_id field is correct on all processors on AMD
platforms:

  _____64_model_name      :_AMD_Opteron(TM)_Processor_6272

Output is still correct on an Intel box:

  ____144_model_name      :_Intel(R)_Xeon(R)_CPU_E7-8890_v3_@_2.50GHz

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1432050210-32036-1-git-send-email-prarit@redhat.com
Link: http://lkml.kernel.org/r/1432628901-18044-15-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:38:24 +02:00
Huang Rui
0fb0328d34 sched/x86: Drop repeated word from mwait_idle() comment
A single "default" is fine.

Signed-off-by: Huang Rui <ray.huang@amd.com>
[ Fix another typo and reflow comment. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1432022472-2224-5-git-send-email-ray.huang@amd.com
Link: http://lkml.kernel.org/r/1432628901-18044-7-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:38:04 +02:00
Ingo Molnar
d65fcd608f x86/fpu: Simplify copy_kernel_to_xregs_booting()
copy_kernel_to_xregs_booting() has a second parameter that is the mask
of xfeatures that should be copied - but this parameter is always -1.

Simplify the call site of this function, this also makes it more
similar to the function call signature of other copy_kernel_to*regs()
functions.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:11:33 +02:00
Ingo Molnar
003e2e8b57 x86/fpu: Standardize the parameter type of copy_kernel_to_fpregs()
Bring the __copy_fpstate_to_fpregs() and copy_fpstate_to_fpregs() functions
in line with the parameter passing convention of other kernel-to-FPU-registers
copying functions: pass around an in-memory FPU register state pointer,
instead of struct fpu *.

NOTE: This patch also changes the assembly constraint of the FXSAVE-leak
      workaround from 'fpu->fpregs_active' to 'fpstate' - but that is fine,
      as we only need a valid memory address there for the FILDL instruction.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:11:32 +02:00
Ingo Molnar
9ccc27a5d2 x86/fpu: Remove error return values from copy_kernel_to_*regs() functions
None of the copy_kernel_to_*regs() FPU register copying functions are
supposed to fail, and all of them have debugging checks that enforce
this.

Remove their return values and simplify their call sites, which have
redundant error checks and error handling code paths.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:11:30 +02:00
Ingo Molnar
3e1bf47e5c x86/fpu: Rename copy_fpstate_to_fpregs() to copy_kernel_to_fpregs()
Bring the __copy_fpstate_to_fpregs() and copy_fpstate_to_fpregs() functions
in line with the naming of other kernel-to-FPU-registers copying functions.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:11:29 +02:00
Ingo Molnar
ce2a1e67f1 x86/fpu: Add debugging check to fpu__restore()
The copy_fpstate_to_fpregs() function is never supposed to fail,
so add a debugging check to its call site in fpu__restore().

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:11:27 +02:00
Ingo Molnar
343763c3b0 x86/fpu: Optimize fpu__activate_fpstate_write()
fpu__activate_fpstate_write() is used before ptrace writes to the fpstate
context. Because it expects the modified registers to be reloaded on the
nexts context switch, it's only valid to call this function for stopped
child tasks.

  - add a debugging check for this assumption

  - remove code that only runs if the current task's FPU state needs
    to be saved, which cannot occur here

  - update comments to match the implementation

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:11:27 +02:00
Ingo Molnar
6a81d7eb33 x86/fpu: Rename fpu__activate_fpstate() to fpu__activate_fpstate_write()
Remaining users of fpu__activate_fpstate() are all places that want to modify
FPU registers, rename the function to fpu__activate_fpstate_write() according
to this usage.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:11:26 +02:00
Ingo Molnar
9ba6b79102 x86/fpu: Optimize fpu__activate_fpstate_read()
fpu__activate_fpstate_read() is used before FPU registers are
read from the fpstate by ptrace and core dumping.

It's not necessary to unlazy non-current child tasks in this case,
since the reading of registers is non-destructive.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:11:25 +02:00
Ingo Molnar
0560281266 x86/fpu: Split out the fpu__activate_fpstate_read() method
Currently fpu__activate_fpstate() is used for two distinct purposes:

  - read access by ptrace and core dumping, where in the core dumping
    case the current task's FPU state may be examined as well.

  - write access by ptrace, which modifies FPU registers and expects
    the modified registers to be reloaded on the next context switch.

Split out the reading side into fpu__activate_fpstate_read().

( Note that this is just a pure duplication of fpu__activate_fpstate()
  for the time being, we'll optimize the new function in the next patch. )

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:11:24 +02:00
Ingo Molnar
47f01e8cc2 x86/fpu: Fix FPU register read access to the current task
Bobby Powers reported the following FPU warning during ELF coredumping:

   WARNING: CPU: 0 PID: 27452 at arch/x86/kernel/fpu/core.c:324 fpu__activate_stopped+0x8a/0xa0()

This warning unearthed an invalid assumption about fpu__activate_stopped()
that I added in:

  67e97fc2ec ("x86/fpu: Rename init_fpu() to fpu__unlazy_stopped() and add debugging check")

the old init_fpu() function had an (intentional but obscure) side effect:
when FPU registers are accessed for the current task, for reading, then
it synchronized live in-register FPU state with the fpstate by saving it.

So fix this bug by saving the FPU if we are the current task. We'll
still warn in fpu__save() if this is called for not yet stopped
child tasks, so the debugging check is still preserved.

Also rename the function to fpu__activate_fpstate(), because it's not
exclusively used for stopped tasks, but for the current task as well.

( Note that this bug calls for a cleaner separation of access-for-read
  and access-for-modification FPU methods, but we'll do that in separate
  patches. )

Reported-by: Bobby Powers <bobbypowers@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 12:40:18 +02:00
Alexander Shishkin
a82d24edfe perf/x86/intel/pt: Remove redundant variable declaration
There is a 'pt' variable in the outer scope of pt_event_stop() with the same
type, we don't really need another one in the inner scope.

This patch removes the redundant variable declaration.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1432308626-18845-8-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:48 +02:00
Alexander Shishkin
0a487aad2d perf/x86/intel/pt: Kill pt_is_running()
Initially, we were trying to guard against scenarios where somebody
attaches to the system with a hardware debugger while PT is enabled
from software and pt_is_running() tries to make sure we handle this
better, but the truth is, there is still a race window no matter what
and people with hardware debuggers should really know what they are
doing anyway.

In other words, there is no point in keeping this one around, and
it's one RDMSR instructions fewer in the fast path.

The case when PT is enabled by the BIOS at boot time is handled
in the driver initialization path and doesn't use pt_is_running().

This patch gets rid of it.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1429622177-22843-6-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:48 +02:00
Alexander Shishkin
5b1dbd17c0 perf/x86/intel/pt: Document pt_buffer_reset_offsets()
Currently, the description of pt_buffer_reset_offsets() lacks information
about its calling constraints and ordering with regards to other buffer
management functions.

Add a clarification about when this function has to be called.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1429622177-22843-5-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:47 +02:00
Alexander Shishkin
cf302bfdf3 perf/x86/intel/pt: Document pt_buffer_reset_markers()
The comments in the driver don't make it absolutely clear as to what
exactly is the calling order and other possible constraints of buffer
management functions.

Document constraints and calling order for the buffer configuration
functions. While at it, replace a redundant check in
pt_buffer_reset_markers() with an explanation why it is not needed.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1429622177-22843-4-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:47 +02:00
Alexander Shishkin
74387bcb71 perf/x86/intel/pt: Kill an unused variable
Currently, there's a set-but-not-used variable in setup_topa_index();
this patch gets rid of it. And while at it, fixes a style issue with
brackets around a one-line block.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1429622177-22843-2-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:46 +02:00
Peter Zijlstra
ba040653b4 perf/x86/intel: Simplify put_exclusive_constraints()
Don't bother with taking locks if we're not actually going to do
anything. Also, drop the _irqsave(), this is very much only called
from IRQ-disabled context.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:46 +02:00
Peter Zijlstra
8736e548db perf/x86: Simplify the x86_schedule_events() logic
!x && y == ! (x || !y)

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:45 +02:00
Peter Zijlstra
43ef205bde perf/x86/intel: Remove intel_excl_states::init_state
For some obscure reason intel_{start,stop}_scheduling() copy the HT
state to an intermediate array. This would make sense if we ever were
to make changes to it which we'd have to discard.

Except we don't. By the time we call intel_commit_scheduling() we're;
as the name implies; committed to them. We'll never back out.

A further hint its pointless is that stop_scheduling() unconditionally
publishes the state.

So the intermediate array is pointless, modify the state in place and
kill the extra array.

And remove the pointless array initialization: INTEL_EXCL_UNUSED == 0.

Note; all is serialized by intel_excl_cntr::lock.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:45 +02:00
Peter Zijlstra
1fe684e349 perf/x86/intel: Remove pointless tests
Both intel_commit_scheduling() and intel_get_excl_contraints() test
for cntr < 0.

The only way that can happen (aside from a bug) is through
validate_event(), however that is already captured by the
cpuc->is_fake test.

So remove these test and simplify the code.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:44 +02:00
Peter Zijlstra
0c41e756b9 perf/x86/intel: Clean up intel_commit_scheduling() placement
Move the code of intel_commit_scheduling() to the right place, which is
in between start() and stop().

No change in functionality.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:44 +02:00
Peter Zijlstra
17186ccda3 perf/x86/intel: Make WARN()ings consistent
The intel_commit_scheduling() callback is pointlessly different from
the start and stop scheduling callback.

Furthermore, the constraint should never be NULL, so remove that test.

Even though we'll never get called (because we NULL the callbacks)
when !is_ht_workaround_enabled() put that test in.

Collapse the (pointless) WARN_ON_ONCE() and bail on !cpuc->excl_cntrs --
this is doubly pointless, because its the same condition as
is_ht_workaround_enabled() which was already pointless because the
whole method won't ever be called.

Furthremore, make all the !excl_cntrs test WARN_ON_ONCE(); they're all
pointless, because the above, either the function
({get,put}_excl_constraint) are already predicated on it existing or
the is_ht_workaround_enabled() thing is the same test.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:43 +02:00
Peter Zijlstra
aaf932e816 perf/x86/intel: Simplify the dynamic constraint code somewhat
We have two 'struct event_constraint' local variables in
intel_get_excl_constraints(): 'cx' and 'c'.

Instead of using 'cx' after the dynamic allocation, put all 'cx' inside
the dynamic allocation block and use 'c' outside of it.

Also use direct assignment to copy the structure; let the compiler
figure it out.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:43 +02:00
Peter Zijlstra
b32ed7f5de perf/x86/intel: Add lockdep assert
Lockdep is very good at finding incorrect IRQ state while locking and
is far better at telling us if we hold a lock than the _is_locked()
API. It also generates less code for !DEBUG kernels.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:42 +02:00
Peter Zijlstra
1c565833ac perf/x86/intel: Correct local vs remote sibling state
For some obscure reason the current code accounts the current SMT
thread's state on the remote thread and reads the remote's state on
the local SMT thread.

While internally consistent, and 'correct' its pointless confusion we
can do without.

Flip them the right way around.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:42 +02:00
Matt Fleming
adafa99960 perf/x86/intel/cqm: Use 'u32' data type for RMIDs
Since we write RMID values to MSRs the correct type to use is 'u32'
because that clearly articulates we're writing a hardware register
value.

Fix up all uses of RMID in this code to consistently use the correct data
type.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Will Auld <will.auld@intel.com>
Link: http://lkml.kernel.org/r/1432285182-17180-1-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:41 +02:00
Thomas Gleixner
bf926731e1 perf/x86/intel/cqm: Add storage for 'closid' and clean up 'struct intel_pqr_state'
'closid' (CLass Of Service ID) is used for the Class based Cache
Allocation Technology (CAT). Add explicit storage to the per cpu cache
for it, so it can be used later with the CAT support (requires to move
the per cpu data).

While at it:

 - Rename the structure to intel_pqr_state which reflects the actual
   purpose of the struct: cache values which go into the PQR MSR

 - Rename 'cnt' to rmid_usecnt which reflects the actual purpose of
   the counter.

 - Document the structure and the struct members.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Matt Fleming <matt.fleming@intel.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Will Auld <will.auld@intel.com>
Link: http://lkml.kernel.org/r/20150518235150.240899319@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:41 +02:00
Thomas Gleixner
43d0c2f6dc perf/x86/intel/cqm: Remove useless wrapper function
intel_cqm_event_del() is a 1:1 wrapper for intel_cqm_event_stop().
Remove the useless indirection.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Matt Fleming <matt.fleming@intel.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Will Auld <will.auld@intel.com>
Link: http://lkml.kernel.org/r/20150518235150.159779847@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:40 +02:00
Thomas Gleixner
0bac237845 perf/x86/intel/cqm: Avoid pointless MSR write
If the usage counter is non-zero there is no point to update the rmid
in the PQR MSR.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Matt Fleming <matt.fleming@intel.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Will Auld <will.auld@intel.com>
Link: http://lkml.kernel.org/r/20150518235150.080844281@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:40 +02:00
Thomas Gleixner
9e7eaac95a perf/x86/intel/cqm: Remove pointless spinlock from state cache
'struct intel_cqm_state' is a strict per CPU cache of the rmid and the
usage counter. It can never be modified from a remote CPU.

The three functions which modify the content: intel_cqm_event[start|stop|del]
(del maps to stop) are called from the perf core with interrupts disabled
which is enough protection for the per CPU state values.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Matt Fleming <matt.fleming@intel.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Will Auld <will.auld@intel.com>
Link: http://lkml.kernel.org/r/20150518235150.001006529@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:39 +02:00
Thomas Gleixner
b3df4ec442 perf/x86/intel/cqm: Use proper data types
'int' is really not a proper data type for an MSR. Use u32 to make it
clear that we are dealing with a 32-bit unsigned hardware value.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Matt Fleming <matt.fleming@intel.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Will Auld <will.auld@intel.com>
Link: http://lkml.kernel.org/r/20150518235149.919350144@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:39 +02:00
Thomas Gleixner
f4d9757ca6 perf/x86/intel/cqm: Document PQR MSR abuse
The CQM code acts like it owns the PQR MSR completely. That's not true
because only the lower 10 bits are used for CQM. The upper 32 bits are
used for the 'CLass Of Service ID' (CLOSID). Document the abuse. Will be
fixed in a later patch.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Matt Fleming <matt.fleming@intel.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Will Auld <will.auld@intel.com>
Link: http://lkml.kernel.org/r/20150518235149.823214798@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:38 +02:00
Ingo Molnar
8d12ded3dd Merge branch 'perf/urgent' into perf/core, before applying dependent patches
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:21 +02:00
Don Zickus
68ab747604 perf/x86: Tweak broken BIOS rules during check_hw_exists()
I stumbled upon an AMD box that had the BIOS using a hardware performance
counter. Instead of printing out a warning and continuing, it failed and
blocked further perf counter usage.

Looking through the history, I found this commit:

  a5ebe0ba3d ("perf/x86: Check all MSRs before passing hw check")

which tweaked the rules for a Xen guest on an almost identical box and now
changed the behaviour.

Unfortunately the rules were tweaked incorrectly and will always lead to
MSR failures even though the MSRs are completely fine.

What happens now is in arch/x86/kernel/cpu/perf_event.c::check_hw_exists():

<snip>
        for (i = 0; i < x86_pmu.num_counters; i++) {
                reg = x86_pmu_config_addr(i);
                ret = rdmsrl_safe(reg, &val);
                if (ret)
                        goto msr_fail;
                if (val & ARCH_PERFMON_EVENTSEL_ENABLE) {
                        bios_fail = 1;
                        val_fail = val;
                        reg_fail = reg;
                }
        }

<snip>
        /*
         * Read the current value, change it and read it back to see if it
         * matches, this is needed to detect certain hardware emulators
         * (qemu/kvm) that don't trap on the MSR access and always return 0s.
         */
        reg = x86_pmu_event_addr(0);
				^^^^

if the first perf counter is enabled, then this routine will always fail
because the counter is running. :-(

        if (rdmsrl_safe(reg, &val))
                goto msr_fail;
        val ^= 0xffffUL;
        ret = wrmsrl_safe(reg, val);
        ret |= rdmsrl_safe(reg, &val_new);
        if (ret || val != val_new)
                goto msr_fail;

The above bios_fail used to be a 'goto' which is why it worked in the past.

Further, most vendors have migrated to using fixed counters to hide their
evilness hence this problem rarely shows up now days except on a few old boxes.

I fixed my problem and kept the spirit of the original Xen fix, by recording a
safe non-enable register to be used safely for the reading/writing check.
Because it is not enabled, this passes on bare metal boxes (like metal), but
should continue to throw an msr_fail on Xen guests because the register isn't
emulated yet.

Now I get a proper bios_fail error message and Xen should still see their
msr_fail message (untested).

Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: george.dunlap@eu.citrix.com
Cc: konrad.wilk@oracle.com
Link: http://lkml.kernel.org/r/1431976608-56970-1-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:16:20 +02:00
Alexander Shishkin
f73ec48c90 perf/x86/intel/pt: Untangle pt_buffer_reset_markers()
Currently, pt_buffer_reset_markers() is a difficult to read knot of
arithmetics with a redundant check for multiple-entry TOPA capability,
a commented out wakeup marker placement and a logical error wrt to
stop marker placement. The latter happens when write head is not page
aligned and results in stop marker being placed one page earlier than
it actually should.

All these problems only affect PT implementations that support
multiple-entry TOPA tables (read: proper scatter-gather).

For single-entry TOPA implementations, there is no functional impact.

This patch deals with all of the above. Tested on both single-entry
and multiple-entry TOPA PT implementations.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1432308626-18845-4-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:16:20 +02:00
Peter Zijlstra
cc1790cf54 perf/x86: Improve HT workaround GP counter constraint
The (SNB/IVB/HSW) HT bug only affects events that can be programmed
onto GP counters, therefore we should only limit the number of GP
counters that can be used per cpu -- iow we should not constrain the
FP counters.

Furthermore, we should only enfore such a limit when there are in fact
exclusive events being scheduled on either sibling.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
[ Fixed build fail for the !CONFIG_CPU_SUP_INTEL case. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:16:03 +02:00
Peter Zijlstra
b371b59431 perf/x86: Fix event/group validation
Commit 43b4578071 ("perf/x86: Reduce stack usage of
x86_schedule_events()") violated the rule that 'fake' scheduling; as
used for event/group validation; should not change the event state.

This went mostly un-noticed because repeated calls of
x86_pmu::get_event_constraints() would give the same result. And
x86_pmu::put_event_constraints() would mostly not do anything.

Commit e979121b1b ("perf/x86/intel: Implement cross-HT corruption
bug workaround") made the situation much worse by actually setting the
event->hw.constraint value to NULL, so when validation and actual
scheduling interact we get NULL ptr derefs.

Fix it by removing the constraint pointer from the event and move it
back to an array, this time in cpuc instead of on the stack.

validate_group()
  x86_schedule_events()
    event->hw.constraint = c; # store

      <context switch>
        perf_task_event_sched_in()
          ...
            x86_schedule_events();
              event->hw.constraint = c2; # store

              ...

              put_event_constraints(event); # assume failure to schedule
                intel_put_event_constraints()
                  event->hw.constraint = NULL;

      <context switch end>

    c = event->hw.constraint; # read -> NULL

    if (!test_bit(hwc->idx, c->idxmsk)) # <- *BOOM* NULL deref

This in particular is possible when the event in question is a
cpu-wide event and group-leader, where the validate_group() tries to
add an event to the group.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Hunter <ahh@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 43b4578071 ("perf/x86: Reduce stack usage of x86_schedule_events()")
Fixes: e979121b1b ("perf/x86/intel: Implement cross-HT corruption bug workaround")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 08:46:44 +02:00
Ingo Molnar
6e5535940f x86/fpu: Fix fpu__init_system_xstate() comments
Remove obsolete comment about __init limitations: in the new code there aren't any.

Also standardize the comment style in the function while at it.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-25 12:49:34 +02:00
Andy Lutomirski
cdeb604894 x86/asm/irq: Stop relying on magic JMP behavior for early_idt_handlers
The early_idt_handlers asm code generates an array of entry
points spaced nine bytes apart.  It's not really clear from that
code or from the places that reference it what's going on, and
the code only works in the first place because GAS never
generates two-byte JMP instructions when jumping to global
labels.

Clean up the code to generate the correct array stride (member size)
explicitly. This should be considerably more robust against
screw-ups, as GAS will warn if a .fill directive has a negative
count.  Using '. =' to advance would have been even more robust
(it would generate an actual error if it tried to move
backwards), but it would pad with nulls, confusing anyone who
tries to disassemble the code.  The new scheme should be much
clearer to future readers.

While we're at it, improve the comments and rename the array and
common code.

Binutils may start relaxing jumps to non-weak labels.  If so,
this change will fix our build, and we may need to backport this
change.

Before, on x86_64:

  0000000000000000 <early_idt_handlers>:
     0:   6a 00                   pushq  $0x0
     2:   6a 00                   pushq  $0x0
     4:   e9 00 00 00 00          jmpq   9 <early_idt_handlers+0x9>
                          5: R_X86_64_PC32        early_idt_handler-0x4
  ...
    48:   66 90                   xchg   %ax,%ax
    4a:   6a 08                   pushq  $0x8
    4c:   e9 00 00 00 00          jmpq   51 <early_idt_handlers+0x51>
                          4d: R_X86_64_PC32       early_idt_handler-0x4
  ...
   117:   6a 00                   pushq  $0x0
   119:   6a 1f                   pushq  $0x1f
   11b:   e9 00 00 00 00          jmpq   120 <early_idt_handler>
                          11c: R_X86_64_PC32      early_idt_handler-0x4

After:

  0000000000000000 <early_idt_handler_array>:
     0:   6a 00                   pushq  $0x0
     2:   6a 00                   pushq  $0x0
     4:   e9 14 01 00 00          jmpq   11d <early_idt_handler_common>
  ...
    48:   6a 08                   pushq  $0x8
    4a:   e9 d1 00 00 00          jmpq   120 <early_idt_handler_common>
    4f:   cc                      int3
    50:   cc                      int3
  ...
   117:   6a 00                   pushq  $0x0
   119:   6a 1f                   pushq  $0x1f
   11b:   eb 03                   jmp    120 <early_idt_handler_common>
   11d:   cc                      int3
   11e:   cc                      int3
   11f:   cc                      int3

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Binutils <binutils@sourceware.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/ac027962af343b0c599cbfcf50b945ad2ef3d7a8.1432336324.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-24 08:35:03 +02:00
Ingo Molnar
6f56a8d024 Merge branch 'x86/urgent' into x86/fpu, to resolve a conflict
Conflicts:
	arch/x86/kernel/i387.c

This commit is conflicting:

  e88221c50c ("x86/fpu: Disable XSAVES* support for now")

These functions changed a lot, move the quirk to arch/x86/kernel/fpu/init.c's
fpu__init_system_xstate_size_legacy().

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-20 12:01:01 +02:00
Ingo Molnar
e88221c50c x86/fpu: Disable XSAVES* support for now
The kernel's handling of 'compacted' xsave state layout is buggy:

    http://marc.info/?l=linux-kernel&m=142967852317199

I don't have such a system, and the description there is vague, but
from extrapolation I guess that there were two kinds of bugs
observed:

  - boot crashes, due to size calculations being wrong and the dynamic
    allocation allocating a too small xstate area. (This is now fixed
    in the new FPU code - but still present in stable kernels.)

  - FPU state corruption and ABI breakage: if signal handlers try to
    change the FPU state in standard format, which then the kernel
    tries to restore in the compacted format.

These breakages are scary, but they only occur on a small number of
systems that have XSAVES* CPU support. Yet we have had XSAVES support
in the upstream kernel for a large number of stable kernel releases,
and the fixes are involved and unproven.

So do the safe resolution first: disable XSAVES* support and only
use the standard xstate format. This makes the code work and is
easy to backport.

On top of this we can work on enabling (and testing!) proper
compacted format support, without backporting pressure, on top of the
new, cleaned up FPU code.

Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-20 11:58:26 +02:00
Nicholas Krause
ed3cf15271 kvm: x86: Make functions that have no external callers static
This makes the functions kvm_guest_cpu_init and  kvm_init_debugfs
static now due to having no external callers outside their
declarations in the file, kvm.c.

Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-20 11:48:21 +02:00
Ingo Molnar
5856afed0c x86/fpu/init: Clean up and comment the __setup() functions
Explain the functions and also standardize their style
and naming.

No change in functionality.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-20 11:39:35 +02:00
Ingo Molnar
7cf82d33b6 x86/fpu/init: Move __setup() functions to fpu/init.c
We had a number of FPU init related boot option handlers
in arch/x86/kernel/cpu/common.c - move them over into
arch/x86/kernel/fpu/init.c to have them all in a
single place.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-20 11:35:42 +02:00
Dave Airlie
bdcddf95e8 Linux 4.1-rc4
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJVWh3TAAoJEHm+PkMAQRiG/kwH/2c9irodp2+M9OUnX2bfsBb6
 LnChiDpvkF5BB8jhP6d/XmvPp4NJzAbTxByhjdfb2E2HkorCUHCOIn2tI1TE2pUs
 2qjkOVH+XCzoV0goGtQjzK1ht8f2IrtlDiEjyRekK5cJHzhggb22QPtWL4npyd0O
 reDmG2jsRaF9POr9uLSFEv4CEnkksmRLUU0vuQX0TZeCJ41O7TXrkN/wKrLZ5mj4
 IWpqXQaSlrffq/T5HnVbXBxk3/T8QmhrIoppiMpV1mUVj0uTqlFRNi5qwT2Nit1h
 FVljWI4+WgOk3bf7fUlp+ahopjkTgu+GuXkiRP/pdgWNQO0cxCWSAzSndAlIIAE=
 =uOoJ
 -----END PGP SIGNATURE-----

Backmerge v4.1-rc4 into into drm-next

We picked up a silent conflict in amdkfd with drm-fixes and drm-next,
backmerge v4.1-rc5 and fix the conflicts

Signed-off-by: Dave Airlie <airlied@redhat.com>

Conflicts:
	drivers/gpu/drm/drm_irq.c
2015-05-20 16:23:53 +10:00
Paolo Bonzini
c35ebbeade Revert "kvmclock: set scheduler clock stable"
This reverts commit ff7bbb9c6a.
Sasha Levin is seeing odd jump in time values during boot of a KVM guest:

[...]
[    0.000000] tsc: Detected 2260.998 MHz processor
[3376355.247558] Calibrating delay loop (skipped) preset value..
[...]

and bisected them to this commit.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19 20:52:37 +02:00
Thomas Gleixner
c3b5d3cea5 Merge branch 'linus' into timers/core
Make sure the upstream fixes are applied before adding further
modifications.
2015-05-19 16:12:32 +02:00
Feng Wu
501b32653e x86/irq: Show statistics information for posted-interrupts
Show the statistics information for notification event
and wakeup event for posted-interrupt in /proc/interrupts.

[ tglx: Named the short identifiers PIN and PIW to match the long
  	identifiers ]

Signed-off-by: Feng Wu <feng.wu@intel.com>
Cc: jiang.liu@linux.intel.com
Link: http://lkml.kernel.org/r/1432026437-16560-5-git-send-email-feng.wu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-19 15:51:17 +02:00
Feng Wu
f6b3c72c23 x86/irq: Define a global vector for VT-d Posted-Interrupts
Currently, we use a global vector as the Posted-Interrupts
Notification Event for all the vCPUs in the system. We need
to introduce another global vector for VT-d Posted-Interrtups,
which will be used to wakeup the sleep vCPU when an external
interrupt from a direct-assigned device happens for that vCPU.

[ tglx: Removed a gazillion of extra newlines ]

Signed-off-by: Feng Wu <feng.wu@intel.com>
Cc: jiang.liu@linux.intel.com
Link: http://lkml.kernel.org/r/1432026437-16560-4-git-send-email-feng.wu@intel.com
Suggested-by: Yang Zhang <yang.z.zhang@intel.com>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-19 15:51:17 +02:00
Feng Wu
a2f1c8bdc0 x86/irq/msi: Implement irq_set_vcpu_affinity for remapped MSI irqs
Implement irq_set_vcpu_affinity for pci_msi_ir_controller.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>
Link: http://lkml.kernel.org/r/1432026437-16560-3-git-send-email-feng.wu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-19 15:51:17 +02:00
Ingo Molnar
e97131a839 x86/fpu: Add CONFIG_X86_DEBUG_FPU=y FPU debugging code
There are various internal FPU state debugging checks that never
trigger in practice, but which are useful for FPU code development.

Separate these out into CONFIG_X86_DEBUG_FPU=y, and also add a
couple of new ones.

The size difference is about 0.5K of code on defconfig:

   text        data     bss          filename
   15028906    2578816  1638400      vmlinux
   15029430    2578816  1638400      vmlinux

( Keep this enabled by default until the new FPU code is debugged. )

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:12 +02:00
Ingo Molnar
d364a7656c x86/fpu: Fix the 'nofxsr' boot parameter to also clear X86_FEATURE_FXSR_OPT
I tried to simulate an ancient CPU via this option, and
found that it still has fxsr_opt enabled, confusing the
FPU code.

Make the 'nofxsr' option also clear FXSR_OPT flag.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:12 +02:00
Ingo Molnar
e1884d69f6 x86/fpu: Pass 'struct fpu' to fpu__restore()
This cleans up the call sites and the function a bit,
and also makes it more symmetric with the other high
level FPU state handling functions.

It's still only valid for the current task, as we copy
to the FPU registers of the current CPU.

No change in functionality.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:11 +02:00
Ingo Molnar
32231879f6 x86/fpu/init: Propagate __init annotations
Now that all the FPU init function call dependencies are
cleaned up we can propagate __init annotations deeper.

This shrinks the runtime size of the kernel a bit, and
also addresses a few section warnings.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:11 +02:00
Ingo Molnar
5fd402dfa7 x86/fpu/xstate: Clean up setup_xstate_comp() call
So call setup_xstate_comp() from the xstate init code, not
from the generic fpu__init_system() code.

This allows us to remove the protytype from xstate.h as well.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:11 +02:00
Ingo Molnar
39f1acd243 x86/fpu/xstate: Don't assume the first zero xfeatures zero bit means the end
The current xstate code in setup_xstate_features() assumes that
the first zero bit means the end of xfeatures - but that is not
so, the SDM clearly states that an arbitrary set of xfeatures
might be enabled - and it is also clear from the description
of the compaction feature that holes are possible:

  "13-6 Vol. 1MANAGING STATE USING THE XSAVE FEATURE SET
  [...]

  Compacted format. Each state component i (i ≥ 2) is located at a byte
  offset from the base address of the XSAVE area based on the XCOMP_BV
  field in the XSAVE header:

  — If XCOMP_BV[i] = 0, state component i is not in the XSAVE area.

  — If XCOMP_BV[i] = 1, the following items apply:

  • If XCOMP_BV[j] = 0 for every j, 2 ≤ j < i, state component i is
    located at a byte offset 576 from the base address of the XSAVE
    area. (This item applies if i is the first bit set in bits 62:2 of
    the XCOMP_BV; it implies that state component i is located at the
    beginning of the extended region.)

  • Otherwise, let j, 2 ≤ j < i, be the greatest value such that
    XCOMP_BV[j] = 1. Then state component i is located at a byte offset
    X from the location of state component j, where X is the number of
    bytes required for state component j as enumerated in
    CPUID.(EAX=0DH,ECX=j):EAX. (This item implies that state component i
    immediately follows the preceding state component whose bit is set
    in XCOMP_BV.)"

So don't assume that the first zero xfeatures bit means the end of
all xfeatures - iterate through all of them.

I'm not aware of hardware that triggers this currently.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:10 +02:00
Ingo Molnar
63c6680cd0 x86/fpu: Move debugging check from kernel_fpu_begin() to __kernel_fpu_begin()
kernel_fpu_begin() is __kernel_fpu_begin() with a preempt_disable().

Move the kernel_fpu_begin() debugging check into __kernel_fpu_begin(),
so that users of __kernel_fpu_begin() may benefit from it as well.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:10 +02:00
Ingo Molnar
aeb997b9f2 x86/fpu: Change fpu->fpregs_active from 'int' to 'char', add lazy switching comments
Improve the memory layout of 'struct fpu':

 - change ->fpregs_active from 'int' to 'char' - it's just a single flag
   and modern x86 CPUs can do efficient byte accesses.

 - pack related fields closer to each other: often 'fpu->state' will not be
   touched, while the other fields will - so pack them into a group.

Also add comments to each field, describing their purpose, and add
some background information about lazy restores.

Also fix an obsolete, lazy switching related comment in fpu_copy()'s description.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:09 +02:00
Ingo Molnar
c47ada305d x86/fpu: Harmonize FPU register state types
Use these consistent names:

    struct fregs_state           # was: i387_fsave_struct
    struct fxregs_state          # was: i387_fxsave_struct
    struct swregs_state          # was: i387_soft_struct
    struct xregs_state           # was: xsave_struct
    union  fpregs_state          # was: thread_xstate

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:09 +02:00
Ingo Molnar
0c306bcfba x86/fpu: Factor out the FPU regset code into fpu/regset.c
So much of fpu/core.c is the regset code, but it just obscures the generic
FPU state machine logic. Factor out the regset code into fpu/regset.c, where
it can be read in isolation.

This affects one API: fpu__activate_stopped() has to be made available
from the core to fpu/regset.c.

No change in functionality.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:09 +02:00
Ingo Molnar
b992c660d3 x86/fpu: Factor out fpu/signal.c
fpu/xstate.c has a lot of generic FPU signal frame handling routines,
move them into a separate file: fpu/signal.c.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:08 +02:00
Ingo Molnar
c681314421 x86/fpu: Rename all the fpregs, xregs, fxregs and fregs handling functions
Standardize the naming of the various functions that copy register
content in specific FPU context formats:

  copy_fxregs_to_kernel()         # was: fpu_fxsave()
  copy_xregs_to_kernel()          # was: xsave_state()

  copy_kernel_to_fregs()          # was: frstor_checking()
  copy_kernel_to_fxregs()         # was: fxrstor_checking()
  copy_kernel_to_xregs()          # was: fpu_xrstor_checking()
  copy_kernel_to_xregs_booting()  # was: xrstor_state_booting()

  copy_fregs_to_user()            # was: fsave_user()
  copy_fxregs_to_user()           # was: fxsave_user()
  copy_xregs_to_user()            # was: xsave_user()

  copy_user_to_fregs()            # was: frstor_user()
  copy_user_to_fxregs()           # was: fxrstor_user()
  copy_user_to_xregs()            # was: xrestore_user()
  copy_user_to_fpregs_zeroing()   # was: restore_user_xstate()

Eliminate fpu_xrstor_checking(), because it was just a wrapper.

No change in functionality.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:08 +02:00
Ingo Molnar
815418890e x86/fpu: Move restore_init_xstate() out of fpu/internal.h
Move restore_init_xstate() next to its sole caller.

Also rename it to copy_init_fpstate_to_fpregs() and add
some comments about what it does.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:08 +02:00
Ingo Molnar
6f57502310 x86/fpu: Generalize 'init_xstate_ctx'
So the handling of init_xstate_ctx has a layering violation: both
'struct xsave_struct' and 'union thread_xstate' have a
'struct i387_fxsave_struct' member:

   xsave_struct::i387
   thread_xstate::fxsave

The handling of init_xstate_ctx is generic, it is used on all
CPUs, with or without XSAVE instruction. So it's confusing how
the generic code passes around and handles an XSAVE specific
format.

What we really want is for init_xstate_ctx to be a proper
fpstate and we use its ::fxsave and ::xsave members, as
appropriate.

Since the xsave_struct::i387 and thread_xstate::fxsave aliases
each other this is not a functional problem.

So implement this, and move init_xstate_ctx to the generic FPU
code in the process.

Also, since init_xstate_ctx is not XSAVE specific anymore,
rename it to init_fpstate, and mark it __read_mostly,
because it's only modified once during bootup, and used
as a reference fpstate later on.

There's no change in functionality.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:07 +02:00
Ingo Molnar
bf935b0b52 x86/fpu: Create 'union thread_xstate' helper for fpstate_init()
fpstate_init() only uses fpu->state, so pass that in to it.

This enables the cleanup we will do in the next patch.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:07 +02:00
Ingo Molnar
0aba697894 x86/fpu: Harmonize the names of the fpstate_init() helper functions
Harmonize the inconsistent naming of these related functions:

                          fpstate_init()
  finit_soft_fpu()   =>   fpstate_init_fsoft()
  fx_finit()         =>   fpstate_init_fxstate()
  fx_finit()         =>   fpstate_init_fstate()       # split out

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:07 +02:00
Ingo Molnar
e1cebad49c x86/fpu: Factor out the exception error code handling code
Factor out the FPU error code handling code from traps.c and fpu/internal.h
and move them close to each other.

Also convert the helper functions to 'struct fpu *', which further simplifies
them.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:06 +02:00
Ingo Molnar
acd58a3ad0 x86/fpu: Remove run-once init quirks
Remove various boot quirks that came from the old code.

The new code is cleanly split up into per-system and per-cpu
init sequences, and system init functions are only called once.

Remove the run-once quirks.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:06 +02:00
Ingo Molnar
59a36d16be x86/fpu: Factor out fpu/regset.h from fpu/internal.h
Only a few places use the regset definitions, so factor them out.

Also fix related header dependency assumptions.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:06 +02:00
Ingo Molnar
fcbc99c403 x86/fpu: Split out fpu/signal.h from fpu/internal.h for signal frame handling functions
Most of the FPU does not use them, so split it out and include
them in signal.c and ia32_signal.c

Also fix header file dependency assumption in fpu/core.c.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:05 +02:00
Ingo Molnar
05012c13f6 x86/fpu: Move is_ia32*frame() helpers out of fpu/internal.h
Move them to their only user. This makes the code easier to read,
the header is less cluttered, and it also speeds up the build a bit.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:05 +02:00
Ingo Molnar
fbce778246 x86/fpu: Merge fpu__reset() and fpu__clear()
With recent cleanups and fixes the fpu__reset() and fpu__clear()
functions have become almost identical in functionality: the only
difference is that fpu__reset() assumed that the fpstate
was already active in the eagerfpu case, while fpu__clear()
activated it if it was inactive.

This distinction almost never matters, the only case where such
fpstate activation happens if if the init thread (PID 1) gets exec()-ed
for the first time.

So keep fpu__clear() and change all fpu__reset() uses to
fpu__clear() to simpify the logic.

( In a later patch we'll further simplify fpu__clear() by making
  sure that all contexts it is called on are already active. )

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:05 +02:00
Ingo Molnar
82c0e45eb5 x86/fpu: Move the signal frame handling code closer to each other
Consolidate more signal frame related functions:

   text      data    bss     dec       filename
   14108070  2575280 1634304 18317654  vmlinux.before
   14107944  2575344 1634304 18317592  vmlinux.after

Also, while moving it, rename alloc_mathframe() to fpu__alloc_mathframe().

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:04 +02:00
Ingo Molnar
9dfe99b755 x86/fpu: Rename restore_xstate_sig() to fpu__restore_sig()
restore_xstate_sig() is a misnomer: it's not limited to 'xstate' at all,
it is the high level 'restore FPU state from a signal frame' function
that works with all legacy FPU formats as well.

Rename it (and its helper) accordingly, and also move it to the
fpu__*() namespace.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:04 +02:00
Ingo Molnar
04c8e01d50 x86/fpu: Move fpu__clear() to 'struct fpu *' parameter passing
Do it like all other high level FPU state handling functions: they
only know about struct fpu, not about the task.

(Also remove a dead prototype while at it.)

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:04 +02:00
Ingo Molnar
6ffc152e46 x86/fpu: Move all the fpu__*() high level methods closer to each other
The fpu__*() methods are closely related, but they are defined
in scattered places within the FPU code.

Concentrate them, and also uninline fpu__save(), fpu__drop()
and fpu__reset() to save about 5K of kernel text on 64-bit kernels:

   text            data    bss     dec        filename
   14113063        2575280 1634304 18322647   vmlinux.before
   14108070        2575280 1634304 18317654   vmlinux.after

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:04 +02:00
Ingo Molnar
0e75c54f17 x86/fpu: Rename restore_fpu_checking() to copy_fpstate_to_fpregs()
fpu_restore_checking() is a helper function of restore_fpu_checking(),
but this is not apparent from the naming.

Both copy fpstate contents to fpregs, while the fuller variant does
a full copy without leaking information.

So rename them to:

    copy_fpstate_to_fpregs()
  __copy_fpstate_to_fpregs()

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:03 +02:00
Ingo Molnar
5033861575 x86/fpu: Synchronize the naming of drop_fpu() and fpu_reset_state()
drop_fpu() and fpu_reset_state() are similar in functionality
and in scope, yet this is not apparent from their names.

drop_fpu() deactivates FPU contents (both the fpregs and the fpstate),
but leaves register contents intact in the eager-FPU case, mostly as an
optimization. It disables fpregs in the lazy FPU case. The drop_fpu()
method can be used to destroy FPU state in an optimized way, when we
know that a new state will be loaded before user-space might see
any remains of the old FPU state:

     - such as in sys_exit()'s exit_thread() where we know this task
       won't execute any user-space instructions anymore and the
       next context switch cleans up the FPU. The old FPU state
       might still be around in the eagerfpu case but won't be
       saved.

     - in __restore_xstate_sig(), where we use drop_fpu() before
       copying a new state into the fpstate and activating that one.
       No user-pace instructions can execute between those steps.

     - in sys_execve()'s fpu__clear(): there we use drop_fpu() in
       the !eagerfpu case, where it's equivalent to a full reinit.

fpu_reset_state() is a stronger version of drop_fpu(): both in
the eagerfpu and the lazy-FPU case it guarantees that fpregs
are reinitialized to init state. This method is used in cases
where we need a full reset:

     - handle_signal() uses fpu_reset_state() to reset the FPU state
       to init before executing a user-space signal handler. While we
       have already saved the original FPU state at this point, and
       always restore the original state, the signal handling code
       still has to do this reinit, because signals may interrupt
       any user-space instruction, and the FPU might be in various
       intermediate states (such as an unbalanced x87 stack) that is
       not immediately usable for general C signal handler code.

     - __restore_xstate_sig() uses fpu_reset_state() when the signal
       frame has no FP context. Since the signal handler may have
       modified the FPU state, it gets reset back to init state.

     - in another branch __restore_xstate_sig() uses fpu_reset_state()
       to handle a restoration error: when restore_user_xstate() fails
       to restore FPU state and we might have inconsistent FPU data,
       fpu_reset_state() is used to reset it back to a known good
       state.

     - __kernel_fpu_end() uses fpu_reset_state() in an error branch.
       This is in a 'must not trigger' error branch, so on bug-free
       kernels this never triggers.

     - fpu__restore() uses fpu_reset_state() in an error path
       as well: if the fpstate was set up with invalid FPU state
       (via ptrace or via a signal handler), then it's reset back
       to init state.

     - likewise, the scheduler's switch_fpu_finish() uses it in a
       restoration error path too.

Move both drop_fpu() and fpu_reset_state() to the fpu__*() namespace
and harmonize their naming with their function:

    fpu__drop()
    fpu__reset()

This clearly shows that both methods operate on the full state of the
FPU, just like fpu__restore().

Also add comments to explain what each function does.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:03 +02:00
Ingo Molnar
5e907bb045 x86/alternatives, x86/fpu: Add 'alternatives_patched' debug flag and use it in xsave_state()
We'd like to use xsave_state() earlier, but its SYSTEM_BOOTING check
is too imprecise.

The real condition that xsave_state() would like to check is whether
alternative XSAVE instructions were patched into the kernel image
already.

Add such a (read-mostly) debug flag and use it in xsave_state().

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:03 +02:00
Ingo Molnar
2e85591a6c x86/fpu: Better document fpu__clear() state handling
So prior to this fix:

  c88d47480d ("x86/fpu: Always restore_xinit_state() when use_eager_cpu()")

we leaked FPU state across execve() boundaries on eagerfpu systems:

	$ /host/home/mingo/dump-xmm-regs-exec
	# XMM state before execve():
	XMM0 : 000000000000dede
	XMM1 : 000000000000dedf
	XMM2 : 000000000000dee0
	XMM3 : 000000000000dee1
	XMM4 : 000000000000dee2
	XMM5 : 000000000000dee3
	XMM6 : 000000000000dee4
	XMM7 : 000000000000dee5
	XMM8 : 000000000000dee6
	XMM9 : 000000000000dee7
	XMM10: 000000000000dee8
	XMM11: 000000000000dee9
	XMM12: 000000000000deea
	XMM13: 000000000000deeb
	XMM14: 000000000000deec
	XMM15: 000000000000deed

	# XMM state after execve(), in the new task context:
	XMM0 : 0000000000000000
	XMM1 : 2f2f2f2f2f2f2f2f
	XMM2 : 0000000000000000
	XMM3 : 0000000000000000
	XMM4 : 00000000000000ff
	XMM5 : 00000000ff000000
	XMM6 : 000000000000dee4
	XMM7 : 000000000000dee5
	XMM8 : 0000000000000000
	XMM9 : 0000000000000000
	XMM10: 0000000000000000
	XMM11: 0000000000000000
	XMM12: 0000000000000000
	XMM13: 000000000000deeb
	XMM14: 000000000000deec
	XMM15: 000000000000deed

Better explain what this function is supposed to do and why.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:02 +02:00
Ingo Molnar
b1276c48e9 x86/fpu: Initialize fpregs in fpu__init_cpu_generic()
FPU fpregs do not get initialized during bootup on secondary CPUs,
on non-xsave capable CPUs.

For example on one of my systems, the secondary CPU has this FPU
state on bootup:

	x86: Booting SMP configuration:
	.... node  #0, CPUs:      #1
	x86/fpu ######################
	x86/fpu # FPU register dump on CPU#1:
	x86/fpu # ... CWD: ffff0040
	x86/fpu # ... SWD: ffff0000
	x86/fpu # ... TWD: ffff555a
	x86/fpu # ... FIP: 00000000
	x86/fpu # ... FCS: 00000000
	x86/fpu # ... FOO: 00000000
	x86/fpu # ... FOS: ffff0000
	x86/fpu # ... FP0: 02 57 00 00 00 00 00 00 ff ff
	x86/fpu # ... FP1: 1b e2 00 00 00 00 00 00 ff ff
	x86/fpu # ... FP2: 00 00 00 00 00 00 00 00 00 00
	x86/fpu # ... FP3: 00 00 00 00 00 00 00 00 00 00
	x86/fpu # ... FP4: 00 00 00 00 00 00 00 00 00 00
	x86/fpu # ... FP5: 00 00 00 00 00 00 00 00 00 00
	x86/fpu # ... FP6: 00 00 00 00 00 00 00 00 00 00
	x86/fpu # ... FP7: 00 00 00 00 00 00 00 00 00 00
	x86/fpu # ...  SW: dadadada
	x86/fpu ######################

Note how CWD and TWD are off their usual init state (0x037f and 0xffff),
and how FP0 and FP1 has non-zero content.

This is normally not a problem, because any user-space FPU state
is initalized properly - but it can complicate the use of FPU
instructions in kernel code via kernel_fpu_begin()/end(): if
the FPU using code does not initialize registers itself, it
might generate spurious exceptions depending on which CPU it
executes on.

Fix this by initializing the x87 state via the FNINIT instruction.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:02 +02:00
Ingo Molnar
3c6dffa93b x86/fpu: Rename user_has_fpu() to fpregs_active()
Rename this function in line with the new FPU nomenclature.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:02 +02:00
Ingo Molnar
be7436d519 x86/fpu: Clarify ancient comments in fpu__restore()
So this function still had ancient language about 'saving current
math information' - but we haven't been doing lazy FPU saves for
quite some time, we are doing lazy FPU restores.

Also remove IRQ13 related comment, which we don't support anymore
either.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:01 +02:00
Ingo Molnar
2a52af8b8a x86/fpu: Rename save_user_xstate() to copy_fpregs_to_sigframe()
Move the naming in line with existing names, so that we now have:

  copy_fpregs_to_fpstate()
  copy_fpstate_to_sigframe()
  copy_fpregs_to_sigframe()

... where each function does what its name suggests.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:01 +02:00
Ingo Molnar
c8e1404120 x86/fpu: Rename save_xstate_sig() to copy_fpstate_to_sigframe()
Standardize the naming of save_xstate_sig() by renaming it to
copy_fpstate_to_sigframe(): this tells us at a glance that
the function copies an FPU fpstate to a signal frame.

This naming also follows the naming of copy_fpregs_to_fpstate().

Don't put 'xstate' into the name: since this is a generic name,
it's expected that the function is able to handle xstate frames
as well, beyond legacy frames.

xstate used to be the odd case in the x86 FPU code - now it's the
common case.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:01 +02:00
Ingo Molnar
36e49e7f2e x86/fpu: Pass 'struct fpu' to fpstate_sanitize_xstate()
Currently fpstate_sanitize_xstate() has a task_struct input parameter,
but it only uses the fpu structure from it - so pass in a 'struct fpu'
pointer only and update all call sites.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:00 +02:00
Ingo Molnar
1ac91a767f x86/fpu: Simplify fpstate_sanitize_xstate() calls
Remove the extra layer of __fpstate_sanitize_xstate():

	if (!use_xsaveopt())
		return;
	__fpstate_sanitize_xstate(tsk);

and move the check for use_xsaveopt() into fpstate_sanitize_xstate().

In general we optimize for the presence of CPU features, not for
the absence of them. Furthermore there's little point in this inlining,
as the call sites are not super hot code paths.

Doing this uninlining shrinks the code a bit:

   text    data     bss     dec     hex filename
   14108751        2573624 1634304 18316679        1177d87 vmlinux.before
   14108627        2573624 1634304 18316555        1177d0b vmlinux.after

Also remove a pointless '!fx' check from fpstate_sanitize_xstate().

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:00 +02:00
Ingo Molnar
d090319312 x86/fpu: Rename sanitize_i387_state() to fpstate_sanitize_xstate()
So the sanitize_i387_state() function has the following purpose:
on CPUs that support optimized xstate saving instructions, an
FPU fpstate might end up having partially uninitialized data.

This function initializes that data.

Note that the function name is a misnomer and confusing on two levels,
not only is it not i387 specific at all, but it is the exact opposite:
it only matters on xstate CPUs.

So rename sanitize_i387_state() and __sanitize_i387_state() to
fpstate_sanitize_xstate() and __fpstate_sanitize_xstate(),
to clearly express the purpose and usage of the function.

We'll further clean up this function in the next patch.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:00 +02:00
Ingo Molnar
befc61ad3c x86/fpu: Move asm/xcr.h to asm/fpu/internal.h
Now that all FPU internals using drivers are converted to public APIs,
move xcr.h's definitions into fpu/internal.h and remove xcr.h.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:00 +02:00
Ingo Molnar
33588b5222 x86/fpu: Simplify print_xstate_features()
We do a boot time printout of xfeatures in print_xstate_features(),
simplify this code to make use of the recently introduced cpu_has_xfeature()
method.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:55 +02:00
Ingo Molnar
5b07343034 x86/fpu: Introduce cpu_has_xfeatures(xfeatures_mask, feature_name)
A lot of FPU using driver code is querying complex CPU features to be
able to figure out whether a given set of xstate features is supported
by the CPU or not.

Introduce a simplified API function that can be used on any CPU type
to get this information. Also add an error string return pointer,
so that the driver can print a meaningful error message with a
standardized feature name.

Also mark xfeatures_mask as __read_only.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:55 +02:00
Ingo Molnar
6278485450 x86/fpu: Rename fpu/xsave.c to fpu/xstate.c
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:55 +02:00
Ingo Molnar
b16529004f x86/fpu: Optimize fpu_copy() some more on lazy switching systems
The current fpu_copy() code on lazy switching CPUs always saves
into the current fpstate and then copies it over into the child
context:

		preempt_disable();
		if (!copy_fpregs_to_fpstate(src_fpu))
			fpregs_deactivate(src_fpu);
		preempt_enable();
		memcpy(&dst_fpu->state, &src_fpu->state, xstate_size);

That memcpy() can be avoided on all lazy switching setups except
really old FNSAVE-only systems: change fpu_copy() to directly save
into the child context, for both the lazy and the eager context
switching case.

Note that we still have to do a memcpy() back into the parent
context in the FNSAVE case, but this won't be executed on the
majority of x86 systems that got built in the last 10 years or so.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:54 +02:00
Ingo Molnar
68271c6ae7 x86/fpu: Optimize fpu_copy()
Optimize fpu_copy() a bit by expanding the ->fpstate_active == 1
portion of fpu__save() into it.

( The main purpose of this change is to enable another, larger
  optimization that will be done in the next patch. )

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:54 +02:00
Ingo Molnar
48c4717f30 x86/fpu: Optimize fpu__save()
So fpu__save() does this currently:

		copy_fpregs_to_fpstate(fpu);
		if (!use_eager_fpu())
			fpregs_deactivate(fpu);

... which deactivates the FPU on lazy switching systems unconditionally.

Both usecases of fpu__save() use this function to save the
FPU state into a fpstate: fork()/clone() and math error signal handling.

The unconditional disabling of FPU registers in the lazy switching
case is probably a mistaken conversion of old FNSAVE code (that had
to disable FPU registers).

So speed up this code by only disabling FPU registers when absolutely
necessary: when indicated by the copy_fpregs_to_fpstate() return
code:

		if (!copy_fpregs_to_fpstate(fpu))
			fpregs_deactivate(fpu);

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:53 +02:00
Ingo Molnar
fea435a202 x86/fpu: Simplify fpu__save()
Factor out a common call.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:53 +02:00
Ingo Molnar
9f876d6766 x86/fpu: Eliminate __save_fpu()
The current implementation of __save_fpu():

	if (use_xsave()) {
		xsave_state(&fpu->state.xsave);
	} else {
		fpu_fxsave(fpu);
	}

Is actually a simplified version of copy_fpregs_to_fpstate(),
if use_eager_fpu() is true.

But all call sites of __save_fpu() call it only it when use_eager_fpu()
is true.

So we can eliminate __save_fpu() altogether and use the standard
copy_fpregs_to_fpstate() function. This cleans up the code
by making it use fewer variants of FPU register saving.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:53 +02:00
Ingo Molnar
72ee6f87ad x86/fpu: Simplify __save_fpu()
__save_fpu() has this pattern:

		if (unlikely(system_state == SYSTEM_BOOTING))
			xsave_state_booting(&fpu->state.xsave);
		else
			xsave_state(&fpu->state.xsave);

... but it does not actually get called during system bootup.

So remove the complication and always call xsave_state().

To make sure this assumption is correct, add a WARN_ONCE()
debug check to xsave_state().

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:53 +02:00
Ingo Molnar
32b49b3c83 x86/fpu: Factor out FPU hw activation/deactivation
We have repeat patterns of:

	if (!use_eager_fpu())
		clts();

... to activate FPU registers, and:

	if (!use_eager_fpu())
		stts();

... to deactivate them.

Encapsulate these in:

	__fpregs_activate_hw();
	__fpregs_activate_hw();

and use them accordingly.

Doing this synchronizes the idiom with the fpu->fpregs_active
software-flag's handling functions, creating clear patterns of:

	__fpregs_activate_hw();
	__fpregs_activate(fpu);

etc., which improves readability.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:52 +02:00
Ingo Molnar
67ee658e6f x86/fpu: Rename fpu__unlazy_stopped() to fpu__activate_stopped()
In line with the fpstate_activate() change, name
fpu__unlazy_stopped() in a similar fashion as well: its purpose
is to make the fpstate of a stopped task the current and active FPU
context, which may require unlazying and initialization.

The unlazying is just part of the job, the main concept is to make
the fpstate active.

Also clarify the function's description to clarify its exact
usage and the background behind it all.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:52 +02:00
Ingo Molnar
c4d72e2db3 x86/fpu: Simplify fpstate_init_curr() usage
Now that fpstate_init_curr() is not doing implicit allocations
anymore, almost all uses of it involve a very simple pattern:

	if (!fpu->fpstate_active)
		fpstate_init_curr(fpu);

which is basically activating the FPU fpstate if it was not active
before.

So propagate the check into the function itself, and rename the
function according to its new purpose:

	fpu__activate_curr(fpu);

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:51 +02:00
Ingo Molnar
2fb29fc7c6 x86/fpu: Simplify fpu__unlazy_stopped() error handling
Now that FPU contexts are always allocated, fpu__unlazy_stopped()
cannot fail. Remove its error return and propagate the changes to
the callers.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:51 +02:00
Ingo Molnar
e62bb3d894 x86/fpu: Rename fpstate_alloc_init() to fpstate_init_curr()
Now that there are no FPU context allocations, rename fpstate_alloc_init()
to fpstate_init_curr(), to signal that it initializes the fpstate and
marks it active, for the current task.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:50 +02:00
Ingo Molnar
91d93d0e20 x86/fpu: Remove failure return from fpstate_alloc_init()
Remove the failure code and propagate this down to callers.

Note that this function still has an 'init' aspect, which must be
called.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:50 +02:00
Ingo Molnar
c4d6ee6e2e x86/fpu: Remove failure paths from fpstate-alloc low level functions
Now that we always allocate the FPU context as part of task_struct there's
no need for separate allocations - remove them and their primary failure
handling code.

( Note that there's still secondary error codes that have become superfluous,
  those will be removed in separate patches. )

Move the somewhat misplaced setup_xstate_comp() call to the core.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:50 +02:00
Ingo Molnar
7366ed771f x86/fpu: Simplify FPU handling by embedding the fpstate in task_struct (again)
So 6 years ago we made the FPU fpstate dynamically allocated:

  aa283f4927 ("x86, fpu: lazy allocation of FPU area - v5")
  61c4628b53 ("x86, fpu: split FPU state from task struct - v5")

In hindsight this was a mistake:

   - it complicated context allocation failure handling, such as:

		/* kthread execs. TODO: cleanup this horror. */
		if (WARN_ON(fpstate_alloc_init(fpu)))
			force_sig(SIGKILL, tsk);

   - it caused us to enable irqs in fpu__restore():

                local_irq_enable();
                /*
                 * does a slab alloc which can sleep
                 */
                if (fpstate_alloc_init(fpu)) {
                        /*
                         * ran out of memory!
                         */
                        do_group_exit(SIGKILL);
                        return;
                }
                local_irq_disable();

   - it (slightly) slowed down task creation/destruction by adding
     slab allocation/free pattens.

   - it made access to context contents (slightly) slower by adding
     one more pointer dereference.

The motivation for the dynamic allocation was two-fold:

   - reduce memory consumption by non-FPU tasks

   - allocate and handle only the necessary amount of context for
     various XSAVE processors that have varying hardware frame
     sizes.

These days, with glibc using SSE memcpy by default and GCC optimizing
for SSE/AVX by default, the scope of FPU using apps on an x86 system is
much larger than it was 6 years ago.

For example on a freshly installed Fedora 21 desktop system, with a
recent kernel, all non-kthread tasks have used the FPU shortly after
bootup.

Also, even modern embedded x86 CPUs try to support the latest vector
instruction set - so they'll too often use the larger xstate frame
sizes.

So remove the dynamic allocation complication by embedding the FPU
fpstate in task_struct again. This should make the FPU a lot more
accessible to all sorts of atomic contexts.

We could still optimize for the xstate frame size in the future,
by moving the state structure to the last element of task_struct,
and allocating only a part of that.

This change is kept minimal by still keeping the ctx_alloc()/free()
routines (that now do nothing substantial) - we'll remove them in
the following patches.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:49 +02:00
Ingo Molnar
4f83634710 x86/fpu: Rename fpu_save_init() to copy_fpregs_to_fpstate()
So fpu_save_init() is a historic name that got its name when the only
way the FPU state was FNSAVE, which cleared (well, destroyed) the FPU
state after saving it.

Nowadays the name is misleading, because ever since the introduction of
FXSAVE (and more modern FPU saving instructions) the 'we need to reload
the FPU state' part is only true if there's a pending FPU exception [*],
which is almost never the case.

So rename it to copy_fpregs_to_fpstate() to make it clear what's
happening. Also add a few comments about why we cannot keep registers
in certain cases.

Also clean up the control flow a bit, to make it more apparent when
we are dropping/keeping FP registers, and to optimize the common
case (of keeping fpregs) some more.

[*] Probably not true anymore, modern instructions always leave the FPU
    state intact, even if exceptions are pending: because pending FP
    exceptions are posted on the next FP instruction, not asynchronously.

    They were truly asynchronous back in the IRQ13 case, and we had to
    synchronize with them, but that code is not working anymore: we don't
    have IRQ13 mapped in the IDT anymore.

    But a cleanup patch is obviously not the place to change subtle behavior.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:49 +02:00
Ingo Molnar
910665882f x86/fpu: Uninline the irq_ts_save()/restore() functions
Especially the irq_ts_save() function is pretty bloaty, generating
over a dozen instructions, so uninline them.

Even though the API is used rarely, the space savings are measurable:

   text    data     bss     dec     hex filename
   13331995        2572920 1634304 17539219        10ba093 vmlinux.before
   13331739        2572920 1634304 17538963        10b9f93 vmlinux.after

( This also allows the removal of an include file inclusion from fpu/api.h,
  speeding up the kernel build slightly. )

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:48 +02:00
Ingo Molnar
952f07ecbd x86/fpu: Move various internal function prototypes to fpu/internal.h
There are a number of FPU internal function prototypes and an inline function
in fpu/api.h, mostly placed so historically as the code grew over the years.

Move them over into fpu/internal.h where they belong. (Add sched.h include
to stackprotector.h which incorrectly relied on getting it from fpu/api.h.)

fpu/api.h is now a pure file that only contains FPU APIs intended for driver
use.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:48 +02:00
Ingo Molnar
d63e79b114 x86/fpu: Uninline kernel_fpu_begin()/end()
Both inline functions call an inline function unconditionally, so we
already pay the function call based clobbering cost. Uninline them.

This saves quite a bit of code in various performance sensitive
code paths:

   text            data    bss     dec             hex     filename
   13321334        2569888 1634304 17525526        10b6b16 vmlinux.before
   13320246        2569888 1634304 17524438        10b66d6 vmlinux.after

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:48 +02:00
Ingo Molnar
ae02679c56 x86/fpu: Add more comments to the FPU init code
Extend the comments of the FPU init code, and fix old ones.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:47 +02:00
Ingo Molnar
41e78410d8 x86/fpu: Reorder init methods
Reorder init methods in order of their relationship and usage, to
form coherent blocks throughout the whole file.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:47 +02:00
Ingo Molnar
7638b74b56 x86/fpu: Rename fpstate_xstate_init_size() to fpu__init_system_xstate_size_legacy()
To bring it in line with the other init_system*() methods.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:47 +02:00
Ingo Molnar
c66e3f2823 x86/fpu: Remove the extra fpu__detect() layer
Now that fpu__detect() has become an empty layer around
fpu__init_system(), eliminate it and make fpu__init_system()
the main system initialization routine.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:46 +02:00
Ingo Molnar
dd863880ac x86/fpu: Move fpu__init_system_early_generic() out of fpu__detect()
Move the fpu__init_system_early_generic() call into fpu__init_system(),
which hosts all the system init calls.

Expose fpu__init_system() to other modules - this will be our main and only
system init function.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:46 +02:00
Ingo Molnar
71eb3c6d15 x86/fpu: Make check_fpu() init ordering independent
check_fpu() currently relies on being called early in the init sequence,
when CR0::TS has not been set up yet.

Save/restore CR0::TS across this function, to make it invariant to
init ordering. This way we'll be able to move the generic FPU setup
routines earlier in the init sequence.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:46 +02:00
Ingo Molnar
0bf23f3d6c x86/fpu: Factor out FPU bug checks into fpu/bugs.c
Create separate fpu/bugs.c code so that if we read generic FPU code
we don't have to wade through all the bugcheck related code first.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:45 +02:00
Ingo Molnar
e83ab9ad97 x86/fpu: Move !FPU check ingo fpu__init_system_early_generic()
There's a !FPU related sanity check in fpu__init_cpu_generic(),
which is executed on every CPU onlining - even though we should do
this only once, and during system init.

Move this check to fpu__init_system_early_generic().

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:45 +02:00
Ingo Molnar
2e2f3da771 x86/fpu: Factor out fpu__init_system_early_generic()
Move the generic bits of fpu__detect() into fpu__init_system_early_generic().

We'll move some other code here too in a followup patch.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:45 +02:00
Ingo Molnar
7218e8b723 x86/fpu: Factor out fpu__init_system_generic()
Factor out the generic bits from fpu__init_system().

Rename mxcsr_feature_mask_init() to fpu__init_system_mxcsr()
to bring it in line with the rest of the nomenclature.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:45 +02:00
Ingo Molnar
b11316ed9e x86/fpu: Factor out fpu__init_cpu_generic()
Factor out the generic bits from fpu__init_cpu(), to create
a flat sequence of per CPU initialization function calls:

	fpu__init_cpu_generic();
	fpu__init_cpu_xstate();
	fpu__init_cpu_ctx_switch();

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:44 +02:00
Ingo Molnar
21c4cd108a x86/fpu: Simplify fpu__cpu_init()
After the latest round of cleanups, fpu__cpu_init() has become
a simple call to fpu__init_cpu().

Rename fpu__init_cpu() to fpu__cpu_init() and remove the
extra layer.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:44 +02:00
Ingo Molnar
7202ab46f7 x86/fpu: Remove fpu__init_cpu_ctx_switch() call from fpu__init_system()
We are now doing the fpu__init_cpu_ctx_switch() call from fpu__init_cpu(),
so there's no need to call it from fpu__init_system() anymore.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:44 +02:00
Ingo Molnar
067051ccd2 x86/fpu: Do system-wide setup from fpu__detect()
fpu__cpu_init() is called on every CPU, so it is the wrong place
to call fpu__init_system() from. Call it from fpu__detect():
this is early CPU init code, but we already have CPU features detected,
so we can call the system-wide FPU init code from here.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:43 +02:00
Ingo Molnar
3960fccf2e x86/fpu: Call fpu__init_cpu_ctx_switch() from fpu__init_cpu()
fpu__init_cpu() is currently called from fpu__init_system(),
which is the wrong place for it: call it from the proper high level
per CPU init function, fpu__init_cpu().

Note, we still keep the old call site as well, because it depends
on having proper CR0::TS setup. We'll fix this in the next patch.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:43 +02:00
Ingo Molnar
997578b14c x86/fpu: Move the fpstate_xstate_init_size() call into fpu__init_system()
The fpstate_xstate_init_size() function sets up a basic xstate_size, called
during fpu__detect() currently.

Its real dependency is to be called before fpu__init_system_xstate().

So move the function call site into fpu__init_system(), to right before the
fpu__init_system_xstate() call.

Also add a once-per-boot flag to fpstate_xstate_init_size(), we'll remove
this quirk later once we've cleaned up the init dependencies.

This moves the two related functions closer to each other and makes them
both part of the _init_system() functionality.

Currently we do the fpstate_xstate_init_size()
Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:43 +02:00
Ingo Molnar
530b37e43c x86/fpu: Do CLTS fpu__init_system()
mxcsr_feature_mask_init() depends on TS being cleared, as it executes
an FXSAVE instruction.

After later changes we will move the TS setup into fpu__init_cpu(),
which will interact with this - so clear the TS flag explicitly.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:43 +02:00
Ingo Molnar
011545b570 x86/fpu: Split fpu__ctx_switch_init() into _cpu() and _system() portions
So fpu__ctx_switch_init() has two aspects: a once per bootup functionality
that sets up a capability flag, and a per CPU functionality that sets CR0::TS.

Split the function.

Note that at this stage we still have duplicate calls into these methods, as
both the _system() and the _cpu() methods are run on all CPUs, with lower
level on_boot_cpu flags filtering out the duplicates where needed. So add
TS flag clearing as well, to handle the aftermath of early CPU init sequences
that might call in without having eager-fpu set - don't assume the TS flag
is cleared.

Calling each from its respective init level will happen later on.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:42 +02:00
Ingo Molnar
064e51e3c8 x86/fpu: Clean up eager_fpu_init() and rename it to fpu__ctx_switch_init()
It's not an xsave specific function anymore, so rename it accordingly
and also clean it up a bit:

 - remove the obsolete __init_refok, as the code paths are not
   mixed anymore

 - rename it from eager_fpu_init() to fpu__ctx_switch_init()

 - remove stray 'return;'

 - make it static to its only user

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:42 +02:00
Ingo Molnar
6f5d265aff x86/fpu: Move eager_fpu_init() to fpu/init.c
Move eager_fpu_init() and the 'eagerfpu' boot parameter handling function
to the generic FPU init file: it's generic FPU functionality.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:42 +02:00
Ingo Molnar
89abbe01a4 x86/fpu: Move all eager-fpu setup code to eager_fpu_init()
The FPU context switch type (lazy or eager) setup code is split into
two places currently - move it all to eager_fpu_init().

Note that the code we move will now be executed on non-xstate CPUs
as well, but this should be safe: both xfeatures_mask and
cpu_has_xsaveopt is 0 there.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:41 +02:00
Ingo Molnar
a5cb56e9a6 x86/fpu: Remove setup_init_fpu_buf() call from eager_fpu_init()
It's a pure xstate method now, no need for this duplicate call.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:41 +02:00
Ingo Molnar
2507e1c03f x86/fpu: Set up the legacy FPU init image from fpu__init_system()
The legacy FPU init image is used on older CPUs who don't run xstate init.
But the init code is called within setup_init_fpu_buf(), an xstate method.

Move this legacy init out of the xstate code and put it into fpu/init.c.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:41 +02:00
Ingo Molnar
429ced50a0 x86/fpu: Do fpu__init_system_xstate only from fpu__init_system()
Only call xstate system setup routines from fpu__init_system().

Likewise, don't call fpu__init_cpu_xstate() from fpu__init_system().

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:41 +02:00
Ingo Molnar
c42103b226 x86/fpu: Remove xsave_init()
Expand fpu__init_system_xstate() and fpu__init_cpu_xstate() calls
into xsave_init() calls.

(This will allow us to call the proper versions in higher level FPU init code
later on.)

No change in functionality.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:40 +02:00
Ingo Molnar
62db6871ae x86/fpu: Propagate once per boot quirk into fpu__init_system_xstate()
Linearize the call sequence in xsave_init():

	fpu__init_system_xstate();
	fpu__init_cpu_xstate();

We do this by propagating the boot-once quirk into
fpu__init_system_xstate(). fpu__init_cpu_xstate() is
safe to be called multiple time.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:40 +02:00
Ingo Molnar
e9dbfd673a x86/fpu: Move legacy check to fpu__init_system_xstate()
Now that legacy code can execute fpu__init_cpu_xstate() in
xsave_init(), we can move the once per boot legacy check into
fpu__init_system_xstate(), where it belongs.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:40 +02:00
Ingo Molnar
e84611fc96 x86/fpu: Move CPU capability check into fpu__init_cpu_xstate()
fpu__init_system_xstate() does an FPU capability check that is better
done in fpu__init_cpu_xstate(). This will allow us to call
fpu__init_cpu_xstate() directly on legacy CPUs as well.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:39 +02:00
Ingo Molnar
55cc4678b7 x86/fpu: Make the system/cpu init distinction clear in the xstate code as well
Rename existing xstate init functions along the system/cpu init principles:

	fpu__init_system_xstate(): called once per system bootup
	fpu__init_cpu_xstate():    called per CPU onlining

Also make the fpu__init_cpu_xstate() early code invariant:
if xfeatures_mask is not set yet then don't crash but return.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:39 +02:00
Ingo Molnar
e35f6f1414 x86/fpu: Split fpu__cpu_init() into early-boot and cpu-boot parts
There are two kinds of FPU initialization sequences necessary to bring FPU
functionality up: once per system bootup activities, such as detection,
feature initialization, etc. of attributes that are shared by all CPUs
in the system - and per cpu initialization sequences run when a CPU is
brought online (either during bootup or during CPU hotplug onlining),
such as CR0/CR4 register setting, etc.

The FPU code is mixing these roles together, with no clear distinction.

Start sorting this out by splitting the main FPU detection routine
(fpu__cpu_init()) into two parts: fpu__init_system() for
one per system init activities, and fpu__init_cpu() for the
per CPU onlining init activities.

Note that xstate_init() is called from both variants for the time being,
because it has a dual nature as well. We'll fix that in upcoming patches.

Just do the split and call it as we used to before, don't introduce any
change in initialization behavior yet, beyond duplicate (and harmless)
fpu__init_cpu() and xstate_init() calls - which we'll fix in later
patches.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:39 +02:00
Ingo Molnar
3e5e126774 x86/fpu: Remove 'init_xstate_buf' bootmem allocation
Make init_xstate_buf allocated statically at build time.

This structure's maximum size is around 1KB - and it's allocated even on
most modern embedded x86 CPUs which strive for FPU instruction set parity
with desktop and server CPUs, so it's not like we can save much on smaller
systems.

This removes the last bootmem allocation from the FPU init path, allowing
it to be called earlier in the boot sequence.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:39 +02:00
Ingo Molnar
26b1f5d05a x86/fpu: Make setup_init_fpu_buf() run-once explicitly
Remove the dependency on the init_xstate_buf == NULL check to
implement once-per-bootup logic in eager_fpu_init(), by making
setup_init_fpu_buf() run once per bootup explicitly.

This is in preparation to make init_xstate_buf statically
allocated.

The various boot-once quirks in the FPU init code will be removed
in a later cleanup stage.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:38 +02:00
Ingo Molnar
966ece619e x86/fpu: Remove xsave_init() bootmem allocations
There's only 8 xstate bits at the moment, and it's not like we
can support unknown bits - so put xstate_offsets[] and
xstate_sizes[] into static allocation.

This is in preparation to be able to call the FPU init code
earlier, when there's no bootmem available yet.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:38 +02:00
Ingo Molnar
6a13320758 x86/fpu: Remove fpstate_xstate_init_size() boot quirk
fpstate_xstate_init_size() is called in fpu__cpu_init(), which is
run on every CPU, every time they are brought online.

But we want to call fpstate_xstate_init_size() only once. Move it to
fpu__detect(), which only runs once, on the boot CPU.

Also clean up the flow of fpstate_xstate_init_size() a bit, by
removing a 'return' from the middle of the function.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:38 +02:00
Ingo Molnar
66af8e2764 x86/fpu: Rename __thread_fpu_end() to fpregs_deactivate()
Propagate the 'fpu->fpregs_active' naming to the high level function that
clears it.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:37 +02:00
Ingo Molnar
232f62cdd7 x86/fpu: Rename __thread_fpu_begin() to fpregs_activate()
Propagate the 'fpu->fpregs_active' naming to the high level
function that sets it.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:37 +02:00
Ingo Molnar
d5cea9b0af x86/fpu: Rename fpu->has_fpu to fpu->fpregs_active
So the current code uses fpu->has_cpu to determine whether a given
user FPU context is actively loaded into the FPU's registers [*] and
that those registers represent the task's current FPU state.

But this term is not unambiguous: especially the distinction between
fpu->has_fpu, PF_USED_MATH and fpu_fpregs_owner_ctx is not clear.

Increase clarity by unambigously signalling that it's about
hardware registers being active right now, by renaming it to
fpu->fpregs_active.

( In later patches we'll use more of the 'fpregs' naming, which will
  make it easier to grep for as well. )

[*] There's the kernel_fpu_begin()/end() primitive that also
    activates FPU hw registers as well and uses them, without
    touching the fpu->fpregs_active flag.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:36 +02:00
Ingo Molnar
73a3aeb3ac x86/fpu: Improve the __sanitize_i387_state() documentation
Improve the comments and add new ones, as this code isn't very obvious.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:36 +02:00
Ingo Molnar
678eaf6034 x86/fpu: Rename regset FPU register accessors
Rename regset accessors to prefix them with 'regset_', because we
want to start using the 'fpregs_active' name elsewhere.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:35 +02:00
Ingo Molnar
91a8c2a5b4 x86/fpu: Clean up and fix MXCSR handling
The code has the following problems:

 - it uses a single global 'fx_scratch' area that multiple CPUs could
   write into simultaneously, in theory.

 - it wastes 512 bytes of .data for something that is only rarely used.

Fix this by moving the state buffer to the stack. Note that while
this is 512 bytes, we don't ever call this function in very deep
callchains, so its stack usage should not be a problem.

Also add comments to explain the magic 0x0000ffbf default value.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:35 +02:00
Ingo Molnar
400e4b2091 x86/fpu: Rename xsave.header::xstate_bv to 'xfeatures'
'xsave.header::xstate_bv' is a misnomer - what does 'bv' stand for?

It probably comes from the 'XGETBV' instruction name, but I could
not find in the Intel documentation where that abbreviation comes
from. It could mean 'bit vector' - or something else?

But how about - instead of guessing about a weird name - we named
the field in an obvious and descriptive way that tells us exactly
what it does?

So rename it to 'xfeatures', which is a bitmask of the
xfeatures that are fpstate_active in that context structure.

Eyesore like:

           fpu->state->xsave.xsave_hdr.xstate_bv |= XSTATE_FP;

is now much more readable:

           fpu->state->xsave.header.xfeatures |= XSTATE_FP;

Which form is not just infinitely more readable, but is also
shorter as well.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:35 +02:00
Ingo Molnar
3a54450b5e x86/fpu: Rename 'xsave_hdr' to 'header'
Code like:

           fpu->state->xsave.xsave_hdr.xstate_bv |= XSTATE_FP;

is an eyesore, because not only is the words 'xsave' and 'state'
are repeated twice times (!), but also because of the 'hdr' and 'bv'
abbreviations that are pretty meaningless at a first glance.

Start cleaning this up by renaming 'xsave_hdr' to 'header'.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:34 +02:00
Ingo Molnar
8dcea8db79 x86/fpu: Clean up regset functions
Clean up various regset handlers: use the 'fpu' pointer which
is available in most cases.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:34 +02:00
Ingo Molnar
9254aaa0fe x86/fpu: Move XCR0 manipulation to the FPU code proper
The suspend code accesses FPU state internals, add a helper for
it and isolate it.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:33 +02:00
Ingo Molnar
84246fe4e3 x86/fpu: Rename 'xstate_features' to 'xfeatures_nr'
The name 'xstate_features' does not tell us whether it's a bitmap
or any other value. That it's a count of features is only obvious
if you read the code that calculates it.

Rename it to the more descriptive 'xfeatures_nr' name.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:33 +02:00
Ingo Molnar
614df7fb8a x86/fpu: Rename 'pcntxt_mask' to 'xfeatures_mask'
So the 'pcntxt_mask' is a misnomer, it's essentially meaningless to anyone
who doesn't know what it does exactly.

Name it more descriptively as 'xfeatures_mask'.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:33 +02:00
Ingo Molnar
69496e10f8 x86/fpu: Print supported xstate features in human readable way
Inform the user/admin about which xstate features the kernel supports.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:32 +02:00
Ingo Molnar
32d4d9ccb0 x86/fpu: Improve FPU detection kernel messages
Standardize the various boot time messages printed during FPU detection:

 - Use a common 'x86/fpu: ' prefix for consistency and to make it easy
   to grep boot logs for FPU related messages

 - Correct speling errors

 - Add printout for the legacy FPU case as well

 - Clarify messages

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:32 +02:00
Ingo Molnar
c0841e34fd x86/fpu: Remove xsave_init() __init obfuscation
So this code surprised me - and being surprised when reading FPU code
does not help maintainability of an already overly complex subsystem.

Remove the obfuscation and just don't use __init annotation for now.
Anyone who wants to free these ~600 bytes of xstate_enable_boot_cpu()
should implement it cleanly.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:31 +02:00