Commit Graph

10905 Commits

Author SHA1 Message Date
Alexander Shishkin
a82d24edfe perf/x86/intel/pt: Remove redundant variable declaration
There is a 'pt' variable in the outer scope of pt_event_stop() with the same
type, we don't really need another one in the inner scope.

This patch removes the redundant variable declaration.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1432308626-18845-8-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:48 +02:00
Alexander Shishkin
0a487aad2d perf/x86/intel/pt: Kill pt_is_running()
Initially, we were trying to guard against scenarios where somebody
attaches to the system with a hardware debugger while PT is enabled
from software and pt_is_running() tries to make sure we handle this
better, but the truth is, there is still a race window no matter what
and people with hardware debuggers should really know what they are
doing anyway.

In other words, there is no point in keeping this one around, and
it's one RDMSR instructions fewer in the fast path.

The case when PT is enabled by the BIOS at boot time is handled
in the driver initialization path and doesn't use pt_is_running().

This patch gets rid of it.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1429622177-22843-6-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:48 +02:00
Alexander Shishkin
5b1dbd17c0 perf/x86/intel/pt: Document pt_buffer_reset_offsets()
Currently, the description of pt_buffer_reset_offsets() lacks information
about its calling constraints and ordering with regards to other buffer
management functions.

Add a clarification about when this function has to be called.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1429622177-22843-5-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:47 +02:00
Alexander Shishkin
cf302bfdf3 perf/x86/intel/pt: Document pt_buffer_reset_markers()
The comments in the driver don't make it absolutely clear as to what
exactly is the calling order and other possible constraints of buffer
management functions.

Document constraints and calling order for the buffer configuration
functions. While at it, replace a redundant check in
pt_buffer_reset_markers() with an explanation why it is not needed.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1429622177-22843-4-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:47 +02:00
Alexander Shishkin
74387bcb71 perf/x86/intel/pt: Kill an unused variable
Currently, there's a set-but-not-used variable in setup_topa_index();
this patch gets rid of it. And while at it, fixes a style issue with
brackets around a one-line block.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1429622177-22843-2-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:46 +02:00
Peter Zijlstra
ba040653b4 perf/x86/intel: Simplify put_exclusive_constraints()
Don't bother with taking locks if we're not actually going to do
anything. Also, drop the _irqsave(), this is very much only called
from IRQ-disabled context.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:46 +02:00
Peter Zijlstra
8736e548db perf/x86: Simplify the x86_schedule_events() logic
!x && y == ! (x || !y)

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:45 +02:00
Peter Zijlstra
43ef205bde perf/x86/intel: Remove intel_excl_states::init_state
For some obscure reason intel_{start,stop}_scheduling() copy the HT
state to an intermediate array. This would make sense if we ever were
to make changes to it which we'd have to discard.

Except we don't. By the time we call intel_commit_scheduling() we're;
as the name implies; committed to them. We'll never back out.

A further hint its pointless is that stop_scheduling() unconditionally
publishes the state.

So the intermediate array is pointless, modify the state in place and
kill the extra array.

And remove the pointless array initialization: INTEL_EXCL_UNUSED == 0.

Note; all is serialized by intel_excl_cntr::lock.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:45 +02:00
Peter Zijlstra
1fe684e349 perf/x86/intel: Remove pointless tests
Both intel_commit_scheduling() and intel_get_excl_contraints() test
for cntr < 0.

The only way that can happen (aside from a bug) is through
validate_event(), however that is already captured by the
cpuc->is_fake test.

So remove these test and simplify the code.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:44 +02:00
Peter Zijlstra
0c41e756b9 perf/x86/intel: Clean up intel_commit_scheduling() placement
Move the code of intel_commit_scheduling() to the right place, which is
in between start() and stop().

No change in functionality.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:44 +02:00
Peter Zijlstra
17186ccda3 perf/x86/intel: Make WARN()ings consistent
The intel_commit_scheduling() callback is pointlessly different from
the start and stop scheduling callback.

Furthermore, the constraint should never be NULL, so remove that test.

Even though we'll never get called (because we NULL the callbacks)
when !is_ht_workaround_enabled() put that test in.

Collapse the (pointless) WARN_ON_ONCE() and bail on !cpuc->excl_cntrs --
this is doubly pointless, because its the same condition as
is_ht_workaround_enabled() which was already pointless because the
whole method won't ever be called.

Furthremore, make all the !excl_cntrs test WARN_ON_ONCE(); they're all
pointless, because the above, either the function
({get,put}_excl_constraint) are already predicated on it existing or
the is_ht_workaround_enabled() thing is the same test.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:43 +02:00
Peter Zijlstra
aaf932e816 perf/x86/intel: Simplify the dynamic constraint code somewhat
We have two 'struct event_constraint' local variables in
intel_get_excl_constraints(): 'cx' and 'c'.

Instead of using 'cx' after the dynamic allocation, put all 'cx' inside
the dynamic allocation block and use 'c' outside of it.

Also use direct assignment to copy the structure; let the compiler
figure it out.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:43 +02:00
Peter Zijlstra
b32ed7f5de perf/x86/intel: Add lockdep assert
Lockdep is very good at finding incorrect IRQ state while locking and
is far better at telling us if we hold a lock than the _is_locked()
API. It also generates less code for !DEBUG kernels.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:42 +02:00
Peter Zijlstra
1c565833ac perf/x86/intel: Correct local vs remote sibling state
For some obscure reason the current code accounts the current SMT
thread's state on the remote thread and reads the remote's state on
the local SMT thread.

While internally consistent, and 'correct' its pointless confusion we
can do without.

Flip them the right way around.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:42 +02:00
Matt Fleming
adafa99960 perf/x86/intel/cqm: Use 'u32' data type for RMIDs
Since we write RMID values to MSRs the correct type to use is 'u32'
because that clearly articulates we're writing a hardware register
value.

Fix up all uses of RMID in this code to consistently use the correct data
type.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Will Auld <will.auld@intel.com>
Link: http://lkml.kernel.org/r/1432285182-17180-1-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:41 +02:00
Thomas Gleixner
bf926731e1 perf/x86/intel/cqm: Add storage for 'closid' and clean up 'struct intel_pqr_state'
'closid' (CLass Of Service ID) is used for the Class based Cache
Allocation Technology (CAT). Add explicit storage to the per cpu cache
for it, so it can be used later with the CAT support (requires to move
the per cpu data).

While at it:

 - Rename the structure to intel_pqr_state which reflects the actual
   purpose of the struct: cache values which go into the PQR MSR

 - Rename 'cnt' to rmid_usecnt which reflects the actual purpose of
   the counter.

 - Document the structure and the struct members.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Matt Fleming <matt.fleming@intel.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Will Auld <will.auld@intel.com>
Link: http://lkml.kernel.org/r/20150518235150.240899319@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:41 +02:00
Thomas Gleixner
43d0c2f6dc perf/x86/intel/cqm: Remove useless wrapper function
intel_cqm_event_del() is a 1:1 wrapper for intel_cqm_event_stop().
Remove the useless indirection.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Matt Fleming <matt.fleming@intel.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Will Auld <will.auld@intel.com>
Link: http://lkml.kernel.org/r/20150518235150.159779847@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:40 +02:00
Thomas Gleixner
0bac237845 perf/x86/intel/cqm: Avoid pointless MSR write
If the usage counter is non-zero there is no point to update the rmid
in the PQR MSR.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Matt Fleming <matt.fleming@intel.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Will Auld <will.auld@intel.com>
Link: http://lkml.kernel.org/r/20150518235150.080844281@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:40 +02:00
Thomas Gleixner
9e7eaac95a perf/x86/intel/cqm: Remove pointless spinlock from state cache
'struct intel_cqm_state' is a strict per CPU cache of the rmid and the
usage counter. It can never be modified from a remote CPU.

The three functions which modify the content: intel_cqm_event[start|stop|del]
(del maps to stop) are called from the perf core with interrupts disabled
which is enough protection for the per CPU state values.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Matt Fleming <matt.fleming@intel.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Will Auld <will.auld@intel.com>
Link: http://lkml.kernel.org/r/20150518235150.001006529@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:39 +02:00
Thomas Gleixner
b3df4ec442 perf/x86/intel/cqm: Use proper data types
'int' is really not a proper data type for an MSR. Use u32 to make it
clear that we are dealing with a 32-bit unsigned hardware value.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Matt Fleming <matt.fleming@intel.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Will Auld <will.auld@intel.com>
Link: http://lkml.kernel.org/r/20150518235149.919350144@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:39 +02:00
Thomas Gleixner
f4d9757ca6 perf/x86/intel/cqm: Document PQR MSR abuse
The CQM code acts like it owns the PQR MSR completely. That's not true
because only the lower 10 bits are used for CQM. The upper 32 bits are
used for the 'CLass Of Service ID' (CLOSID). Document the abuse. Will be
fixed in a later patch.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Matt Fleming <matt.fleming@intel.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Will Auld <will.auld@intel.com>
Link: http://lkml.kernel.org/r/20150518235149.823214798@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:38 +02:00
Ingo Molnar
8d12ded3dd Merge branch 'perf/urgent' into perf/core, before applying dependent patches
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:17:21 +02:00
Don Zickus
68ab747604 perf/x86: Tweak broken BIOS rules during check_hw_exists()
I stumbled upon an AMD box that had the BIOS using a hardware performance
counter. Instead of printing out a warning and continuing, it failed and
blocked further perf counter usage.

Looking through the history, I found this commit:

  a5ebe0ba3d ("perf/x86: Check all MSRs before passing hw check")

which tweaked the rules for a Xen guest on an almost identical box and now
changed the behaviour.

Unfortunately the rules were tweaked incorrectly and will always lead to
MSR failures even though the MSRs are completely fine.

What happens now is in arch/x86/kernel/cpu/perf_event.c::check_hw_exists():

<snip>
        for (i = 0; i < x86_pmu.num_counters; i++) {
                reg = x86_pmu_config_addr(i);
                ret = rdmsrl_safe(reg, &val);
                if (ret)
                        goto msr_fail;
                if (val & ARCH_PERFMON_EVENTSEL_ENABLE) {
                        bios_fail = 1;
                        val_fail = val;
                        reg_fail = reg;
                }
        }

<snip>
        /*
         * Read the current value, change it and read it back to see if it
         * matches, this is needed to detect certain hardware emulators
         * (qemu/kvm) that don't trap on the MSR access and always return 0s.
         */
        reg = x86_pmu_event_addr(0);
				^^^^

if the first perf counter is enabled, then this routine will always fail
because the counter is running. :-(

        if (rdmsrl_safe(reg, &val))
                goto msr_fail;
        val ^= 0xffffUL;
        ret = wrmsrl_safe(reg, val);
        ret |= rdmsrl_safe(reg, &val_new);
        if (ret || val != val_new)
                goto msr_fail;

The above bios_fail used to be a 'goto' which is why it worked in the past.

Further, most vendors have migrated to using fixed counters to hide their
evilness hence this problem rarely shows up now days except on a few old boxes.

I fixed my problem and kept the spirit of the original Xen fix, by recording a
safe non-enable register to be used safely for the reading/writing check.
Because it is not enabled, this passes on bare metal boxes (like metal), but
should continue to throw an msr_fail on Xen guests because the register isn't
emulated yet.

Now I get a proper bios_fail error message and Xen should still see their
msr_fail message (untested).

Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: george.dunlap@eu.citrix.com
Cc: konrad.wilk@oracle.com
Link: http://lkml.kernel.org/r/1431976608-56970-1-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:16:20 +02:00
Alexander Shishkin
f73ec48c90 perf/x86/intel/pt: Untangle pt_buffer_reset_markers()
Currently, pt_buffer_reset_markers() is a difficult to read knot of
arithmetics with a redundant check for multiple-entry TOPA capability,
a commented out wakeup marker placement and a logical error wrt to
stop marker placement. The latter happens when write head is not page
aligned and results in stop marker being placed one page earlier than
it actually should.

All these problems only affect PT implementations that support
multiple-entry TOPA tables (read: proper scatter-gather).

For single-entry TOPA implementations, there is no functional impact.

This patch deals with all of the above. Tested on both single-entry
and multiple-entry TOPA PT implementations.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1432308626-18845-4-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:16:20 +02:00
Peter Zijlstra
cc1790cf54 perf/x86: Improve HT workaround GP counter constraint
The (SNB/IVB/HSW) HT bug only affects events that can be programmed
onto GP counters, therefore we should only limit the number of GP
counters that can be used per cpu -- iow we should not constrain the
FP counters.

Furthermore, we should only enfore such a limit when there are in fact
exclusive events being scheduled on either sibling.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
[ Fixed build fail for the !CONFIG_CPU_SUP_INTEL case. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 09:16:03 +02:00
Peter Zijlstra
b371b59431 perf/x86: Fix event/group validation
Commit 43b4578071 ("perf/x86: Reduce stack usage of
x86_schedule_events()") violated the rule that 'fake' scheduling; as
used for event/group validation; should not change the event state.

This went mostly un-noticed because repeated calls of
x86_pmu::get_event_constraints() would give the same result. And
x86_pmu::put_event_constraints() would mostly not do anything.

Commit e979121b1b ("perf/x86/intel: Implement cross-HT corruption
bug workaround") made the situation much worse by actually setting the
event->hw.constraint value to NULL, so when validation and actual
scheduling interact we get NULL ptr derefs.

Fix it by removing the constraint pointer from the event and move it
back to an array, this time in cpuc instead of on the stack.

validate_group()
  x86_schedule_events()
    event->hw.constraint = c; # store

      <context switch>
        perf_task_event_sched_in()
          ...
            x86_schedule_events();
              event->hw.constraint = c2; # store

              ...

              put_event_constraints(event); # assume failure to schedule
                intel_put_event_constraints()
                  event->hw.constraint = NULL;

      <context switch end>

    c = event->hw.constraint; # read -> NULL

    if (!test_bit(hwc->idx, c->idxmsk)) # <- *BOOM* NULL deref

This in particular is possible when the event in question is a
cpu-wide event and group-leader, where the validate_group() tries to
add an event to the group.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Hunter <ahh@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 43b4578071 ("perf/x86: Reduce stack usage of x86_schedule_events()")
Fixes: e979121b1b ("perf/x86/intel: Implement cross-HT corruption bug workaround")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 08:46:44 +02:00
Stephane Eranian
a41f3c8cd4 perf/x86/intel/uncore: Add Broadwell-U uncore IMC PMU support
This patch enables the uncore Memory Controller (IMC) PMU
support for Intel Broadwell-U (Model 61) mobile processors.
The IMC PMU enables measuring memory bandwidth.

To use with perf:
$ perf stat -a -I 1000 -e
uncore_imc/data_reads/,uncore_imc/data_writes/ sleep 10

Tested-by: Sonny Rao <sonnyrao@chromium.org>
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kan.liang@intel.com
Cc: peterz@infradead.org
Link: http://lkml.kernel.org/r/20150423065642.GA4890@thinkpad
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-11 11:57:47 +02:00
Stephane Eranian
44b11fee51 perf/x86/rapl: Enable Broadwell-U RAPL support
This patch enables RAPL counters (energy consumption counters)
support for Intel Broadwell-U processors (Model 61):

To use:

  $ perf stat -a -I 1000 -e power/energy-cores/,power/energy-pkg/,power/energy-ram/ sleep 10

Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: jacob.jun.pan@linux.intel.com
Cc: kan.liang@intel.com
Cc: peterz@infradead.org
Cc: sonnyrao@chromium.org
Link: http://lkml.kernel.org/r/20150423070709.GA4970@thinkpad
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-11 11:52:30 +02:00
Kan Liang
6d37405635 perf/x86/intel: Fix SLM cache event list
iTLB-load-misses and LLC-load-misses count incorrectly on SLM.

There is no ITLB.MISSES support on SLM. Event PAGE_WALKS.I_SIDE_WALK
should be used to count iTLB-load-misses. This event counts when an
instruction (I) page walk is completed or started. Since a page walk
implies a TLB miss, the number of TLB misses can be counted by counting
the number of pagewalks.

DMND_DATA_RD counts both demand and DCU prefetch data reads. However,
LLC-load-misses should only count demand reads. There is no way to not
include prefetches with a single counter on SLM. So the LLC-load-misses
support should be removed on SLM.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1429608881-5055-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08 11:59:41 +02:00
Linus Torvalds
0e1dc42748 xen: bug fixes for 4.1-rc2
- Fix blkback regression if using persistent grants.
 - Fix various event channel related suspend/resume bugs.
 - Fix AMD x86 regression with X86_BUG_SYSRET_SS_ATTRS.
 - SWIOTLB on ARM now uses frames <4 GiB (if available) so device only
   capable of 32-bit DMA work.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJVSiC1AAoJEFxbo/MsZsTRojgH/1zWPD0r5WMAEPb6DFdb7Ga1
 SqBbyHFu43axNwZ7EvUzSqI8BKDPbTnScQ3+zC6Zy1SIEfS+40+vn7kY/uASmWtK
 LYaYu8nd49OZP8ykH0HEvsJ2LXKnAwqAwvVbEigG7KJA7h8wXo7aDwdwxtZmHlFP
 18xRTfHcrnINtAJpjVRmIGZsCMXhXQz4bm0HwsXTTX0qUcRWtxydKDlMPTVFyWR8
 wQ2m5+76fQ8KlFsoJEB0M9ygFdheZBF4FxBGHRrWXBUOhHrQITnH+cf1aMVxTkvy
 NDwiEebwXUDHacv21onszoOkNjReLsx+DWp9eHknlT/fgPo6tweMM2yazFGm+JQ=
 =W683
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.1b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen bug fixes from David Vrabel:

 - fix blkback regression if using persistent grants

 - fix various event channel related suspend/resume bugs

 - fix AMD x86 regression with X86_BUG_SYSRET_SS_ATTRS

 - SWIOTLB on ARM now uses frames <4 GiB (if available) so device only
   capable of 32-bit DMA work.

* tag 'for-linus-4.1b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: Add __GFP_DMA flag when xen_swiotlb_init gets free pages on ARM
  hypervisor/x86/xen: Unset X86_BUG_SYSRET_SS_ATTRS on Xen PV guests
  xen/events: Set irq_info->evtchn before binding the channel to CPU in __startup_pirq()
  xen/console: Update console event channel on resume
  xen/xenbus: Update xenbus event channel on resume
  xen/events: Clear cpu_evtchn_mask before resuming
  xen-pciback: Add name prefix to global 'permissive' variable
  xen: Suspend ticks on all CPUs during suspend
  xen/grant: introduce func gnttab_unmap_refs_sync()
  xen/blkback: safely unmap purge persistent grants
2015-05-06 15:58:06 -07:00
Linus Torvalds
3d54ac9e35 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "EFI fixes, and FPU fix, a ticket spinlock boundary condition fix and
  two build fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu: Always restore_xinit_state() when use_eager_cpu()
  x86: Make cpu_tss available to external modules
  efi: Fix error handling in add_sysfs_runtime_map_entry()
  x86/spinlocks: Fix regression in spinlock contention detection
  x86/mm: Clean up types in xlate_dev_mem_ptr()
  x86/efi: Store upper bits of command line buffer address in ext_cmd_line_ptr
  efivarfs: Ensure VariableName is NUL-terminated
2015-05-06 10:57:37 -07:00
Linus Torvalds
d8fce2db72 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Mostly tooling fixes, but also an uncore PMU driver fix and an uncore
  PMU driver hardware-enablement addition"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf probe: Fix segfault if passed with ''.
  perf report: Fix -T/--threads option to work again
  perf bench numa: Fix immediate meeting of convergence condition
  perf bench numa: Fixes of --quiet argument
  perf bench futex: Fix hung wakeup tasks after requeueing
  perf probe: Fix bug with global variables handling
  perf top: Fix a segfault when kernel map is restricted.
  tools lib traceevent: Fix build failure on 32-bit arch
  perf kmem: Fix compiles on RHEL6/OL6
  tools lib api: Undefine _FORTIFY_SOURCE before setting it
  perf kmem: Consistently use PRIu64 for printing u64 values
  perf trace: Disable events and drain events when forked workload ends
  perf trace: Enable events when doing system wide tracing and starting a workload
  perf/x86/intel/uncore: Move PCI IDs for IMC to uncore driver
  perf/x86/intel/uncore: Add support for Intel Haswell ULT (lower power Mobile Processor) IMC uncore PMUs
  perf/x86/intel: Add cpu_(prepare|starting|dying) for core_pmu
2015-05-06 10:47:25 -07:00
Bobby Powers
c88d47480d x86/fpu: Always restore_xinit_state() when use_eager_cpu()
The following commit:

  f893959b08 ("x86/fpu: Don't abuse drop_init_fpu() in flush_thread()")

removed drop_init_fpu() usage from flush_thread(). This seems to break
things for me - the Go 1.4 test suite fails all over the place with
floating point comparision errors (offending commit found through
bisection).

The functional change was that flush_thread() after this commit
only calls restore_init_xstate() when both use_eager_fpu() and
!used_math() are true. drop_init_fpu() (now fpu_reset_state()) calls
restore_init_xstate() regardless of whether current used_math() - apply
the same logic here.

Switch used_math() -> tsk_used_math(tsk) to consistently use the grabbed
tsk instead of current, like in the rest of flush_thread().

Tested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Bobby Powers <bobbypowers@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Riikonen <priikone@iki.fi>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: f893959b ("x86/fpu: Don't abuse drop_init_fpu() in flush_thread()")
Link: http://lkml.kernel.org/r/1430147441-9820-1-git-send-email-bobbypowers@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-06 11:22:03 +02:00
Marc Dionne
de71ad2c97 x86: Make cpu_tss available to external modules
Commit 75182b1632 ("x86/asm/entry: Switch all C consumers of
kernel_stack to this_cpu_sp0()") changed current_thread_info
to use this_cpu_sp0, and indirectly made it rely on init_tss
which was exported with EXPORT_PER_CPU_SYMBOL_GPL.
As a result some macros and inline functions such as set/get_fs,
test_thread_flag and variants have been made unusable for
external modules.

Make cpu_tss exported with EXPORT_PER_CPU_SYMBOL so that these
functions are accessible again, as they were previously.

Signed-off-by: Marc Dionne <marc.dionne@your-file-system.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/1430763404-21221-1-git-send-email-marc.dionne@your-file-system.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-05 20:40:31 +02:00
Boris Ostrovsky
a71dbdaa8c hypervisor/x86/xen: Unset X86_BUG_SYSRET_SS_ATTRS on Xen PV guests
Commit 61f01dd941 ("x86_64, asm: Work around AMD SYSRET SS descriptor
attribute issue") makes AMD processors set SS to __KERNEL_DS in
__switch_to() to deal with cases when SS is NULL.

This breaks Xen PV guests who do not want to load SS with__KERNEL_DS.

Since the problem that the commit is trying to address would have to be
fixed in the hypervisor (if it in fact exists under Xen) there is no
reason to set X86_BUG_SYSRET_SS_ATTRS flag for PV VPCUs here.

This can be easily achieved by adding x86_hyper_xen_hvm.set_cpu_features
op which will clear this flag. (And since this structure is no longer
HVM-specific we should do some renaming).

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-05-05 18:27:43 +01:00
Paolo Bonzini
73459e2a1a x86: pvclock: Really remove the sched notifier for cross-cpu migrations
This reverts commits 0a4e6be9ca
and 80f7fdb1c7.

The task migration notifier was originally introduced in order to support
the pvclock vsyscall with non-synchronized TSC, but KVM only supports it
with synchronized TSC.  Hence, on KVM the race condition is only needed
due to a bad implementation on the host side, and even then it's so rare
that it's mostly theoretical.

As far as KVM is concerned it's possible to fix the host, avoiding the
additional complexity in the vDSO and the (re)introduction of the task
migration notifier.

Xen, on the other hand, hasn't yet implemented vsyscall support at
all, so we do not care about its plans for non-synchronized TSC.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-27 15:49:30 +02:00
Andy Lutomirski
61f01dd941 x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue
AMD CPUs don't reinitialize the SS descriptor on SYSRET, so SYSRET with
SS == 0 results in an invalid usermode state in which SS is apparently
equal to __USER_DS but causes #SS if used.

Work around the issue by setting SS to __KERNEL_DS __switch_to, thus
ensuring that SYSRET never happens with SS set to NULL.

This was exposed by a recent vDSO cleanup.

Fixes: e7d6eefaaa x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-26 17:57:38 -07:00
Linus Torvalds
836ee4874e Initial ACPI support for arm64:
This series introduces preliminary ACPI 5.1 support to the arm64 kernel
 using the "hardware reduced" profile. We don't support any peripherals
 yet, so it's fairly limited in scope:
 
 - Memory init (UEFI)
 - ACPI discovery (RSDP via UEFI)
 - CPU init (FADT)
 - GIC init (MADT)
 - SMP boot (MADT + PSCI)
 - ACPI Kconfig options (dependent on EXPERT)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABCgAGBQJVNOC2AAoJELescNyEwWM08dIH/1Pn5xa04wwNDn0MOpbuQMk2
 kHM7hx69fbXflTJpnZRVyFBjRxxr5qilA7rljAFLnFeF8Fcll/s5VNy7ElHKLISq
 CB0ywgUfOd/sFJH57rcc67pC1b/XuqTbE1u1NFwvp2R3j1kGAEJWNA6SyxIP4bbc
 NO5jScx0lQOJ3rrPAXBW8qlGkeUk7TPOQJtMrpftNXlFLFrR63rPaEmMZ9dWepBF
 aRE4GXPvyUhpyv5o9RvlN5l8bQttiRJ3f9QjyG7NYhX0PXH3DyvGUzYlk2IoZtID
 v3ssCQH3uRsAZHIBhaTyNqFnUIaDR825bvGqyG/tj2Dt3kQZiF+QrfnU5D9TuMw=
 =zLJn
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull initial ACPI support for arm64 from Will Deacon:
 "This series introduces preliminary ACPI 5.1 support to the arm64
  kernel using the "hardware reduced" profile.  We don't support any
  peripherals yet, so it's fairly limited in scope:

   - MEMORY init (UEFI)

   - ACPI discovery (RSDP via UEFI)

   - CPU init (FADT)

   - GIC init (MADT)

   - SMP boot (MADT + PSCI)

   - ACPI Kconfig options (dependent on EXPERT)

  ACPI for arm64 has been in development for a while now and hardware
  has been available that can boot with either FDT or ACPI tables.  This
  has been made possible by both changes to the ACPI spec to cater for
  ARM-based machines (known as "hardware-reduced" in ACPI parlance) but
  also a Linaro-driven effort to get this supported on top of the Linux
  kernel.  This pull request is the result of that work.

  These changes allow us to initialise the CPUs, interrupt controller,
  and timers via ACPI tables, with memory information and cmdline coming
  from EFI.  We don't support a hybrid ACPI/FDT scheme.  Of course,
  there is still plenty of work to do (a serial console would be nice!)
  but I expect that to happen on a per-driver basis after this core
  series has been merged.

  Anyway, the diff stat here is fairly horrible, but splitting this up
  and merging it via all the different subsystems would have been
  extremely painful.  Instead, we've got all the relevant Acks in place
  and I've not seen anything other than trivial (Kconfig) conflicts in
  -next (for completeness, I've included my resolution below).  Nearly
  half of the insertions fall under Documentation/.

  So, we'll see how this goes.  Right now, it all depends on EXPERT and
  I fully expect people to use FDT by default for the immediate future"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (31 commits)
  ARM64 / ACPI: make acpi_map_gic_cpu_interface() as void function
  ARM64 / ACPI: Ignore the return error value of acpi_map_gic_cpu_interface()
  ARM64 / ACPI: fix usage of acpi_map_gic_cpu_interface
  ARM64: kernel: acpi: honour acpi=force command line parameter
  ARM64: kernel: acpi: refactor ACPI tables init and checks
  ARM64: kernel: psci: let ACPI probe PSCI version
  ARM64: kernel: psci: factor out probe function
  ACPI: move arm64 GSI IRQ model to generic GSI IRQ layer
  ARM64 / ACPI: Don't unflatten device tree if acpi=force is passed
  ARM64 / ACPI: additions of ACPI documentation for arm64
  Documentation: ACPI for ARM64
  ARM64 / ACPI: Enable ARM64 in Kconfig
  XEN / ACPI: Make XEN ACPI depend on X86
  ARM64 / ACPI: Select ACPI_REDUCED_HARDWARE_ONLY if ACPI is enabled on ARM64
  clocksource / arch_timer: Parse GTDT to initialize arch timer
  irqchip: Add GICv2 specific ACPI boot support
  ARM64 / ACPI: Introduce ACPI_IRQ_MODEL_GIC and register device's gsi
  ACPI / processor: Make it possible to get CPU hardware ID via GICC
  ACPI / processor: Introduce phys_cpuid_t for CPU hardware ID
  ARM64 / ACPI: Parse MADT for SMP initialization
  ...
2015-04-24 08:23:45 -07:00
Sonny Rao
0140e6141e perf/x86/intel/uncore: Move PCI IDs for IMC to uncore driver
This keeps all the related PCI IDs together in the driver where
they are used.

Signed-off-by: Sonny Rao <sonnyrao@chromium.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1429644791-25724-1-git-send-email-sonnyrao@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-22 08:29:19 +02:00
Sonny Rao
80bcffb376 perf/x86/intel/uncore: Add support for Intel Haswell ULT (lower power Mobile Processor) IMC uncore PMUs
This uncore is the same as the Haswell desktop part but uses a
different PCI ID.

Signed-off-by: Sonny Rao <sonnyrao@chromium.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1429569247-16697-1-git-send-email-sonnyrao@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-22 08:27:43 +02:00
Jiri Olsa
3b6e042188 perf/x86/intel: Add cpu_(prepare|starting|dying) for core_pmu
The core_pmu does not define cpu_* callbacks, which handles
allocation of 'struct cpu_hw_events::shared_regs' data,
initialization of debug store and PMU_FL_EXCL_CNTRS counters.

While this probably won't happen on bare metal, virtual CPU can
define x86_pmu.extra_regs together with PMU version 1 and thus
be using core_pmu -> using shared_regs data without it being
allocated. That could could leave to following panic:

	BUG: unable to handle kernel NULL pointer dereference at (null)
	IP: [<ffffffff8152cd4f>] _spin_lock_irqsave+0x1f/0x40

	SNIP

	 [<ffffffff81024bd9>] __intel_shared_reg_get_constraints+0x69/0x1e0
	 [<ffffffff81024deb>] intel_get_event_constraints+0x9b/0x180
	 [<ffffffff8101e815>] x86_schedule_events+0x75/0x1d0
	 [<ffffffff810586dc>] ? check_preempt_curr+0x7c/0x90
	 [<ffffffff810649fe>] ? try_to_wake_up+0x24e/0x3e0
	 [<ffffffff81064ba2>] ? default_wake_function+0x12/0x20
	 [<ffffffff8109eb16>] ? autoremove_wake_function+0x16/0x40
	 [<ffffffff810577e9>] ? __wake_up_common+0x59/0x90
	 [<ffffffff811a9517>] ? __d_lookup+0xa7/0x150
	 [<ffffffff8119db5f>] ? do_lookup+0x9f/0x230
	 [<ffffffff811a993a>] ? dput+0x9a/0x150
	 [<ffffffff8119c8f5>] ? path_to_nameidata+0x25/0x60
	 [<ffffffff8119e90a>] ? __link_path_walk+0x7da/0x1000
	 [<ffffffff8101d8f9>] ? x86_pmu_add+0xb9/0x170
	 [<ffffffff8101d7a7>] x86_pmu_commit_txn+0x67/0xc0
	 [<ffffffff811b07b0>] ? mntput_no_expire+0x30/0x110
	 [<ffffffff8119c731>] ? path_put+0x31/0x40
	 [<ffffffff8107c297>] ? current_fs_time+0x27/0x30
	 [<ffffffff8117d170>] ? mem_cgroup_get_reclaim_stat_from_page+0x20/0x70
	 [<ffffffff8111b7aa>] group_sched_in+0x13a/0x170
	 [<ffffffff81014a29>] ? sched_clock+0x9/0x10
	 [<ffffffff8111bac8>] ctx_sched_in+0x2e8/0x330
	 [<ffffffff8111bb7b>] perf_event_sched_in+0x6b/0xb0
	 [<ffffffff8111bc36>] perf_event_context_sched_in+0x76/0xc0
	 [<ffffffff8111eb3b>] perf_event_comm+0x1bb/0x2e0
	 [<ffffffff81195ee9>] set_task_comm+0x69/0x80
	 [<ffffffff81195fe1>] setup_new_exec+0xe1/0x2e0
	 [<ffffffff811ea68e>] load_elf_binary+0x3ce/0x1ab0

Adding cpu_(prepare|starting|dying) for core_pmu to have
shared_regs data allocated for core_pmu. AFAICS there's no harm
to initialize debug store and PMU_FL_EXCL_CNTRS either for
core_pmu.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20150421152623.GC13169@krava.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-22 08:24:33 +02:00
Linus Torvalds
41d5e08ea8 TTY/Serial patches for 4.1-rc1
Here's the big tty/serial driver update for 4.1-rc1.
 
 It was delayed for a bit due to some questions surrounding some of the
 console command line parsing changes that are in here.  There's still
 one tiny regression for people who were previously putting multiple
 console command lines and expecting them all to be ignored for some odd
 reason, but Peter is working on fixing that.  If not, I'll send a revert
 for the offending patch, but I have faith that Peter can address it.
 
 Other than the console work here, there's the usual serial driver
 updates and changes, and a buch of 8250 reworks to try to make that
 driver easier to maintain over time, and have it support more devices in
 the future.
 
 All of these have been in linux-next for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlU2IcUACgkQMUfUDdst+ylFqACcC8LPhFEZg9aHn0hNUoqGK3rE
 5dUAnR4b8r/NYqjVoE9FJZgZfB/TqVi1
 =lyN/
 -----END PGP SIGNATURE-----

Merge tag 'tty-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial updates from Greg KH:
 "Here's the big tty/serial driver update for 4.1-rc1.

  It was delayed for a bit due to some questions surrounding some of the
  console command line parsing changes that are in here.  There's still
  one tiny regression for people who were previously putting multiple
  console command lines and expecting them all to be ignored for some
  odd reason, but Peter is working on fixing that.  If not, I'll send a
  revert for the offending patch, but I have faith that Peter can
  address it.

  Other than the console work here, there's the usual serial driver
  updates and changes, and a buch of 8250 reworks to try to make that
  driver easier to maintain over time, and have it support more devices
  in the future.

  All of these have been in linux-next for a while"

* tag 'tty-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (119 commits)
  n_gsm: Drop unneeded cast on netdev_priv
  sc16is7xx: expose RTS inversion in RS-485 mode
  serial: 8250_pci: port failed after wakeup from S3
  earlycon: 8250: Document kernel command line options
  earlycon: 8250: Fix command line regression
  earlycon: Fix __earlycon_table stride
  tty: clean up the tty time logic a bit
  serial: 8250_dw: only get the clock rate in one place
  serial: 8250_dw: remove useless ACPI ID check
  dmaengine: hsu: move memory allocation to GFP_NOWAIT
  dmaengine: hsu: remove redundant pieces of code
  serial: 8250_pci: add Intel Tangier support
  dmaengine: hsu: add Intel Tangier PCI ID
  serial: 8250_pci: replace switch-case by formula for Intel MID
  serial: 8250_pci: replace switch-case by formula
  tty: cpm_uart: replace CONFIG_8xx by CONFIG_CPM1
  serial: jsm: some off by one bugs
  serial: xuartps: Fix check in console_setup().
  serial: xuartps: Get rid of register access macros.
  serial: xuartps: Fix iobase use.
  ...
2015-04-21 09:33:10 -07:00
Linus Torvalds
6496edfce9 This is the final removal (after several years!) of the obsolete cpus_*
functions, prompted by their mis-use in staging.
 
 With these function removed, all cpu functions should only iterate to
 nr_cpu_ids, so we finally only allocate that many bits when cpumasks
 are allocated offstack.
 
 Thanks,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVNPMuAAoJENkgDmzRrbjx7ZIP/j65e6xs1jEyXR3WOYSdTU1x
 bMo6JcII6O1oEZLgyKXgx9KiBg6uIIDta1NG/H/XIe354dwfHVsHvj5HHHQR5Xof
 iRrjLOaHj4XglI3hvsk0eEEl3/OBBLgyo9bUwDvMF1fmr/9tW4caMs3Op6n7Evzm
 YIvoAyeJ0A8BfEtOU5lXhcVIGmnHtSw0x6mdGXpXIBmWYQPCtdQP868s4lnl44w0
 bSNpAYdzEqg64Ph3SK0prgWPrn5+5EiaAhV7HZzENZ5+o0DAdIXWq/W7uHyCWPKH
 536cJDojec+nSUQkPYngngGprxrKO02aBcMw/3JGJ0tdCDj8yw3XAyVAFzz4hmMb
 Lkmyv4QHHIILLvJ4ZRH5KHWCjjVBg41LNCs2H3HnoxFACdm0lZYKHsUAh2ucBVtU
 Wb/eHmLxOG43UIkpX4yrhy3SfE1ZdnOVzEzOzPXtr51t8ojqk+bLFe/hJ6EkzrQX
 X+90qHfBq+PMJlAnc3zdXHjxoJrL6KPWVwVvFrNeibgEKtVvy/BiwZkS6QceC1Ea
 TatOYA5r6awFVHHQCooN1DGAxN5Juvu2SmdnTUA9ymsCNDghj1YUoAKRNP81u8Sa
 pe3hco/63iCuPna+vlwNDU6SgsaMk9m0p+1n1BiDIfVJIkWYCNeG+u2gQkzbDKlQ
 AJuKKQv1QuZiF0ylZ0wq
 =VAgA
 -----END PGP SIGNATURE-----

Merge tag 'cpumask-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull final removal of deprecated cpus_* cpumask functions from Rusty Russell:
 "This is the final removal (after several years!) of the obsolete
  cpus_* functions, prompted by their mis-use in staging.

  With these function removed, all cpu functions should only iterate to
  nr_cpu_ids, so we finally only allocate that many bits when cpumasks
  are allocated offstack"

* tag 'cpumask-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (25 commits)
  cpumask: remove __first_cpu / __next_cpu
  cpumask: resurrect CPU_MASK_CPU0
  linux/cpumask.h: add typechecking to cpumask_test_cpu
  cpumask: only allocate nr_cpumask_bits.
  Fix weird uses of num_online_cpus().
  cpumask: remove deprecated functions.
  mips: fix obsolete cpumask_of_cpu usage.
  x86: fix more deprecated cpu function usage.
  ia64: remove deprecated cpus_ usage.
  powerpc: fix deprecated CPU_MASK_CPU0 usage.
  CPU_MASK_ALL/CPU_MASK_NONE: remove from deprecated region.
  staging/lustre/o2iblnd: Don't use cpus_weight
  staging/lustre/libcfs: replace deprecated cpus_ calls with cpumask_
  staging/lustre/ptlrpc: Do not use deprecated cpus_* functions
  blackfin: fix up obsolete cpu function usage.
  parisc: fix up obsolete cpu function usage.
  tile: fix up obsolete cpu function usage.
  arm64: fix up obsolete cpu function usage.
  mips: fix up obsolete cpu function usage.
  x86: fix up obsolete cpu function usage.
  ...
2015-04-20 10:19:03 -07:00
Linus Torvalds
34a984f7b0 Merge branch 'x86-pmem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull PMEM driver from Ingo Molnar:
 "This is the initial support for the pmem block device driver:
  persistent non-volatile memory space mapped into the system's physical
  memory space as large physical memory regions.

  The driver is based on Intel code, written by Ross Zwisler, with fixes
  by Boaz Harrosh, integrated with x86 e820 memory resource management
  and tidied up by Christoph Hellwig.

  Note that there were two other separate pmem driver submissions to
  lkml: but apparently all parties (Ross Zwisler, Boaz Harrosh) are
  reasonably happy with this initial version.

  This version enables minimal support that enables persistent memory
  devices out in the wild to work as block devices, identified through a
  magic (non-standard) e820 flag and auto-discovered if
  CONFIG_X86_PMEM_LEGACY=y, or added explicitly through manipulating the
  memory maps via the "memmap=..." boot option with the new, special '!'
  modifier character.

  Limitations: this is a regular block device, and since the pmem areas
  are not struct page backed, they are invisible to the rest of the
  system (other than the block IO device), so direct IO to/from pmem
  areas, direct mmap() or XIP is not possible yet.  The page cache will
  also shadow and double buffer pmem contents, etc.

  Initial support is for x86"

* 'x86-pmem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  drivers/block/pmem: Fix 32-bit build warning in pmem_alloc()
  drivers/block/pmem: Add a driver for persistent memory
  x86/mm: Add support for the non-standard protected e820 type
2015-04-18 11:42:49 -04:00
Linus Torvalds
90d1c08786 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "This tree includes:

   - an FPU related crash fix

   - a ptrace fix (with matching testcase in tools/testing/selftests/)

   - an x86 Kconfig DMA-config defaults tweak to better avoid
     non-working drivers"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  config: Enable NEED_DMA_MAP_STATE by default when SWIOTLB is selected
  x86/fpu: Load xsave pointer *after* initialization
  x86/ptrace: Fix the TIF_FORCED_TF logic in handle_signal()
  x86, selftests: Add single_step_syscall test
2015-04-18 11:31:11 -04:00
Linus Torvalds
96b90f27bc Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "This update has mostly fixes, but also other bits:

   - perf tooling fixes

   - PMU driver fixes

   - Intel Broadwell PMU driver HW-enablement for LBR callstacks

   - a late coming 'perf kmem' tool update that enables it to also
     analyze page allocation data.  Note, this comes with MM tracepoint
     changes that we believe to not break anything: because it changes
     the formerly opaque 'struct page *' field that uniquely identifies
     pages to 'pfn' which identifies pages uniquely too, but isn't as
     opaque and can be used for other purposes as well"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/pt: Fix and clean up error handling in pt_event_add()
  perf/x86/intel: Add Broadwell support for the LBR callstack
  perf/x86/intel/rapl: Fix energy counter measurements but supporing per domain energy units
  perf/x86/intel: Fix Core2,Atom,NHM,WSM cycles:pp events
  perf/x86: Fix hw_perf_event::flags collision
  perf probe: Fix segfault when probe with lazy_line to file
  perf probe: Find compilation directory path for lazy matching
  perf probe: Set retprobe flag when probe in address-based alternative mode
  perf kmem: Analyze page allocator events also
  tracing, mm: Record pfn instead of pointer to struct page
2015-04-18 11:26:46 -04:00
Ingo Molnar
0c99241c93 perf/x86/intel/pt: Fix and clean up error handling in pt_event_add()
Dan Carpenter reported that pt_event_add() has buggy
error handling logic: it returns 0 instead of -EBUSY when
it fails to start a newly added event.

Furthermore, the control flow in this function is messy,
with cleanup labels mixed with direct returns.

Fix the bug and clean up the code by converting it to
a straight fast path for the regular non-failing case,
plus a clear sequence of cascading goto labels to do
all cleanup.

NOTE: I materially changed the existing clean up logic in the
pt_event_start() failure case to use the direct
perf_aux_output_end() path, not pt_event_del(), because
perf_aux_output_end() is enough here.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20150416103830.GB7847@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-18 13:31:26 +02:00
Borislav Petkov
18ecb3bfa5 x86/fpu: Load xsave pointer *after* initialization
So I was playing with gdb today and did this simple thing:

	gdb /bin/ls

	...

	(gdb) run

Box exploded with this splat:

	BUG: unable to handle kernel NULL pointer dereference at 00000000000001d0
	IP: [<ffffffff8100fe5a>] xstateregs_get+0x7a/0x120
	[...]

	Call Trace:
	 ptrace_regset
	 ptrace_request
	 ? wait_task_inactive
	 ? preempt_count_sub
	 arch_ptrace
	 ? ptrace_get_task_struct
	 SyS_ptrace
	 system_call_fastpath

... because we do cache &target->thread.fpu.state->xsave into the
local variable xsave but that pointer is NULL at that time and
it gets initialized later, in init_fpu(), see:

	e7f180dcd8 ("x86/fpu: Change xstateregs_get()/set() to use ->xsave.i387 rather than ->fxsave")

The fix is simple: load xsave *after* init_fpu() has run.

Also do the same in xstateregs_set(), as suggested by Oleg Nesterov.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tavis Ormandy <taviso@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1429209697-5902-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-17 10:15:47 +02:00
Kan Liang
78d504bcd7 perf/x86/intel: Add Broadwell support for the LBR callstack
Same as Haswell, Broadwell also support the LBR callstack.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1427962377-40955-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-17 09:59:07 +02:00
Jacob Pan
6455239601 perf/x86/intel/rapl: Fix energy counter measurements but supporing per domain energy units
RAPL energy hardware unit can vary within a single CPU package, e.g.
HSW server DRAM has a fixed energy unit of 15.3 uJ (2^-16) whereas
the unit on other domains can be enumerated from power unit MSR.

There might be other variations in the future, this patch adds
per cpu model quirk to allow special handling of certain cpus.

hw_unit is also removed from per cpu data since it is not per cpu
and the sampling rate for energy counter is typically not high.

Without this patch, DRAM domain on HSW servers will be counted
4x higher than the real energy counter.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1427405325-780-1-git-send-email-jacob.jun.pan@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-17 09:58:56 +02:00