Here's the big driver core merge for 3.4-rc1.
Lots of various things here, sysfs fixes/tweaks (with the nlink breakage
reverted), dynamic debugging updates, w1 drivers, hyperv driver updates,
and a variety of other bits and pieces, full information in the
shortlog.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iEYEABECAAYFAk9neCsACgkQMUfUDdst+ylyQwCfY2eizvzw5HhjQs8gOiBRDADe
yrgAnj1Zan2QkoCnQIFJNAoxqNX9yAhd
=biH6
-----END PGP SIGNATURE-----
Merge tag 'driver-core-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core patches for 3.4-rc1 from Greg KH:
"Here's the big driver core merge for 3.4-rc1.
Lots of various things here, sysfs fixes/tweaks (with the nlink
breakage reverted), dynamic debugging updates, w1 drivers, hyperv
driver updates, and a variety of other bits and pieces, full
information in the shortlog."
* tag 'driver-core-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (78 commits)
Tools: hv: Support enumeration from all the pools
Tools: hv: Fully support the new KVP verbs in the user level daemon
Drivers: hv: Support the newly introduced KVP messages in the driver
Drivers: hv: Add new message types to enhance KVP
regulator: Support driver probe deferral
Revert "sysfs: Kill nlink counting."
uevent: send events in correct order according to seqnum (v3)
driver core: minor comment formatting cleanups
driver core: move the deferred probe pointer into the private area
drivercore: Add driver probe deferral mechanism
DS2781 Maxim Stand-Alone Fuel Gauge battery and w1 slave drivers
w1_bq27000: Only one thread can access the bq27000 at a time.
w1_bq27000 - remove w1_bq27000_write
w1_bq27000: remove unnecessary NULL test.
sysfs: Fix memory leak in sysfs_sd_setsecdata().
intel_idle: Revert change of auto_demotion_disable_flags for Nehalem
w1: Fix w1_bq27000
driver-core: documentation: fix up Greg's email address
powernow-k6: Really enable auto-loading
powernow-k7: Fix CPU family number
...
I got somewhat tired of having to decode hex numbers..
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephane Eranian <eranian@google.com>
Cc: Robert Richter <robert.richter@amd.com>
Link: http://lkml.kernel.org/n/tip-0vsy1sgywc4uar3mu1szm0rg@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
With branch stack sampling, it is possible to filter by priv levels.
In system-wide mode, that means it is possible to capture only user
level branches. The builtin SW LBR filter needs to disassemble code
based on LBR captured addresses. For that, it needs to know the task
the addresses are associated with. Because of context switches, the
content of the branch stack buffer may contain addresses from
different tasks.
We need a callback on context switch to either flush the branch stack
or save it. This patch adds a new callback in struct pmu which is called
during context switches. The callback is called only when necessary.
That is when a system-wide context has, at least, one event which
uses PERF_SAMPLE_BRANCH_STACK. The callback is never called for
per-thread context.
In this version, the Intel x86 code simply flushes (resets) the LBR
on context switches (fills it with zeroes). Those zeroed branches are
then filtered out by the SW filter.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1328826068-11713-11-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds an internal sofware filter to complement
the (optional) LBR hardware filter.
The software filter is necessary:
- as a substitute when there is no HW LBR filter (e.g., Atom, Core)
- to complement HW LBR filter in case of errata (e.g., Nehalem/Westmere)
- to provide finer grain filtering (e.g., all processors)
Sometimes the LBR HW filter cannot distinguish between two types
of branches. For instance, to capture syscall as CALLS, it is necessary
to enable the LBR_FAR filter which will also capture JMP instructions.
Thus, a second pass is necessary to filter those out, this is what the
SW filter can do.
The SW filter is built on top of the internal x86 disassembler. It
is a best effort filter especially for user level code. It is subject
to the availability of the text page of the program.
The SW filter is enabled on all Intel processors. It is bypassed
when the user is capturing all branches at all priv levels.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1328826068-11713-9-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch implements PERF_SAMPLE_BRANCH support for Intel
x86processors. It connects PERF_SAMPLE_BRANCH to the actual LBR.
The patch adds the hooks in the PMU irq handler to save the LBR
on counter overflow for both regular and PEBS modes.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1328826068-11713-8-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The patch adds a restriction for Intel Atom LBR support. Only
steppings 10 (PineView) and more recent are supported. Older models
do not have a functional LBR. Their LBR does not freeze on PMU
interrupt which makes LBR unusable in the context of perf_events.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1328826068-11713-7-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds the mappings from the generic PERF_SAMPLE_BRANCH_*
filters to the actual Intel x86LBR filters, whenever they exist.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1328826068-11713-6-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If precise sampling is enabled on Intel x86 then perf_event uses PEBS.
To correct for the off-by-one error of PEBS, perf_event uses LBR when
precise_sample > 1.
On Intel x86 PERF_SAMPLE_BRANCH_STACK is implemented using LBR,
therefore both features must be coordinated as they may not
configure LBR the same way.
For PEBS, LBR needs to capture all branches at the priv level of
the associated event.
This patch checks that the branch type and priv level of BRANCH_STACK
is compatible with that of the PEBS LBR requirement, thereby allowing:
$ perf record -b any,u -e instructions:upp ....
But:
$ perf record -b any_call,u -e instructions:upp
Is not possible.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1328826068-11713-5-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The Intel LBR on some recent processor is capable
of filtering branches by type. The filter is configurable
via the LBR_SELECT MSR register.
There are limitation on how this register can be used.
On Nehalem/Westmere, the LBR_SELECT is shared by the two HT threads
when HT is on. It is private to each core when HT is off.
On SandyBridge, the LBR_SELECT register is private to each thread
when HT is on. It is private to each core when HT is off.
The kernel must manage the sharing of LBR_SELECT. It allows
multiple users on the same logical CPU to use LBR_SELECT as
long as they program it with the same value. Across sibling
CPUs (HT threads), the same restriction applies on NHM/WSM.
This patch implements this sharing logic by leveraging the
mechanism put in place for managing the offcore_response
shared MSR.
We modify __intel_shared_reg_get_constraints() to cause
x86_get_event_constraint() to be called because LBR may
be associated with events that may be counter constrained.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1328826068-11713-4-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds the LBR definitions for NHM/WSM/SNB and Core.
It also adds the definitions for the architected LBR MSR:
LBR_SELECT, LBRT_TOS.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1328826068-11713-3-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds the ability to sample taken branches to the
perf_event interface.
The ability to capture taken branches is very useful for all
sorts of analysis. For instance, basic block profiling, call
counts, statistical call graph.
This new capability requires hardware assist and as such may
not be available on all HW platforms. On Intel x86 it is
implemented on top of the Last Branch Record (LBR) facility.
To enable taken branches sampling, the PERF_SAMPLE_BRANCH_STACK
bit must be set in attr->sample_type.
Sampled taken branches may be filtered by type and/or priv
levels.
The patch adds a new field, called branch_sample_type, to the
perf_event_attr structure. It contains a bitmask of filters
to apply to the sampled taken branches.
Filters may be implemented in HW. If the HW filter does not exist
or is not good enough, some arch may also implement a SW filter.
The following generic filters are currently defined:
- PERF_SAMPLE_USER
only branches whose targets are at the user level
- PERF_SAMPLE_KERNEL
only branches whose targets are at the kernel level
- PERF_SAMPLE_HV
only branches whose targets are at the hypervisor level
- PERF_SAMPLE_ANY
any type of branches (subject to priv levels filters)
- PERF_SAMPLE_ANY_CALL
any call branches (may incl. syscall on some arch)
- PERF_SAMPLE_ANY_RET
any return branches (may incl. syscall returns on some arch)
- PERF_SAMPLE_IND_CALL
indirect call branches
Obviously filter may be combined. The priv level bits are optional.
If not provided, the priv level of the associated event are used. It
is possible to collect branches at a priv level different from the
associated event. Use of kernel, hv priv levels is subject to permissions
and availability (hv).
The number of taken branch records present in each sample may vary based
on HW, the type of sampled branches, the executed code. Therefore
each sample contains the number of taken branches it contains.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1328826068-11713-2-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It turned out that a performance counter on AMD does not
count at all when the GO or HO bit is set in the control
register and SVM is disabled in EFER.
This patch works around this issue by masking out the HO bit
in the performance counter control register when SVM is not
enabled.
The GO bit is not touched because it is only set when the
user wants to count in guest-mode only. So when SVM is
disabled the counter should not run at all and the
not-counting is the intended behaviour.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Avi Kivity <avi@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: stable@vger.kernel.org # v3.2
Link: http://lkml.kernel.org/r/1330523852-19566-1-git-send-email-joerg.roedel@amd.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce/AMD: Fix UP build error
x86: Specify a size for the cmp in the NMI handler
x86/nmi: Test saved %cs in NMI to determine nested NMI case
x86/amd: Fix L1i and L2 cache sharing information for AMD family 15h processors
x86/microcode: Remove noisy AMD microcode warning
141168c36c ("x86: Simplify code by removing a !SMP #ifdefs
from 'struct cpuinfo_x86'") removed a bunch of CONFIG_SMP ifdefs
around code touching struct cpuinfo_x86 members but also caused
the following build error with Randy's randconfigs:
mce_amd.c:(.cpuinit.text+0x4723): undefined reference to `cpu_llc_shared_map'
Restore the #ifdef in threshold_create_bank() which creates
symlinks on the non-BSP CPUs.
There's a better patch series being worked on by Kevin Winchester
which will solve this in a cleaner fashion, but that series is
too ambitious for v3.3 merging - so we first queue up this trivial
fix and then do the rest for v3.4.
Signed-off-by: Borislav Petkov <bp@alien8.de>
Acked-by: Kevin Winchester <kjwinchester@gmail.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Nick Bowler <nbowler@elliptictech.com>
Link: http://lkml.kernel.org/r/20120203191801.GA2846@x1.osrc.amd.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
(And define it properly for x86-32, which had its 'current_task'
declaration in separate from x86-64)
Bitten by my dislike for modules on the machines I use, and the fact
that apparently nobody else actually wanted to test the patches I sent
out.
Snif. Nobody else cares.
Anyway, we probably should uninline the 'kernel_fpu_begin()' function
that is what modules actually use and that references this, but this is
the minimal fix for now.
Reported-by: Josh Boyer <jwboyer@gmail.com>
Reported-and-tested-by: Jongman Heo <jongman.heo@samsung.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This makes us recognize when we try to restore FPU state that matches
what we already have in the FPU on this CPU, and avoids the restore
entirely if so.
To do this, we add two new data fields:
- a percpu 'fpu_owner_task' variable that gets written any time we
update the "has_fpu" field, and thus acts as a kind of back-pointer
to the task that owns the CPU. The exception is when we save the FPU
state as part of a context switch - if the save can keep the FPU
state around, we leave the 'fpu_owner_task' variable pointing at the
task whose FP state still remains on the CPU.
- a per-thread 'last_cpu' field, that indicates which CPU that thread
used its FPU on last. We update this on every context switch
(writing an invalid CPU number if the last context switch didn't
leave the FPU in a lazily usable state), so we know that *that*
thread has done nothing else with the FPU since.
These two fields together can be used when next switching back to the
task to see if the CPU still matches: if 'fpu_owner_task' matches the
task we are switching to, we know that no other task (or kernel FPU
usage) touched the FPU on this CPU in the meantime, and if the current
CPU number matches the 'last_cpu' field, we know that this thread did no
other FP work on any other CPU, so the FPU state on the CPU must match
what was saved on last context switch.
In that case, we can avoid the 'f[x]rstor' entirely, and just clear the
CR0.TS bit.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We currently include commas on both sides of the feature ID in a
modalias, but this prevents the lowest numbered feature of a CPU from
being matched. Since all feature IDs have the same length, we do not
need to worry about substring matches, so omit commas from the
modalias entirely.
Avoid generating multiple adjacent wildcards when there is no
feature ID to match.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Thomas Renninger <trenn@suse.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
snprintf() does not return a negative value when truncating.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Thomas Renninger <trenn@suse.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Found out that show_msr=<cpus> is broken, when I asked a
user to use it to capture debug info about broken MTRR's
whose MTRR settings are probably different between CPUs.
Only the first CPUs MSRs are printed, but that is not
enough to track down the suspected bug.
For years we called print_cpu_msr from print_cpu_info(),
but this commit:
| commit 2eaad1fddd
| Author: Mike Travis <travis@sgi.com>
| Date: Thu Dec 10 17:19:36 2009 -0800
|
| x86: Limit the number of processor bootup messages
removed the print_cpu_info() call from all APs.
Put it back - it will only print MSRs when the user
specifically requests them via show_msr=<cpus>.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Mike Travis <travis@sgi.com>
Link: http://lkml.kernel.org/r/1329069237-11483-1-git-send-email-yinghai@kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
For L1 instruction cache and L2 cache the shared CPU information
is wrong. On current AMD family 15h CPUs those caches are shared
between both cores of a compute unit.
This fixes https://bugzilla.kernel.org/show_bug.cgi?id=42607
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Petkov Borislav <Borislav.Petkov@amd.com>
Cc: Dave Jones <davej@redhat.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/20120208195229.GA17523@alberich.amd.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The following patch fixes a bug introduced by the following
commit:
e050e3f0a7 ("perf: Fix broken interrupt rate throttling")
The patch caused the following warning to pop up depending on
the sampling frequency adjustments:
------------[ cut here ]------------
WARNING: at arch/x86/kernel/cpu/perf_event.c:995 x86_pmu_start+0x79/0xd4()
It was caused by the following call sequence:
perf_adjust_freq_unthr_context.part() {
stop()
if (delta > 0) {
perf_adjust_period() {
if (period > 8*...) {
stop()
...
start()
}
}
}
start()
}
Which caused a double start and a double stop, thus triggering
the assert in x86_pmu_start().
The patch fixes the problem by avoiding the double calls. We
pass a new argument to perf_adjust_period() to indicate whether
or not the event is already stopped. We can't just remove the
start/stop from that function because it's called from
__perf_event_overflow where the event needs to be reloaded via a
stop/start back-toback call.
The patch reintroduces the assertion in x86_pmu_start() which
was removed by commit:
84f2b9b ("perf: Remove deprecated WARN_ON_ONCE()")
In this second version, we've added calls to disable/enable PMU
during unthrottling or frequency adjustment based on bug report
of spurious NMI interrupts from Eric Dumazet.
Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: markus@trippelsdorf.de
Cc: paulus@samba.org
Link: http://lkml.kernel.org/r/20120207133956.GA4932@quad
[ Minor edits to the changelog and to the code ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Stephane Eranian reported that doing a scheduler latency
measurements with perf on AMD doesn't work out as expected due
to the fact that the sched_clock() granularity is too coarse,
i.e. done in jiffies due to the sched_clock_stable not set,
which, if set, would mean that we get to use the TSC as sample
source which would give us much higher precision.
However, there's no reason not to set sched_clock_stable on AMD
because all families from F10h and upwards do have an invariant
TSC and have the CPUID flag to prove (CPUID_8000_0007_EDX[8]).
Make it so, #1.
Signed-off-by: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@amd64.org>
Cc: Venki Pallipadi <venki@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Link: http://lkml.kernel.org/r/20120206132546.GA30854@quad
[ Should any non-standard system break the TSC, we should
mark them so explicitly, in their platform init handler, or
in a DMI quirk. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
So that we can get the perf bench exec stack fixes and then apply the
remaining fix for the files added after what is in perf/urgent.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
With the new throttling/unthrottling code introduced with
commit:
e050e3f0a7 ("perf: Fix broken interrupt rate throttling")
we occasionally hit two WARN_ON_ONCE() checks in:
- intel_pmu_pebs_enable()
- intel_pmu_lbr_enable()
- x86_pmu_start()
The assertions are no longer problematic. There is a valid
path where they can trigger but it is harmless.
The assertion can be triggered with:
$ perf record -e instructions:pp ....
Leading to paths:
intel_pmu_pebs_enable
intel_pmu_enable_event
x86_perf_event_set_period
x86_pmu_start
perf_adjust_freq_unthr_context
perf_event_task_tick
scheduler_tick
And:
intel_pmu_lbr_enable
intel_pmu_enable_event
x86_perf_event_set_period
x86_pmu_start
perf_adjust_freq_unthr_context.
perf_event_task_tick
scheduler_tick
cpuc->enabled is always on because when we get to
perf_adjust_freq_unthr_context() the PMU is not totally
disabled. Furthermore when we need to adjust a period,
we only stop the event we need to change and not the
entire PMU. Thus, when we re-enable, cpuc->enabled is
already set. Note that when we stop the event, both
pebs and lbr are stopped if necessary (and possible).
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Link: http://lkml.kernel.org/r/20120202110401.GA30911@quad
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch is based on Andi Kleen's work:
Implement autoprobing/loading of modules serving CPU
specific features (x86cpu autoloading).
And Kay Siever's work to get rid of sysdev cpu structures
and making use of struct device instead.
Before, the cpuid driver had to be loaded to get the x86cpu
autoloading feature. With this patch autoloading works through
the /sys/devices/system/cpu object
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Len Brown <lenb@kernel.org>
Acked-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It is rather similar to CPB (boot capability) feature
and exists since fam10h (can be looked up in AMD's BKDG).
The feature is needed for powernow-k8 to cleanup init functions and to
provide proper autoloading matching with the new x86cpu modalias
feature.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Borislav Petkov <bp@amd64.org>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There's a growing number of drivers that support a specific x86 feature
or CPU. Currently loading these drivers currently on a generic
distribution requires various driver specific hacks and it often
doesn't work.
This patch adds auto probing for drivers based on the x86 cpuid
information, in particular based on vendor/family/model number
and also based on CPUID feature bits.
For example a common issue is not loading the SSE 4.2 accelerated
CRC module: this can significantly lower the performance of BTRFS
which relies on fast CRC.
Another issue is loading the right CPUFREQ driver for the current CPU.
Currently distributions often try all all possible driver until
one sticks, which is not really a good way to do this.
It works with existing udev without any changes. The code
exports the x86 information as a generic string in sysfs
that can be matched by udev's pattern matching.
This scheme does not support numeric ranges, so if you want to
handle e.g. ranges of model numbers they have to be encoded
in ASCII or simply all models or families listed. Fixing
that would require changing udev.
Another issue is that udev will happily load all drivers that match,
there is currently no nice way to stop a specific driver from
being loaded if it's not needed (e.g. if you don't need fast CRC)
But there are not that many cpu specific drivers around and they're
all not that bloated, so this isn't a particularly serious issue.
Originally this patch added the modalias to the normal cpu
sysdevs. However sysdevs don't have all the infrastructure
needed for udev, so it couldn't really autoload drivers.
This patch instead adds the CPU modaliases to the cpuid devices,
which are real devices with full support for udev. This implies
that the cpuid driver has to be loaded to use this.
This patch just adds infrastructure, some driver conversions
in followups.
Thanks to Kay for helping with some sysfs magic.
v2: Constifcation, some updates
v4: (trenn@suse.de):
- Use kzalloc instead of kmalloc to terminate modalias buffer
- Use uppercase hex values to match correctly against hex values containing
letters
Cc: Dave Jones <davej@redhat.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Jen Axboe <axboe@kernel.dk>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When suspending, there was a large list of warnings going something like:
Device 'machinecheck1' does not have a release() function, it is broken and must be fixed
This patch turns the static mce_devices into dynamically allocated, and
properly frees them when they are removed from the system. It solves
the warning messages on my laptop here.
Reported-by: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Djalal Harouni <tixxdz@opendz.org>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@amd64.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
perf tools: Fix compile error on x86_64 Ubuntu
perf report: Fix --stdio output alignment when --showcpuutilization used
perf annotate: Get rid of field_sep check
perf annotate: Fix usage string
perf kmem: Fix a memory leak
perf kmem: Add missing closedir() calls
perf top: Add error message for EMFILE
perf test: Change type of '-v' option to INCR
perf script: Add missing closedir() calls
tracing: Fix compile error when static ftrace is enabled
recordmcount: Fix handling of elf64 big-endian objects.
perf tools: Add const.h to MANIFEST to make perf-tar-src-pkg work again
perf tools: Add support for guest/host-only profiling
perf kvm: Do guest-only counting by default
perf top: Don't update total_period on process_sample
perf hists: Stop using 'self' for struct hist_entry
perf hists: Rename total_session to total_period
x86: Add counter when debug stack is used with interrupts enabled
x86: Allow NMIs to hit breakpoints in i386
x86: Keep current stack in NMI breakpoints
...
Commit 8a25a2fd12 ("cpu: convert 'cpu' and 'machinecheck' sysdev_class
to a regular subsystem") changed how things are dealt with in the MCE
subsystem. Some of the things that got broken due to this are CPU
hotplug and suspend/hibernate.
MCE uses per_cpu allocations of struct device. So, when a CPU goes
offline and comes back online, in order to ensure that we start from a
clean slate with respect to the MCE subsystem, zero out the entire
per_cpu device structure to 0 before using it.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (73 commits)
arm: fix up some samsung merge sysdev conversion problems
firmware: Fix an oops on reading fw_priv->fw in sysfs loading file
Drivers:hv: Fix a bug in vmbus_driver_unregister()
driver core: remove __must_check from device_create_file
debugfs: add missing #ifdef HAS_IOMEM
arm: time.h: remove device.h #include
driver-core: remove sysdev.h usage.
clockevents: remove sysdev.h
arm: convert sysdev_class to a regular subsystem
arm: leds: convert sysdev_class to a regular subsystem
kobject: remove kset_find_obj_hinted()
m86k: gpio - convert sysdev_class to a regular subsystem
mips: txx9_sram - convert sysdev_class to a regular subsystem
mips: 7segled - convert sysdev_class to a regular subsystem
sh: dma - convert sysdev_class to a regular subsystem
sh: intc - convert sysdev_class to a regular subsystem
power: suspend - convert sysdev_class to a regular subsystem
power: qe_ic - convert sysdev_class to a regular subsystem
power: cmm - convert sysdev_class to a regular subsystem
s390: time - convert sysdev_class to a regular subsystem
...
Fix up conflicts with 'struct sysdev' removal from various platform
drivers that got changed:
- arch/arm/mach-exynos/cpu.c
- arch/arm/mach-exynos/irq-eint.c
- arch/arm/mach-s3c64xx/common.c
- arch/arm/mach-s3c64xx/cpu.c
- arch/arm/mach-s5p64x0/cpu.c
- arch/arm/mach-s5pv210/common.c
- arch/arm/plat-samsung/include/plat/cpu.h
- arch/powerpc/kernel/sysfs.c
and fix up cpu_is_hotpluggable() as per Greg in include/linux/cpu.h
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Use "do { } while(0)" for empty lock_cmos()/unlock_cmos() macros
x86: Use "do { } while(0)" for empty flush_tlb_fix_spurious_fault() macro
x86, CPU: Drop superfluous get_cpu_cap() prototype
arch/x86/mm/pageattr.c: Quiet sparse noise; local functions should be static
arch/x86/kernel/ptrace.c: Quiet sparse noise
x86: Use kmemdup() in copy_thread(), rather than duplicating its implementation
x86: Replace the EVT_TO_HPET_DEV() macro with an inline function
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
x86: Fix atomic64_xxx_cx8() functions
x86: Fix and improve cmpxchg_double{,_local}()
x86_64, asm: Optimise fls(), ffs() and fls64()
x86, bitops: Move fls64.h inside __KERNEL__
x86: Fix and improve percpu_cmpxchg{8,16}b_double()
x86: Report cpb and eff_freq_ro flags correctly
x86/i386: Use less assembly in strlen(), speed things up a bit
x86: Use the same node_distance for 32 and 64-bit
x86: Fix rflags in FAKE_STACK_FRAME
x86: Clean up and extend do_int3()
x86: Call do_notify_resume() with interrupts enabled
x86/div64: Add a micro-optimization shortcut if base is power of two
x86-64: Cleanup some assembly entry points
x86-64: Slightly shorten line system call entry and exit paths
x86-64: Reduce amount of redundant code generated for invalidate_interruptNN
x86-64: Slightly shorten int_ret_from_sys_call
x86, efi: Convert efi_phys_get_time() args to physical addresses
x86: Default to vsyscall=emulate
x86-64: Set siginfo and context on vsyscall emulation faults
x86: consolidate xchg and xadd macros
...
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Skip cpus with apic-ids >= 255 in !x2apic_mode
x86, x2apic: Allow "nox2apic" to disable x2apic mode setup by BIOS
x86, x2apic: Fallback to xapic when BIOS doesn't setup interrupt-remapping
x86, acpi: Skip acpi x2apic entries if the x2apic feature is not present
x86, apic: Add probe() for apic_flat
x86: Simplify code by removing a !SMP #ifdefs from 'struct cpuinfo_x86'
x86: Convert per-cpu counter icr_read_retry_count into a member of irq_stat
x86: Add per-cpu stat counter for APIC ICR read tries
pci, x86/io-apic: Allow PCI_IOAPIC to be user configurable on x86
x86: Fix the !CONFIG_NUMA build of the new CPU ID fixup code support
x86: Add NumaChip support
x86: Add x86_init platform override to fix up NUMA core numbering
x86: Make flat_init_apic_ldr() available
This resolves the conflict in the arch/arm/mach-s3c64xx/s3c6400.c file,
and it fixes the build error in the arch/x86/kernel/microcode_core.c
file, that the merge did not catch.
The microcode_core.c patch was provided by Stephen Rothwell
<sfr@canb.auug.org.au> who was invaluable in the merge issues involved
with the large sysdev removal process in the driver-core tree.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (106 commits)
perf kvm: Fix copy & paste error in description
perf script: Kill script_spec__delete
perf top: Fix a memory leak
perf stat: Introduce get_ratio_color() helper
perf session: Remove impossible condition check
perf tools: Fix feature-bits rework fallout, remove unused variable
perf script: Add generic perl handler to process events
perf tools: Use for_each_set_bit() to iterate over feature flags
perf tools: Unify handling of features when writing feature section
perf report: Accept fifos as input file
perf tools: Moving code in some files
perf tools: Fix out-of-bound access to struct perf_session
perf tools: Continue processing header on unknown features
perf tools: Improve macros for struct feature_ops
perf: builtin-record: Document and check that mmap_pages must be a power of two.
perf: builtin-record: Provide advice if mmap'ing fails with EPERM.
perf tools: Fix truncated annotation
perf script: look up thread using tid instead of pid
perf tools: Look up thread names for system wide profiling
perf tools: Fix comm for processes with named threads
...
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits)
cpu: Export cpu_up()
rcu: Apply ACCESS_ONCE() to rcu_boost() return value
Revert "rcu: Permit rt_mutex_unlock() with irqs disabled"
docs: Additional LWN links to RCU API
rcu: Augment rcu_batch_end tracing for idle and callback state
rcu: Add rcutorture tests for srcu_read_lock_raw()
rcu: Make rcutorture test for hotpluggability before offlining CPUs
driver-core/cpu: Expose hotpluggability to the rest of the kernel
rcu: Remove redundant rcu_cpu_stall_suppress declaration
rcu: Adaptive dyntick-idle preparation
rcu: Keep invoking callbacks if CPU otherwise idle
rcu: Irq nesting is always 0 on rcu_enter_idle_common
rcu: Don't check irq nesting from rcu idle entry/exit
rcu: Permit dyntick-idle with callbacks pending
rcu: Document same-context read-side constraints
rcu: Identify dyntick-idle CPUs on first force_quiescent_state() pass
rcu: Remove dynticks false positives and RCU failures
rcu: Reduce latency of rcu_prepare_for_idle()
rcu: Eliminate RCU_FAST_NO_HZ grace-period hang
rcu: Avoid needlessly IPIing CPUs at GP end
...