Pull x86 paravirt changes from Ingo Molnar:
"Hypervisor signature detection cleanup and fixes - the goal is to make
KVM guests run better on MS/Hyperv and to generalize and factor out
the code a bit"
* 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Correctly detect hypervisor
x86, kvm: Switch to use hypervisor_cpuid_base()
xen: Switch to use hypervisor_cpuid_base()
x86: Introduce hypervisor_cpuid_base()
Pull x86 mm changes from Ingo Molnar:
"Misc smaller fixes:
- a parse_setup_data() boot crash fix
- a memblock and an __early_ioremap cleanup
- turn the always-on CONFIG_ARCH_MEMORY_PROBE=y into a configurable
option and turn it off - it's an unrobust debug facility, it
shouldn't be enabled by default"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: avoid remapping data in parse_setup_data()
x86: Use memblock_set_current_limit() to set limit for memblock.
mm: Remove unused variable idx0 in __early_ioremap()
mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default
Pull x86 relocation changes from Ingo Molnar:
"This tree contains a single change, ELF relocation handling in C - one
of the kernel randomization patches that makes sense even without
randomization present upstream"
* 'x86-kaslr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, relocs: Move ELF relocation handling to C
Pull x86 fb changes from Ingo Molnar:
"This tree includes preparatory patches for SimpleDRM driver support,
by David Herrmann. They clean up x86 framebuffer support by creating
simplefb devices wherever possible. More background can be found at
http://lwn.net/Articles/558104/"
* 'x86-fb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
fbdev: fbcon: select VT_HW_CONSOLE_BINDING
fbdev: efifb: bind to efi-framebuffer
fbdev: vesafb: bind to platform-framebuffer device
fbdev: simplefb: add common x86 RGB formats
x86: sysfb: move EFI quirks from efifb to sysfb
x86: provide platform-devices for boot-framebuffers
fbdev: simplefb: mark as fw and allocate apertures
fbdev: simplefb: add init through platform_data
Pull x86 cpu feature fixes from Ingo Molnar:
"Two small cpufeature support updates"
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Fix override new_cpu_data.x86 with 486
x86, cpufeature: Use new CC_HAVE_ASM_GOTO
Pull x86/asmlinkage changes from Ingo Molnar:
"As a preparation for Andi Kleen's LTO patchset (link time
optimizations using GCC's -flto which build time optimization has
steadily increased in quality over the past few years and might
eventually be usable for the kernel too) this tree includes a handful
of preparatory patches that make function calling convention
annotations consistent again:
- Mark every function without arguments (or 64bit only) that is used
by assembly code with asmlinkage()
- Mark every function with parameters or variables that is used by
assembly code as __visible.
For the vanilla kernel this has documentation, consistency and
debuggability advantages, for the time being"
* 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asmlinkage: Fix warning in xen asmlinkage change
x86, asmlinkage, vdso: Mark vdso variables __visible
x86, asmlinkage, power: Make various symbols used by the suspend asm code visible
x86, asmlinkage: Make dump_stack visible
x86, asmlinkage: Make 64bit checksum functions visible
x86, asmlinkage, paravirt: Add __visible/asmlinkage to xen paravirt ops
x86, asmlinkage, apm: Make APM data structure used from assembler visible
x86, asmlinkage: Make syscall tables visible
x86, asmlinkage: Make several variables used from assembler/linker script visible
x86, asmlinkage: Make kprobes code visible and fix assembler code
x86, asmlinkage: Make various syscalls asmlinkage
x86, asmlinkage: Make 32bit/64bit __switch_to visible
x86, asmlinkage: Make _*_start_kernel visible
x86, asmlinkage: Make all interrupt handlers asmlinkage / __visible
x86, asmlinkage: Change dotraplinkage into __visible on 32bit
x86: Fix sys_call_table type in asm/syscall.h
Pull x86/asm changes from Ingo Molnar:
"Main changes:
- Apply low level mutex optimization on x86-64, by Wedson Almeida
Filho.
- Change bitops to be naturally 'long', by H Peter Anvin.
- Add TSX-NI opcodes support to the x86 (instrumentation) decoder, by
Masami Hiramatsu.
- Add clang compatibility adjustments/workarounds, by Jan-Simon
Möller"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, doc: Update uaccess.h comment to reflect clang changes
x86, asm: Fix a compilation issue with clang
x86, asm: Extend definitions of _ASM_* with a raw format
x86, insn: Add new opcodes as of June, 2013
x86/ia32/asm: Remove unused argument in macro
x86, bitops: Change bitops to be native operand size
x86: Use asm-goto to implement mutex fast path on x86-64
Pull x86/apic changes from Ingo Molnar:
"Smaller fixes"
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ioapic: Check attr against the previous setting when programmed more than once
x86/ioapic/kcrash: Prevent crash_kexec() from deadlocking on ioapic_lock
x86/acpi: Fix incorrect sanity check in acpi_register_lapic()
Pull scheduler changes from Ingo Molnar:
"Various optimizations, cleanups and smaller fixes - no major changes
in scheduler behavior"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Fix the sd_parent_degenerate() code
sched/fair: Rework and comment the group_imb code
sched/fair: Optimize find_busiest_queue()
sched/fair: Make group power more consistent
sched/fair: Remove duplicate load_per_task computations
sched/fair: Shrink sg_lb_stats and play memset games
sched: Clean-up struct sd_lb_stat
sched: Factor out code to should_we_balance()
sched: Remove one division operation in find_busiest_queue()
sched/cputime: Use this_cpu_add() in task_group_account_field()
cpumask: Fix cpumask leak in partition_sched_domains()
sched/x86: Optimize switch_mm() for multi-threaded workloads
generic-ipi: Kill unnecessary variable - csd_flags
numa: Mark __node_set() as __always_inline
sched/fair: Cleanup: remove duplicate variable declaration
sched/__wake_up_sync_key(): Fix nr_exclusive tasks which lead to WF_SYNC clearing
Pull perf changes from Ingo Molnar:
"As a first remark I'd like to point out that the obsolete '-f'
(--force) option, which has not done anything for several releases,
has been removed from 'perf record' and related utilities. Everyone
please update muscle memory accordingly! :-)
Main changes on the perf kernel side:
- Performance optimizations:
. for trace events, by Steve Rostedt.
. for time values, by Peter Zijlstra
- New hardware support:
. for Intel Silvermont (22nm Atom) CPUs, by Zheng Yan
. for Intel SNB-EP uncore PMUs, by Zheng Yan
- Enhanced hardware support:
. for Intel uncore PMUs: add filter support for QPI boxes, by Zheng Yan
- Core perf events code enhancements and fixes:
. for full-nohz feature handling, by Frederic Weisbecker
. for group events, by Jiri Olsa
. for call chains, by Frederic Weisbecker
. for event stream parsing, by Adrian Hunter
- New ABI details:
. Add attr->mmap2 attribute, by Stephane Eranian
. Add PERF_EVENT_IOC_ID ioctl to return event ID, by Jiri Olsa
. Export u64 time_zero on the mmap header page to allow TSC
calculation, by Adrian Hunter
. Add dummy software event, by Adrian Hunter.
. Add a new PERF_SAMPLE_IDENTIFIER to make samples always
parseable, by Adrian Hunter.
. Make Power7 events available via sysfs, by Runzhen Wang.
- Code cleanups and refactorings:
. for nohz-full, by Frederic Weisbecker
. for group events, by Jiri Olsa
- Documentation updates:
. for perf_event_type, by Peter Zijlstra
Main changes on the perf tooling side (some of these tooling changes
utilize the above kernel side changes):
- Lots of 'perf trace' enhancements:
. Make 'perf trace' command line arguments consistent with
'perf record', by David Ahern.
. Allow specifying syscalls a la strace, by Arnaldo Carvalho de Melo.
. Add --verbose and -o/--output options, by Arnaldo Carvalho de Melo.
. Support ! in -e expressions, to filter a list of syscalls,
by Arnaldo Carvalho de Melo.
. Arg formatting improvements to allow masking arguments in
syscalls such as futex and open, where the some arguments are
ignored and thus should not be printed depending on other args,
by Arnaldo Carvalho de Melo.
. Beautify futex open, openat, open_by_handle_at, lseek and futex
syscalls, by Arnaldo Carvalho de Melo.
. Add option to analyze events in a file versus live, so that
one can do:
[root@zoo ~]# perf record -a -e raw_syscalls:* sleep 1
[ perf record: Woken up 0 times to write data ]
[ perf record: Captured and wrote 25.150 MB perf.data (~1098836 samples) ]
[root@zoo ~]# perf trace -i perf.data -e futex --duration 1
17.799 ( 1.020 ms): 7127 futex(uaddr: 0x7fff3f6c6674, op: 393, val: 1, utime: 0x7fff3f6c6470, ua
113.344 (95.429 ms): 7127 futex(uaddr: 0x7fff3f6c6674, op: 393, val: 1, utime: 0x7fff3f6c6470, uaddr2: 0x7fff3f6c6648, val3: 4294967
133.778 ( 1.042 ms): 18004 futex(uaddr: 0x7fff3f6c6674, op: 393, val: 1, utime: 0x7fff3f6c6470, uaddr2: 0x7fff3f6c6648, val3: 429496
[root@zoo ~]#
By David Ahern.
. Honor target pid / tid options when analyzing a file, by David Ahern.
. Introduce better formatting of syscall arguments, including so
far beautifiers for mmap, madvise, syscall return values,
by Arnaldo Carvalho de Melo.
. Handle HUGEPAGE defines in the mmap beautifier, by David Ahern.
- 'perf report/top' enhancements:
. Do annotation using /proc/kcore and /proc/kallsyms when
available, removing the forced need for a vmlinux file kernel
assembly annotation. This also improves this use case because
vmlinux has just the initial kernel image, not what is actually
in use after various code patchings by things like alternatives.
By Adrian Hunter.
. Add --ignore-callees=<regex> option to collapse undesired parts
of call graphs, by Greg Price.
. Simplify symbol filtering by doing it at machine class level,
by Adrian Hunter.
. Add support for callchains in the gtk UI, by Namhyung Kim.
. Add --objdump option to 'perf top', by Sukadev Bhattiprolu.
- 'perf kvm' enhancements:
. Add option to print only events that exceed a specified time
duration, by David Ahern.
. Improve stack trace printing, by David Ahern.
. Update documentation of the live command, by David Ahern
. Add perf kvm stat live mode that combines aspects of 'perf kvm
stat' record and report, by David Ahern.
. Add option to analyze specific VM in perf kvm stat report, by
David Ahern.
. Do not require /lib/modules/* on a guest, by Jason Wessel.
- 'perf script' enhancements:
. Fix symbol offset computation for some dsos, by David Ahern.
. Fix named threads support, by David Ahern.
. Don't install scripting files files when perl/python support
is disabled, by Arnaldo Carvalho de Melo.
- 'perf test' enhancements:
. Add various improvements and fixes to the "vmlinux matches
kallsyms" 'perf test' entry, related to the /proc/kcore
annotation feature. By Adrian Hunter.
. Add sample parsing test, by Adrian Hunter.
. Add test for reading object code, by Adrian Hunter.
. Add attr record group sampling test, by Jiri Olsa.
. Misc testing infrastructure improvements and other details,
by Jiri Olsa.
- 'perf list' enhancements:
. Skip unsupported hardware events, by Namhyung Kim.
. List pmu events, by Andi Kleen.
- 'perf diff' enhancements:
. Add support for more than two files comparison, by Jiri Olsa.
- 'perf sched' enhancements:
. Various improvements, including removing reliance on some
scheduler tracepoints that provide the same information as the
PERF_RECORD_{FORK,EXIT} events. By David Ahern.
. Remove odd build stall by moving a large struct initialization
from a local variable to a global one, by Namhyung Kim.
- 'perf stat' enhancements:
. Add --initial-delay option to skip measuring for a defined
startup phase, by Andi Kleen.
- Generic perf tooling infrastructure/plumbing changes:
. Tidy up sample parsing validation, by Adrian Hunter.
. Fix up jobserver setup in libtraceevent Makefile.
by Arnaldo Carvalho de Melo.
. Debug improvements, by Adrian Hunter.
. Fix correlation of samples coming after PERF_RECORD_EXIT event,
by David Ahern.
. Improve robustness of the topology parsing code,
by Stephane Eranian.
. Add group leader sampling, that allows just one event in a group
to sample while the other events have just its values read,
by Jiri Olsa.
. Add support for a new modifier "D", which requests that the
event, or group of events, be pinned to the PMU.
By Michael Ellerman.
. Support callchain sorting based on addresses, by Andi Kleen
. Prep work for multi perf data file storage, by Jiri Olsa.
. libtraceevent cleanups, by Namhyung Kim.
And lots and lots of other fixes and code reorganizations that did not
make it into the list, see the shortlog, diffstat and the Git log for
details!"
[ Also merge a leftover from the 3.11 cycle ]
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Prevent race in unthrottling code
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (237 commits)
perf trace: Tell arg formatters the arg index
perf trace: Add beautifier for open's flags arg
perf trace: Add beautifier for lseek's whence arg
perf tools: Fix symbol offset computation for some dsos
perf list: Skip unsupported events
perf tests: Add 'keep tracking' test
perf tools: Add support for PERF_COUNT_SW_DUMMY
perf: Add a dummy software event to keep tracking
perf trace: Add beautifier for futex 'operation' parm
perf trace: Allow syscall arg formatters to mask args
perf: Convert kmalloc_node(...GFP_ZERO...) to kzalloc_node()
perf: Export struct perf_branch_entry to userspace
perf: Add attr->mmap2 attribute to an event
perf/x86: Add Silvermont (22nm Atom) support
perf/x86: use INTEL_UEVENT_EXTRA_REG to define MSR_OFFCORE_RSP_X
perf trace: Handle missing HUGEPAGE defines
perf trace: Honor target pid / tid options when analyzing a file
perf trace: Add option to analyze events in a file versus live
perf evlist: Add tracepoint lookup by name
perf tests: Add a sample parsing test
...
1) ACPI-based PCI hotplug (ACPIPHP) subsystem rework and introduction
of Intel Thunderbolt support on systems that use ACPI for signalling
Thunderbolt hotplug events. This also should make ACPIPHP work in
some cases in which it was known to have problems. From
Rafael J Wysocki, Mika Westerberg and Kirill A Shutemov.
2) ACPI core code cleanups and dock station support cleanups from
Jiang Liu and Rafael J Wysocki.
3) Fixes for locking problems related to ACPI device hotplug from
Rafael J Wysocki.
4) ACPICA update to version 20130725 includig fixes, cleanups, support
for more than 256 GPEs per GPE block and a change to make the ACPI
PM Timer optional (we've seen systems without the PM Timer in the
field already). One of the fixes, related to the DeRefOf operator,
is necessary to prevent some Windows 8 oriented AML from causing
problems to happen. From Bob Moore, Lv Zheng, and Jung-uk Kim.
5) Removal of the old and long deprecated /proc/acpi/event interface
and related driver changes from Thomas Renninger.
6) ACPI and Xen changes to make the reduced hardware sleep work with
the latter from Ben Guthro.
7) ACPI video driver cleanups and a blacklist of systems that should
not tell the BIOS that they are compatible with Windows 8 (or ACPI
backlight and possibly other things will not work on them). From
Felipe Contreras.
8) Assorted ACPI fixes and cleanups from Aaron Lu, Hanjun Guo,
Kuppuswamy Sathyanarayanan, Lan Tianyu, Sachin Kamat, Tang Chen,
Toshi Kani, and Wei Yongjun.
9) cpufreq ondemand governor target frequency selection change to
reduce oscillations between min and max frequencies (essentially,
it causes the governor to choose target frequencies proportional
to load) from Stratos Karafotis.
10) cpufreq fixes allowing sysfs attributes file permissions to be
preserved over suspend/resume cycles Srivatsa S Bhat.
11) Removal of Device Tree parsing for CPU device nodes from multiple
cpufreq drivers that required some changes related to
of_get_cpu_node() to be made in a few architectures and in the
driver core. From Sudeep KarkadaNagesha.
12) cpufreq core fixes and cleanups related to mutual exclusion and
driver module references from Viresh Kumar, Lukasz Majewski and
Rafael J Wysocki.
13) Assorted cpufreq fixes and cleanups from Amit Daniel Kachhap,
Bartlomiej Zolnierkiewicz, Hanjun Guo, Jingoo Han, Joseph Lo,
Julia Lawall, Li Zhong, Mark Brown, Sascha Hauer, Stephen Boyd,
Stratos Karafotis, and Viresh Kumar.
14) Fixes to prevent race conditions in coupled cpuidle from happening
from Colin Cross.
15) cpuidle core fixes and cleanups from Daniel Lezcano and
Tuukka Tikkanen.
16) Assorted cpuidle fixes and cleanups from Daniel Lezcano,
Geert Uytterhoeven, Jingoo Han, Julia Lawall, Linus Walleij,
and Sahara.
17) System sleep tracing changes from Todd E Brandt and Shuah Khan.
18) PNP subsystem conversion to using struct dev_pm_ops for power
management from Shuah Khan.
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABCAAGBQJSJcKhAAoJEKhOf7ml8uNsplIQAJSOshxhkkemvFOuHZ+0YIbh
R9aufjXeDkMDBi8YtU+tB7ERth1j+0LUSM0NTnP51U7e+7eSGobA9s5jSZQj2l7r
HFtnSOegLuKAfqwgfSLK91xa1rTFdfW0Kych9G2nuHtBIt6P0Oc59Cb5M0oy6QXs
nVtaDEuU//tmO71+EF5HnMJHabRTrpvtn/7NbDUpU7LZYpWJrHJFT9xt1rXNab7H
YRCATPm3kXGRg58Doc3EZE4G3D7DLvq74jWMaI089X/m5Pg1G6upqArypOy6oxdP
p2FEzYVrb2bi8fakXp7BBeO1gCJTAqIgAkbSSZHLpGhFaeEMmb9/DWPXdm2TjzMV
c1EEucvsqZWoprXgy12i5Hk814xN8d8nBBLg/UYiRJ44nc/hevXfyE9ZYj6bkseJ
+GNHmZIa1QYC05nnGli4+W4kHns8EZf/gmvIxnPuco1RN2yMWagrud5/G6Dr9M2B
hzJV6qauLVzgZso4oe79zv9aVxe/dPHKANLD/sg23WBiJJbJF1ocBlnj2Xlbpqze
pmMUWGiO/gUiS0fmpW/lAJauza5jFmSCjE4E8R0Gyn0j4YXjmMhdEanaU6J3VuCi
yVgEzYEth4sowq4AflMMLKYN+WmozDnK7taZRGmT0t+EKRFKLT6EgnNrkQgs1vKl
oawD9LM4fZ8E0yroOEme
=CgqW
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management updates from Rafael Wysocki:
1) ACPI-based PCI hotplug (ACPIPHP) subsystem rework and introduction
of Intel Thunderbolt support on systems that use ACPI for signalling
Thunderbolt hotplug events. This also should make ACPIPHP work in
some cases in which it was known to have problems. From
Rafael J Wysocki, Mika Westerberg and Kirill A Shutemov.
2) ACPI core code cleanups and dock station support cleanups from
Jiang Liu and Rafael J Wysocki.
3) Fixes for locking problems related to ACPI device hotplug from
Rafael J Wysocki.
4) ACPICA update to version 20130725 includig fixes, cleanups, support
for more than 256 GPEs per GPE block and a change to make the ACPI
PM Timer optional (we've seen systems without the PM Timer in the
field already). One of the fixes, related to the DeRefOf operator,
is necessary to prevent some Windows 8 oriented AML from causing
problems to happen. From Bob Moore, Lv Zheng, and Jung-uk Kim.
5) Removal of the old and long deprecated /proc/acpi/event interface
and related driver changes from Thomas Renninger.
6) ACPI and Xen changes to make the reduced hardware sleep work with
the latter from Ben Guthro.
7) ACPI video driver cleanups and a blacklist of systems that should
not tell the BIOS that they are compatible with Windows 8 (or ACPI
backlight and possibly other things will not work on them). From
Felipe Contreras.
8) Assorted ACPI fixes and cleanups from Aaron Lu, Hanjun Guo,
Kuppuswamy Sathyanarayanan, Lan Tianyu, Sachin Kamat, Tang Chen,
Toshi Kani, and Wei Yongjun.
9) cpufreq ondemand governor target frequency selection change to
reduce oscillations between min and max frequencies (essentially,
it causes the governor to choose target frequencies proportional
to load) from Stratos Karafotis.
10) cpufreq fixes allowing sysfs attributes file permissions to be
preserved over suspend/resume cycles Srivatsa S Bhat.
11) Removal of Device Tree parsing for CPU device nodes from multiple
cpufreq drivers that required some changes related to
of_get_cpu_node() to be made in a few architectures and in the
driver core. From Sudeep KarkadaNagesha.
12) cpufreq core fixes and cleanups related to mutual exclusion and
driver module references from Viresh Kumar, Lukasz Majewski and
Rafael J Wysocki.
13) Assorted cpufreq fixes and cleanups from Amit Daniel Kachhap,
Bartlomiej Zolnierkiewicz, Hanjun Guo, Jingoo Han, Joseph Lo,
Julia Lawall, Li Zhong, Mark Brown, Sascha Hauer, Stephen Boyd,
Stratos Karafotis, and Viresh Kumar.
14) Fixes to prevent race conditions in coupled cpuidle from happening
from Colin Cross.
15) cpuidle core fixes and cleanups from Daniel Lezcano and
Tuukka Tikkanen.
16) Assorted cpuidle fixes and cleanups from Daniel Lezcano,
Geert Uytterhoeven, Jingoo Han, Julia Lawall, Linus Walleij,
and Sahara.
17) System sleep tracing changes from Todd E Brandt and Shuah Khan.
18) PNP subsystem conversion to using struct dev_pm_ops for power
management from Shuah Khan.
* tag 'pm+acpi-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (217 commits)
cpufreq: Don't use smp_processor_id() in preemptible context
cpuidle: coupled: fix race condition between pokes and safe state
cpuidle: coupled: abort idle if pokes are pending
cpuidle: coupled: disable interrupts after entering safe state
ACPI / hotplug: Remove containers synchronously
driver core / ACPI: Avoid device hot remove locking issues
cpufreq: governor: Fix typos in comments
cpufreq: governors: Remove duplicate check of target freq in supported range
cpufreq: Fix timer/workqueue corruption due to double queueing
ACPI / EC: Add ASUSTEK L4R to quirk list in order to validate ECDT
ACPI / thermal: Add check of "_TZD" availability and evaluating result
cpufreq: imx6q: Fix clock enable balance
ACPI: blacklist win8 OSI for buggy laptops
cpufreq: tegra: fix the wrong clock name
cpuidle: Change struct menu_device field types
cpuidle: Add a comment warning about possible overflow
cpuidle: Fix variable domains in get_typical_interval()
cpuidle: Fix menu_device->intervals type
cpuidle: CodingStyle: Break up multiple assignments on single line
cpuidle: Check called function parameter in get_typical_interval()
...
Here's the big driver core pull request for 3.12-rc1.
Lots of tiny changes here fixing up the way sysfs attributes are
created, to try to make drivers simpler, and fix a whole class race
conditions with creations of device attributes after the device was
announced to userspace.
All the various pieces are acked by the different subsystem maintainers.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.21 (GNU/Linux)
iEYEABECAAYFAlIlIPcACgkQMUfUDdst+ynUMwCaAnITsxyDXYQ4DqEsz8EcOtMk
718AoLrgnUZs3B+70AT34DVktg4HSThk
=USl9
-----END PGP SIGNATURE-----
Merge tag 'driver-core-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core patches from Greg KH:
"Here's the big driver core pull request for 3.12-rc1.
Lots of tiny changes here fixing up the way sysfs attributes are
created, to try to make drivers simpler, and fix a whole class race
conditions with creations of device attributes after the device was
announced to userspace.
All the various pieces are acked by the different subsystem
maintainers"
* tag 'driver-core-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (119 commits)
firmware loader: fix pending_fw_head list corruption
drivers/base/memory.c: introduce help macro to_memory_block
dynamic debug: line queries failing due to uninitialized local variable
sysfs: sysfs_create_groups returns a value.
debugfs: provide debugfs_create_x64() when disabled
rbd: convert bus code to use bus_groups
firmware: dcdbas: use binary attribute groups
sysfs: add sysfs_create/remove_groups for when SYSFS is not enabled
driver core: add #include <linux/sysfs.h> to core files.
HID: convert bus code to use dev_groups
Input: serio: convert bus code to use drv_groups
Input: gameport: convert bus code to use drv_groups
driver core: firmware: use __ATTR_RW()
driver core: core: use DEVICE_ATTR_RO
driver core: bus: use DRIVER_ATTR_WO()
driver core: create write-only attribute macros for devices and drivers
sysfs: create __ATTR_WO()
driver-core: platform: convert bus code to use dev_groups
workqueue: convert bus code to use dev_groups
MEI: convert bus code to use dev_groups
...
Instead of taking the spinlock, the lockless versions atomically check
that the lock is not taken, and do the reference count update using a
cmpxchg() loop. This is semantically identical to doing the reference
count update protected by the lock, but avoids the "wait for lock"
contention that you get when accesses to the reference count are
contended.
Note that a "lockref" is absolutely _not_ equivalent to an atomic_t.
Even when the lockref reference counts are updated atomically with
cmpxchg, the fact that they also verify the state of the spinlock means
that the lockless updates can never happen while somebody else holds the
spinlock.
So while "lockref_put_or_lock()" looks a lot like just another name for
"atomic_dec_and_lock()", and both optimize to lockless updates, they are
fundamentally different: the decrement done by atomic_dec_and_lock() is
truly independent of any lock (as long as it doesn't decrement to zero),
so a locked region can still see the count change.
The lockref structure, in contrast, really is a *locked* reference
count. If you hold the spinlock, the reference count will be stable and
you can modify the reference count without using atomics, because even
the lockless updates will see and respect the state of the lock.
In order to enable the cmpxchg lockless code, the architecture needs to
do three things:
(1) Make sure that the "arch_spinlock_t" and an "unsigned int" can fit
in an aligned u64, and have a "cmpxchg()" implementation that works
on such a u64 data type.
(2) define a helper function to test for a spinlock being unlocked
("arch_spin_value_unlocked()")
(3) select the "ARCH_USE_CMPXCHG_LOCKREF" config variable in its
Kconfig file.
This enables it for x86-64 (but not 32-bit, we'd need to make sure
cmpxchg() turns into the proper cmpxchg8b in order to enable it for
32-bit mode).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Update comment in uaccess.h to reflect the changes for clang support:
gcc only cares about the base register (most architectures don't
encode the size of the operation in the operands like x86 does, and so
it is treated effectively like a register number), whereas clang tries
to enforce the size -- but not for register pairs.
Link: http://lkml.kernel.org/r/1377803585-5913-3-git-send-email-dl9pf@gmx.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Jan-Simon Möller <dl9pf@gmx.de>
Clang does not support the "shortcut" we're taking here for gcc (see below).
The patch uses the macro _ASM_DX to do the job.
From arch/x86/include/asm/uaccess.h:
/*
* Careful: we have to cast the result to the type of the pointer
* for sign reasons.
*
* The use of %edx as the register specifier is a bit of a
* simplification, as gcc only cares about it as the starting point
* and not size: for a 64-bit value it will use %ecx:%edx on 32 bits
* (%ecx being the next register in gcc's x86 register sequence), and
* %rdx on 64 bits.
*/
[ hpa: I consider this a compatibility bug in clang as this reflects a
bit of a misunderstanding about how register strings are used by
gcc, but the workaround is straightforward and there is no
particular reason to not do it. ]
Signed-off-by: Jan-Simon Möller <dl9pf@gmx.de>
Link: http://lkml.kernel.org/r/1377803585-5913-3-git-send-email-dl9pf@gmx.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The __ASM_* macros (e.g. __ASM_DX) are used to return the proper
register name (e.g. edx for 32bit / rdx for 64bit). We want to use
this also in arch/x86/include/asm/uaccess.h / get_user() . For this
to work, we need a raw form as both gcc and clang choke on the
whitespace in a register asm() statement, and the __ASM_FORM macro
surrounds the argument with blanks. A new macro, __ASM_FORM_RAW was
added and we change __ASM_REG to use the new RAW form.
Signed-off-by: Jan-Simon Möller <dl9pf@gmx.de>
Link: http://lkml.kernel.org/r/1377803585-5913-2-git-send-email-dl9pf@gmx.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* pm-cpufreq: (60 commits)
cpufreq: pmac32-cpufreq: remove device tree parsing for cpu nodes
cpufreq: pmac64-cpufreq: remove device tree parsing for cpu nodes
cpufreq: maple-cpufreq: remove device tree parsing for cpu nodes
cpufreq: arm_big_little: remove device tree parsing for cpu nodes
cpufreq: kirkwood-cpufreq: remove device tree parsing for cpu nodes
cpufreq: spear-cpufreq: remove device tree parsing for cpu nodes
cpufreq: highbank-cpufreq: remove device tree parsing for cpu nodes
cpufreq: cpufreq-cpu0: remove device tree parsing for cpu nodes
cpufreq: imx6q-cpufreq: remove device tree parsing for cpu nodes
drivers/bus: arm-cci: avoid parsing DT for cpu device nodes
ARM: mvebu: remove device tree parsing for cpu nodes
ARM: topology: remove hwid/MPIDR dependency from cpu_capacity
of/device: add helper to get cpu device node from logical cpu index
driver/core: cpu: initialize of_node in cpu's device struture
ARM: DT/kernel: define ARM specific arch_match_cpu_phys_id
of: move of_get_cpu_node implementation to DT core library
powerpc: refactor of_get_cpu_node to support other architectures
openrisc: remove undefined of_get_cpu_node declaration
microblaze: remove undefined of_get_cpu_node declaration
cpufreq: fix bad unlock balance on !CONFIG_SMP
...
Prevent crash_kexec() from deadlocking on ioapic_lock. When
crash_kexec() is executed on a CPU, the CPU will take ioapic_lock
in disable_IO_APIC(). So if the cpu gets an NMI while locking
ioapic_lock, a deadlock will happen.
In this patch, ioapic_lock is zapped/initialized before disable_IO_APIC().
You can reproduce this deadlock the following way:
1. Add mdelay(1000) after raw_spin_lock_irqsave() in
native_ioapic_set_affinity()@arch/x86/kernel/apic/io_apic.c
Although the deadlock can occur without this modification, it will increase
the potential of the deadlock problem.
2. Build and install the kernel
3. Set up the OS which will run panic() and kexec when NMI is injected
# echo "kernel.unknown_nmi_panic=1" >> /etc/sysctl.conf
# vim /etc/default/grub
add "nmi_watchdog=0 crashkernel=256M" in GRUB_CMDLINE_LINUX line
# grub2-mkconfig
4. Reboot the OS
5. Run following command for each vcpu on the guest
# while true; do echo <CPU num> > /proc/irq/<IO-APIC-edge or IO-APIC-fasteoi>/smp_affinitity; done;
By running this command, cpus will get ioapic_lock for setting affinity.
6. Inject NMI (push a dump button or execute 'virsh inject-nmi <domain>' if you
use VM). After injecting NMI, panic() is called in an nmi-handler context.
Then, kexec will normally run in panic(), but the operation will be stopped
by deadlock on ioapic_lock in crash_kexec()->machine_crash_shutdown()->
native_machine_crash_shutdown()->disable_IO_APIC()->clear_IO_APIC()->
clear_IO_APIC_pin()->ioapic_read_entry().
Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: yrl.pp-manager.tt@hitachi.com
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Link: http://lkml.kernel.org/r/20130820070107.28245.83806.stgit@yunodevel
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 fixes from Ingo Molnar:
"Two AMD microcode loader fixes and an OLPC firmware support fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, microcode, AMD: Fix early microcode loading
x86, microcode, AMD: Make cpu_has_amd_erratum() use the correct struct cpuinfo_x86
x86: Don't clear olpc_ofw_header when sentinel is detected
Merge a bunch of fixes from Andrew Morton.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
fs/proc/task_mmu.c: fix buffer overflow in add_page_map()
arch: *: Kconfig: add "kernel/Kconfig.freezer" to "arch/*/Kconfig"
ocfs2: fix null pointer dereference in ocfs2_dir_foreach_blk_id()
x86 get_unmapped_area(): use proper mmap base for bottom-up direction
ocfs2: fix NULL pointer dereference in ocfs2_duplicate_clusters_by_page
ocfs2: Revert 40bd62e to avoid regression in extended allocation
drivers/rtc/rtc-stmp3xxx.c: provide timeout for potentially endless loop polling a HW bit
hugetlb: fix lockdep splat caused by pmd sharing
aoe: adjust ref of head for compound page tails
microblaze: fix clone syscall
mm: save soft-dirty bits on file pages
mm: save soft-dirty bits on swapped pages
memcg: don't initialize kmem-cache destroying work for root caches
from accessing cpu_data too early, i.e. before smp_store_cpu_info()
has copied the boot_cpu_data ontop and overwritten an already empty
structure (which we shouldn't access that early in the first place
anyway).
The second patch is kinda largish for that late in the game but it
shouldn't be problematic because we're simply switching from using
cpu_data to use the CPU family number directly and thus again, not use
uninitialized cpu_data structure.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQIcBAABAgAGBQJSCT0UAAoJEBLB8Bhh3lVKWNsQAJbu1y79mc2vHR15NqMGXVoP
F6sUNC3iUnNjxlUxWn3PPynsJS9zFTBEo1S6ILnsfuMWliDoH+4psHLMmG5F20TV
XJCSS9s3fdsaJZiNa7JXggdI80xU3y8uB/z3nacRDKgcdw3tX+ITP4SdPgNF5F3k
B/XGAFxyXfl9M9SRFMIO4GS9UyFMsDQP0VWMRhcosY1/Fj5bc9Uds1ngP9E4XMe9
jB1pIzbvtDvtolYSdShwFE9M0os90TYPHVSgriYYXHLZPryZC8mWkc7GwSTl4bG9
EwBGXv79SJWics9vWM2n9EQbFt46fUAyhpjyOo/fqVp4L5XjKuV7UyeT7X+bsY0q
b3z1jWzBxGZ5ltE2BmZ8tVnfKgwYJlKgAf8FbSrylhtBInP80Wau1aYTzw50h9D2
lmjO/rSOhm2FszdeFHG/IeVbSZJbSFrUdwm51nx2xF1qG0MXldy9ehJ4pO3Ui2XA
d0z2y83ZxOYV723fTCa1/5C5xAes7LjvxftM32G9QTu7R5XnvVrfehVfPh9K1DJA
nIEH6pbM358/PT+/q7BCPTnH7KVi+mE633YERDOfAgz/NFnVKfKzLBM0+AyFNHjk
a4xFrOqVbTctW1Chv8Nh1457A2+gcT8yeju1XjbLE8IMqKOGyMt0gnRZlgMYj8BH
WYKWi2q0LaDwsOcaQ9bm
=cxbY
-----END PGP SIGNATURE-----
Merge tag 'amd_ucode_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/urgent
Pull AMD microcode fixes from Borislav Petkov:
" Those are basically two fixes which correct the AMD early ucode loader
from accessing cpu_data too early, i.e. before smp_store_cpu_info()
has copied the boot_cpu_data ontop and overwritten an already empty
structure (which we shouldn't access that early in the first place
anyway).
The second patch is kinda largish for that late in the game but it
shouldn't be problematic because we're simply switching from using
cpu_data to use the CPU family number directly and thus again, not use
uninitialized cpu_data structure. "
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Type SETUP_PCI, added by setup_efi_pci(), may advertise a ROM size
larger than early_memremap() is able to handle, which is currently
limited to 256kB. If this occurs it leads to a NULL dereference in
parse_setup_data().
To avoid this, remap the setup_data header and allow parsing functions
for individual types to handle their own data remapping.
Signed-off-by: Linn Crosetto <linn@hp.com>
Link: http://lkml.kernel.org/r/1376430401-67445-1-git-send-email-linn@hp.com
Acked-by: Yinghai Lu <yinghai@kernel.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Andy reported that if file page get reclaimed we lose the soft-dirty bit
if it was there, so save _PAGE_BIT_SOFT_DIRTY bit when page address get
encoded into pte entry. Thus when #pf happens on such non-present pte
we can restore it back.
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Lutomirski reported that if a page with _PAGE_SOFT_DIRTY bit set
get swapped out, the bit is getting lost and no longer available when
pte read back.
To resolve this we introduce _PTE_SWP_SOFT_DIRTY bit which is saved in
pte entry for the page being swapped out. When such page is to be read
back from a swap cache we check for bit presence and if it's there we
clear it and restore the former _PAGE_SOFT_DIRTY bit back.
One of the problem was to find a place in pte entry where we can save
the _PTE_SWP_SOFT_DIRTY bit while page is in swap. The _PAGE_PSE was
chosen for that, it doesn't intersect with swap entry format stored in
pte.
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is only theoretical, but after try_to_wake_up(p) was changed
to check p->state under p->pi_lock the code like
__set_current_state(TASK_INTERRUPTIBLE);
schedule();
can miss a signal. This is the special case of wait-for-condition,
it relies on try_to_wake_up/schedule interaction and thus it does
not need mb() between __set_current_state() and if(signal_pending).
However, this __set_current_state() can move into the critical
section protected by rq->lock, now that try_to_wake_up() takes
another lock we need to ensure that it can't be reordered with
"if (signal_pending(current))" check inside that section.
The patch is actually one-liner, it simply adds smp_wmb() before
spin_lock_irq(rq->lock). This is what try_to_wake_up() already
does by the same reason.
We turn this wmb() into the new helper, smp_mb__before_spinlock(),
for better documentation and to allow the architectures to change
the default implementation.
While at it, kill smp_mb__after_lock(), it has no callers.
Perhaps we can also add smp_mb__before/after_spinunlock() for
prepare_to_wait().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
load_microcode_amd() (and the helper it is using) should not have an
cpu parameter. The microcode loading does not depend on the CPU wrt the
patches loaded since they will end up in a global list for all CPUs
anyway.
The change from cpu to x86family in load_microcode_amd()
now allows to drop the code messing with cpu_data(cpu) from
collect_cpu_info_amd_early(), which is wrong anyway because at that
point the per-cpu cpu_info is not yet setup (These values would later be
overwritten by smp_store_boot_cpu_info() / smp_store_cpu_info()).
Fold the rest of collect_cpu_info_amd_early() into load_ucode_amd_ap(),
because its only used at one place and without the cpuinfo_x86 accesses
it was not much left.
Signed-off-by: Torsten Kaiser <just.for.lkml@googlemail.com>
[ Fengguang: build fix ]
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
[ Boris: adapt it to current tree. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
OpenFirmware wasn't quite following the protocol described in boot.txt
and the kernel has detected this through use of the sentinel value
in boot_params. OFW does zero out almost all of the stuff that it should
do, but not the sentinel.
This causes the kernel to clear olpc_ofw_header, which breaks x86 OLPC
support.
OpenFirmware has now been fixed. However, it would be nice if we could
maintain Linux compatibility with old firmware versions. To do that, we just
have to avoid zeroing out olpc_ofw_header.
OFW does not write to any other parts of the header that are being zapped
by the sentinel-detection code, and all users of olpc_ofw_header are
somewhat protected through checking for the OLPC_OFW_SIG magic value
before using it. So this should not cause any problems for anyone.
Signed-off-by: Daniel Drake <dsd@laptop.org>
Link: http://lkml.kernel.org/r/20130809221420.618E6FAB03@dev.laptop.org
Acked-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> # v3.9+
Moves the relocation handling into C, after decompression. This requires
that the decompressed size is passed to the decompression routine as
well so that relocations can be found. Only kernels that need relocation
support will use the code (currently just x86_32), but this is laying
the ground work for 64-bit using it in support of KASLR.
Based on work by Neill Clift and Michael Davidson.
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20130708161517.GA4832@www.outflux.net
Acked-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
- Make all the external assembler template symbols __visible
- Move the templates inline assembler code into a top level
assembler statement, not inside a function. This avoids it being
optimized away or cloned.
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1375740170-7446-8-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
FWIW I suspect sys_rt_sigreturn/sys_sigreturn should use
standard SYSCALL wrappers. But I didn't do that change in this
patch.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1375740170-7446-7-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
These handlers are all referenced from assembler stubs, so need
to be visible.
The handlers without arguments become asmlinkage, the others __visible
to not force regparms(0) on x86-32.
I put it all into a single patch, please let me know if you want
it it split up.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1375740170-7446-4-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Mark 32bit dotraplinkage functions as __visible for LTO.
64bit already is using asmlinkage which includes it.
v2: Clean up (M.Marek)
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1375740170-7446-3-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Make the sys_call_table type defined in asm/syscall.h match
the definition in syscall_64.c
v2: include asm/syscall.h in syscall_64.c too. I left uml alone
because it doesn't have an syscall.h on its own and including
the native one leads to other errors.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1375740170-7446-2-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Richard Weinberger <richard@nod.at>
We try to handle the hypervisor compatibility mode by detecting hypervisor
through a specific order. This is not robust, since hypervisors may implement
each others features.
This patch tries to handle this situation by always choosing the last one in the
CPUID leaves. This is done by letting .detect() return a priority instead of
true/false and just re-using the CPUID leaf where the signature were found as
the priority (or 1 if it was found by DMI). Then we can just pick hypervisor who
has the highest priority. Other sophisticated detection method could also be
implemented on top.
Suggested by H. Peter Anvin and Paolo Bonzini.
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Doug Covelli <dcovelli@vmware.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dan Hecht <dhecht@vmware.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: http://lkml.kernel.org/r/1374742475-2485-4-git-send-email-jasowang@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Switch to use hypervisor_cpuid_base() to detect KVM.
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: http://lkml.kernel.org/r/1374742475-2485-3-git-send-email-jasowang@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Switch to use hypervisor_cpuid_base() to detect Xen.
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: http://lkml.kernel.org/r/1374742475-2485-2-git-send-email-jasowang@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
This patch introduce hypervisor_cpuid_base() which loop test the hypervisor
existence function until the signature match and check the number of leaves if
required. This could be used by Xen/KVM guest to detect the existence of
hypervisor.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: http://lkml.kernel.org/r/1374742475-2485-1-git-send-email-jasowang@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The EFI FB quirks from efifb.c are useful for simple-framebuffer devices
as well. Apply them by default so we can convert efifb.c to use
efi-framebuffer platform devices.
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Link: http://lkml.kernel.org/r/1375445127-15480-5-git-send-email-dh.herrmann@gmail.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The current situation regarding boot-framebuffers (VGA, VESA/VBE, EFI) on
x86 causes troubles when loading multiple fbdev drivers. The global
"struct screen_info" does not provide any state-tracking about which
drivers use the FBs. request_mem_region() theoretically works, but
unfortunately vesafb/efifb ignore it due to quirks for broken boards.
Avoid this by creating a platform framebuffer devices with a pointer
to the "struct screen_info" as platform-data. Drivers can now create
platform-drivers and the driver-core will refuse multiple drivers being
active simultaneously.
We keep the screen_info available for backwards-compatibility. Drivers
can be converted in follow-up patches.
Different devices are created for VGA/VESA/EFI FBs to allow multiple
drivers to be loaded on distro kernels. We create:
- "vesa-framebuffer" for VBE/VESA graphics FBs
- "efi-framebuffer" for EFI FBs
- "platform-framebuffer" for everything else
This allows to load vesafb, efifb and others simultaneously and each
picks up only the supported FB types.
Apart from platform-framebuffer devices, this also introduces a
compatibility option for "simple-framebuffer" drivers which recently got
introduced for OF based systems. If CONFIG_X86_SYSFB is selected, we
try to match the screen_info against a simple-framebuffer supported
format. If we succeed, we create a "simple-framebuffer" device instead
of a platform-framebuffer.
This allows to reuse the simplefb.c driver across architectures and also
to introduce a SimpleDRM driver. There is no need to have vesafb.c,
efifb.c, simplefb.c and more just to have architecture specific quirks
in their setup-routines.
Instead, we now move the architecture specific quirks into x86-setup and
provide a generic simple-framebuffer. For backwards-compatibility (if
strange formats are used), we still allow vesafb/efifb to be loaded
simultaneously and pick up all remaining devices.
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Link: http://lkml.kernel.org/r/1375445127-15480-4-git-send-email-dh.herrmann@gmail.com
Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Dick Fowles, Don Zickus and Joe Mario have been working on
improvements to perf, and noticed heavy cache line contention
on the mm_cpumask, running linpack on a 60 core / 120 thread
system.
The cause turned out to be unnecessary atomic accesses to the
mm_cpumask. When in lazy TLB mode, the CPU is only removed from
the mm_cpumask if there is a TLB flush event.
Most of the time, no such TLB flush happens, and the kernel
skips the TLB reload. It can also skip the atomic memory
set & test.
Here is a summary of Joe's test results:
* The __schedule function dropped from 24% of all program cycles down
to 5.5%.
* The cacheline contention/hotness for accesses to that bitmask went
from being the 1st/2nd hottest - down to the 84th hottest (0.3% of
all shared misses which is now quite cold)
* The average load latency for the bit-test-n-set instruction in
__schedule dropped from 10k-15k cycles down to an average of 600 cycles.
* The linpack program results improved from 133 GFlops to 144 GFlops.
Peak GFlops rose from 133 to 153.
Reported-by: Don Zickus <dzickus@redhat.com>
Reported-by: Joe Mario <jmario@redhat.com>
Tested-by: Joe Mario <jmario@redhat.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Paul Turner <pjt@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20130731221421.616d3d20@annuminas.surriel.com
[ Made the comments consistent around the modified code. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The target frequency calculation method in the ondemand governor has
changed and it is now independent of the measured average frequency.
Consequently, the APERF/MPERF support in cpufreq is not used any
more, so drop it.
[rjw: Changelog]
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>