* acpica:
ACPICA: Update version to 20130725.
ACPICA: Update names for walk_namespace callbacks to clarify usage.
ACPICA: Return error if DerefOf resolves to a null package element.
ACPICA: Make ACPI Power Management Timer (PM Timer) optional.
ACPICA: Fix divergences of the commit - ACPICA: Expose OSI version.
ACPICA: Fix possible fault for methods that optionally have no return value.
ACPICA: DeRefOf operator: Update to fully resolve FieldUnit and BufferField refs.
ACPICA: Emit all unresolved method externals in a text block
ACPICA: Export acpi_tb_validate_rsdp().
ACPI: Add facility to remove all _OSI strings
ACPI: Add facility to disable all _OSI OS vendor strings
ACPICA: Add acpi_update_interfaces() public interface
ACPICA: Update version to 20130626
ACPICA: Fix compiler warnings for casting issues (only some compilers)
ACPICA: Remove restriction of 256 maximum GPEs in any GPE block
ACPICA: Disassembler: Expand maximum output string length to 64K
ACPICA: TableManager: Export acpi_tb_scan_memory_for_rsdp()
ACPICA: Update comments about behavior when _STA does not exist
The swapaccount kernel parameter without any values has been removed by
commit a2c8990aed ("memsw: remove noswapaccount kernel parameter") but
it seems that we didn't get rid of all the left overs.
Make sure that menuconfig help text and kernel-parameters.txt are clear
about value for the paramter and remove the stalled comment which is not
very much useful on its own.
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Reported-by: Gergely Risko <gergely@risko.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The virtual console has (undocumented) module parameters to set the
colors for italic and underlined text, but the default text color was
hardcoded for some reason. This made it impossible to change the color
for startup messages, or to set the default for new virtual consoles.
Add a module parameter for that, and document the entire bunch.
Any hacker who thinks that a command prompt on a "black screen with
white font" is not supicious enough can now use the kernel parameter
vt.color=10 to get a nice, evil green.
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch changes the "acpi_osi=" boot parameter implementation so
that:
1. "acpi_osi=!" can be used to disable all _OSI OS vendor strings by
default. It is meaningless to specify "acpi_osi=!" multiple
times as it can only affect the default state of the target _OSI
strings.
2. "acpi_osi=!*" can be used to remove all _OSI OS vendor strings
and all _OSI feature group strings. It is useful to specify
"acpi_osi=!*" multiple times through kernel command line to
override the current state of the target _OSI strings.
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch introduces "acpi_osi=!" command line to force Linux replying
"UNSUPPORTED" to all of the _OSI strings. This patch is based on an
ACPICA enhancement - the new API acpi_update_interfaces().
The _OSI object provides the platform with the ability to query OSPM
to determine the set of ACPI related interfaces, behaviors, or
features that the operating system supports. The argument passed to
the _OSI is a string like the followings:
1. Feature Group String, examples include
Module Device
Processor Device
3.0 _SCP Extensions
Processor Aggregator Device
...
2. OS Vendor String, examples include
Linux
FreeBSD
Windows
...
There are AML codes provided in the ACPI namespace written in the
following style to determine OSPM interfaces / features:
Method(OSCK)
{
if (CondRefOf(_OSI, Local0))
{
if (\_OSI("Windows"))
{
Return (One)
}
if (\_OSI("Windows 2006"))
{
Return (Ones)
}
Return (Zero)
}
Return (Zero)
}
There is a debugging facility implemented in Linux. Users can pass
"acpi_osi=" boot parameters to the kernel to tune the _OSI evaluation
result so that certain AML codes can be executed. Current
implementation includes:
1. 'acpi_osi=' - this makes CondRefOf(_OSI, Local0) TRUE
2. 'acpi_osi="Windows"' - this makes \_OSI("Windows") TRUE
3. 'acpi_osi="!Windows"' - this makes \_OSI("Windows") FALSE
The function to implement this feature is also used as a quirk mechanism
in the Linux ACPI subystem.
When _OSI is evaluatated by the AML codes, ACPICA replies "SUPPORTED"
to all Windows operating system vendor strings. This is because
Windows operating systems return "SUPPORTED" if the argument to the
_OSI method specifies an earlier version of Windows. Please refer to
the following MSDN document:
How to Identify the Windows Version in ACPI by Using _OSI
http://msdn.microsoft.com/en-us/library/hardware/gg463275.aspx
This adds difficulties when developers want to feed specific Windows
operating system vendor string to the BIOS codes for debugging
purpose, multiple acpi_osi="!xxx" have to be specified in the command
line to force Linux replying "UNSUPPORTED" to the Windows OS vendor
strings listed in the AML codes.
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
were added to 3.10, which includes several bug fixes that have been
marked for stable.
As for new features, there were a few, but nothing to write to LWN about.
These include:
New function trigger called "dump" and "cpudump" that will cause
ftrace to dump its buffer to the console when the function is called.
The difference between "dump" and "cpudump" is that "dump" will dump
the entire contents of the ftrace buffer, where as "cpudump" will only
dump the contents of the ftrace buffer for the CPU that called the function.
Another small enhancement is a new sysctl switch called "traceoff_on_warning"
which, when enabled, will disable tracing if any WARN_ON() is triggered.
This is useful if you want to debug what caused a warning and do not
want to risk losing your trace data by the ring buffer overwriting the
data before you can disable it. There's also a kernel command line
option that will make this enabled at boot up called the same thing.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQEcBAABAgAGBQJR1uF2AAoJEOdOSU1xswtMJ1IH/2LSiZAKTA2QaRgGQC/5Bb9c
XSOI1HfD/78lmUvTyb0AX8sLpkzZlvIONEQ/WaZUFo1Zjbrl45zJUwMkTE9uImEg
ZqI5x8OiiN6j4XrRbfYn3Ti060H/Jq41pZXa+shh961Vv51ilv/1yyLkoRmnjzuO
JTloPdXDV7icOqqiSdgxSdtUSv59Ef1ZdHgvvsb3aqzMC5btVQPi4kIys0ST1Tr1
pMWBY+UgvH0xYm3gvTR+W6jjDlkVZEH2alkmcinfr+uC1tm9DDqK2HA17Pd5yZ5z
HNdT76lCzf9iqRF5F8HUvUt+PIp76dNNxAt2qpB6APqAuJTojyguxXHDbY/0kzs=
=UvLi
-----END PGP SIGNATURE-----
Merge tag 'trace-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing changes from Steven Rostedt:
"The majority of the changes here are cleanups for the large changes
that were added to 3.10, which includes several bug fixes that have
been marked for stable.
As for new features, there were a few, but nothing to write to LWN
about. These include:
New function trigger called "dump" and "cpudump" that will cause
ftrace to dump its buffer to the console when the function is called.
The difference between "dump" and "cpudump" is that "dump" will dump
the entire contents of the ftrace buffer, where as "cpudump" will only
dump the contents of the ftrace buffer for the CPU that called the
function.
Another small enhancement is a new sysctl switch called
"traceoff_on_warning" which, when enabled, will disable tracing if any
WARN_ON() is triggered. This is useful if you want to debug what
caused a warning and do not want to risk losing your trace data by the
ring buffer overwriting the data before you can disable it. There's
also a kernel command line option that will make this enabled at boot
up called the same thing"
* tag 'trace-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (34 commits)
tracing: Make tracing_open_generic_{tr,tc}() static
tracing: Remove ftrace() function
tracing: Remove TRACE_EVENT_TYPE enum definition
tracing: Make tracer_tracing_{off,on,is_on}() static
tracing: Fix irqs-off tag display in syscall tracing
uprobes: Fix return value in error handling path
tracing: Fix race between deleting buffer and setting events
tracing: Add trace_array_get/put() to event handling
tracing: Get trace_array ref counts when accessing trace files
tracing: Add trace_array_get/put() to handle instance refs better
tracing: Protect ftrace_trace_arrays list in trace_events.c
tracing: Make trace_marker use the correct per-instance buffer
ftrace: Do not run selftest if command line parameter is set
tracing/kprobes: Don't pass addr=ip to perf_trace_buf_submit()
tracing: Use flag buffer_disabled for irqsoff tracer
tracing/kprobes: Turn trace_probe->files into list_head
tracing: Fix disabling of soft disable
tracing: Add missing syscall_metadata comment
tracing: Simplify code for showing of soft disabled flag
tracing/kprobes: Kill probe_enable_lock
...
Merge together the unicore32, arm, and x86 reboot= command line
parameter handling.
Signed-off-by: Robin Holt <holt@sgi.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Russ Anderson <rja@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull libata updates from Tejun Heo:
"Overview of changes:
- The rest of maintainer email address updates.
- Some core updates - more robust default behavior for port
multipliers, better error reporting for SG_IO commands, and a way
to better work around now ancient and probably pretty rare PATA ->
SATA bridges with ATAPI devices.
- sata_rcar stabilization.
- Some hardware PCI ID additions and one-off low level driver
updates."
* 'for-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: (22 commits)
AHCI: use ATA_BUSY
libata-zpodd: must use ata_tf_init()
ahci: AHCI-mode SATA patch for Intel Coleto Creek DeviceIDs
ata_piix: IDE-mode SATA patch for Intel Coleto Creek DeviceIDs
libata: cleanup SAT error translation
ahci: sata: add support for exynos5440 sata
libata: skip SRST for all SIMG [34]7x port-multipliers
ahci: remove pmp link online check in FBS EH
sata highbank: add bit-banged SGPIO driver support
ahci: make ahci_transmit_led_message into a function pointer
sata_rcar: fix compilation warning in sata_rcar_thaw()
sata_highbank: increase retry count but shorten duration for Calxeda controller
ata: use pci_get_drvdata()
ipr: qc_fill_rtf() method should not store alternate status register
sata_rcar: add 'base' local variable to some functions
sata_rcar: correct 'sata_rcar_sht'
sata_rcar: kill superfluous code in sata_rcar_bmdma_fill_sg()
libata: do not limit R-Car SATA driver to shmobile
ata: use platform_{get,set}_drvdata()
AHCI: Make distinct names for ports in /proc/interrupts
...
- Hotplug changes allowing device hot-removal operations to fail
gracefully (instead of crashing the kernel) if they cannot be
carried out completely. From Rafael J Wysocki and Toshi Kani.
- Freezer update from Colin Cross and Mandeep Singh Baines targeted
at making the freezing of tasks a bit less heavy weight operation.
- cpufreq resume fix from Srivatsa S Bhat for a regression introduced
during the 3.10 cycle causing some cpufreq sysfs attributes to
return wrong values to user space after resume.
- New freqdomain_cpus sysfs attribute for the acpi-cpufreq driver to
provide information previously available via related_cpus from
Lan Tianyu.
- cpufreq fixes and cleanups from Viresh Kumar, Jacob Shin,
Heiko Stübner, Xiaoguang Chen, Ezequiel Garcia, Arnd Bergmann, and
Tang Yuantian.
- Fix for an ACPICA regression causing suspend/resume issues to
appear on some systems introduced during the 3.4 development cycle
from Lv Zheng.
- ACPICA fixes and cleanups from Bob Moore, Tomasz Nowicki, Lv Zheng,
Chao Guan, and Zhang Rui.
- New cupidle driver for Xilinx Zynq processors from Michal Simek.
- cpuidle fixes and cleanups from Daniel Lezcano.
- Changes to make suspend/resume work correctly in Xen guests from
Konrad Rzeszutek Wilk.
- ACPI device power management fixes and cleanups from Fengguang Wu
and Rafael J Wysocki.
- ACPI documentation updates from Lv Zheng, Aaron Lu and Hanjun Guo.
- Fix for the IA-64 issue that was the reason for reverting commit
9f29ab1 and updates of the ACPI scan code from Rafael J Wysocki.
- Mechanism for adding CMOS RTC address space handlers from Lan Tianyu
(to allow some EC-related breakage to be fixed on some systems).
- Spec-compliant implementation of acpi_os_get_timer() from
Mika Westerberg.
- Modification of do_acpi_find_child() to execute _STA in order to
to avoid situations in which a pointer to a disabled device object
is returned instead of an enabled one with the same _ADR value.
From Jeff Wu.
- Intel BayTrail PCH (Platform Controller Hub) support for the ACPI
Intel Low-Power Subsystems (LPSS) driver and modificaions of that
driver to work around a couple of known BIOS issues from
Mika Westerberg and Heikki Krogerus.
- EC driver fix from Vasiliy Kulikov to make it use get_user() and
put_user() instead of dereferencing user space pointers blindly.
- Assorted ACPI code cleanups from Bjorn Helgaas, Nicholas Mazzuca and
Toshi Kani.
- Modification of the "runtime idle" helper routine to take the return
values of the callbacks executed by it into account and to call
rpm_suspend() if they return 0, which allows some code bloat
reduction to be done, from Rafael J Wysocki and Alan Stern.
- New trace points for PM QoS from Sahara <keun-o.park@windriver.com>.
- PM QoS documentation update from Lan Tianyu.
- Assorted core PM code cleanups and changes from Bernie Thompson,
Bjorn Helgaas, Julius Werner, and Shuah Khan.
- New devfreq driver for the Exynos5-bus device from Abhilash Kesavan.
- Minor devfreq cleanups, fixes and MAINTAINERS update from
MyungJoo Ham, Abhilash Kesavan, Paul Bolle, Rajagopal Venkat, and
Wei Yongjun.
- OMAP Adaptive Voltage Scaling (AVS) SmartReflex voltage control
driver updates from Andrii Tseglytskyi and Nishanth Menon.
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABAgAGBQJR0ZNOAAoJEKhOf7ml8uNsDLYP/0EU4rmvw0TWTITfp6RS1KDE
9GwBn96ZR4Q5bJd9gBCTPSqhHOYMqxWEUp99sn/M2wehG1pk/jw5LO56+2IhM3UZ
g1HDcJ7te2nVT/iXsKiAGTVhU9Rk0aYwoVSknwk27qpIBGxW9w/s5tLX8pY3Q3Zq
wL/7aTPjyL+PFFFEaxgH7qLqsl3DhbtYW5AriUBTkXout/tJ4eO1b7MNBncLDh8X
VQ/0DNCKE95VEJfkO4rk9RKUyVp9GDn0i+HXCD/FS4IA5oYzePdVdNDmXf7g+swe
CGlTZq8pB+oBpDiHl4lxzbNrKQjRNbGnDUkoRcWqn0nAw56xK+vmYnWJhW99gQ/I
fKnvxeLca5po1aiqmC4VSJxZIatFZqLrZAI4dzoCLWY+bGeTnCKmj0/F8ytFnZA2
8IuLLs7/dFOaHXV/pKmpg6FAlFa9CPxoqRFoyqb4M0GjEarADyalXUWsPtG+6xCp
R/p0CISpwk+guKZR/qPhL7M654S7SHrPwd2DPF0KgGsvk+G2GhoB8EzvD8BVp98Z
9siCGCdgKQfJQVI6R0k9aFmn/4gRQIAgyPhkhv9tqULUUkiaXki+/t8kPfnb8O/d
zep+CA57E2G8MYLkDJfpFeKS7GpPD6TIdgFdGmOUC0Y6sl9iTdiw4yTx8O2JM37z
rHBZfYGkJBrbGRu+Q1gs
=VBBq
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI updates from Rafael Wysocki:
"This time the total number of ACPI commits is slightly greater than
the number of cpufreq commits, but Viresh Kumar (who works on cpufreq)
remains the most active patch submitter.
To me, the most significant change is the addition of offline/online
device operations to the driver core (with the Greg's blessing) and
the related modifications of the ACPI core hotplug code. Next are the
freezer updates from Colin Cross that should make the freezing of
tasks a bit less heavy weight.
We also have a couple of regression fixes, a number of fixes for
issues that have not been identified as regressions, two new drivers
and a bunch of cleanups all over.
Highlights:
- Hotplug changes to support graceful hot-removal failures.
It sometimes is necessary to fail device hot-removal operations
gracefully if they cannot be carried out completely. For example,
if memory from a memory module being hot-removed has been allocated
for the kernel's own use and cannot be moved elsewhere, it's
desirable to fail the hot-removal operation in a graceful way
rather than to crash the kernel, but currenty a success or a kernel
crash are the only possible outcomes of an attempted memory
hot-removal. Needless to say, that is not a very attractive
alternative and it had to be addressed.
However, in order to make it work for memory, I first had to make
it work for CPUs and for this purpose I needed to modify the ACPI
processor driver. It's been split into two parts, a resident one
handling the low-level initialization/cleanup and a modular one
playing the actual driver's role (but it binds to the CPU system
device objects rather than to the ACPI device objects representing
processors). That's been sort of like a live brain surgery on a
patient who's riding a bike.
So this is a little scary, but since we found and fixed a couple of
regressions it caused to happen during the early linux-next testing
(a month ago), nobody has complained.
As a bonus we remove some duplicated ACPI hotplug code, because the
ACPI-based CPU hotplug is now going to use the common ACPI hotplug
code.
- Lighter weight freezing of tasks.
These changes from Colin Cross and Mandeep Singh Baines are
targeted at making the freezing of tasks a bit less heavy weight
operation. They reduce the number of tasks woken up every time
during the freezing, by using the observation that the freezer
simply doesn't need to wake up some of them and wait for them all
to call refrigerator(). The time needed for the freezer to decide
to report a failure is reduced too.
Also reintroduced is the check causing a lockdep warining to
trigger when try_to_freeze() is called with locks held (which is
generally unsafe and shouldn't happen).
- cpufreq updates
First off, a commit from Srivatsa S Bhat fixes a resume regression
introduced during the 3.10 cycle causing some cpufreq sysfs
attributes to return wrong values to user space after resume. The
fix is kind of fresh, but also it's pretty obvious once Srivatsa
has identified the root cause.
Second, we have a new freqdomain_cpus sysfs attribute for the
acpi-cpufreq driver to provide information previously available via
related_cpus. From Lan Tianyu.
Finally, we fix a number of issues, mostly related to the
CPUFREQ_POSTCHANGE notifier and cpufreq Kconfig options and clean
up some code. The majority of changes from Viresh Kumar with bits
from Jacob Shin, Heiko Stübner, Xiaoguang Chen, Ezequiel Garcia,
Arnd Bergmann, and Tang Yuantian.
- ACPICA update
A usual bunch of updates from the ACPICA upstream.
During the 3.4 cycle we introduced support for ACPI 5 extended
sleep registers, but they are only supposed to be used if the
HW-reduced mode bit is set in the FADT flags and the code attempted
to use them without checking that bit. That caused suspend/resume
regressions to happen on some systems. Fix from Lv Zheng causes
those registers to be used only if the HW-reduced mode bit is set.
Apart from this some other ACPICA bugs are fixed and code cleanups
are made by Bob Moore, Tomasz Nowicki, Lv Zheng, Chao Guan, and
Zhang Rui.
- cpuidle updates
New driver for Xilinx Zynq processors is added by Michal Simek.
Multidriver support simplification, addition of some missing
kerneldoc comments and Kconfig-related fixes come from Daniel
Lezcano.
- ACPI power management updates
Changes to make suspend/resume work correctly in Xen guests from
Konrad Rzeszutek Wilk, sparse warning fix from Fengguang Wu and
cleanups and fixes of the ACPI device power state selection
routine.
- ACPI documentation updates
Some previously missing pieces of ACPI documentation are added by
Lv Zheng and Aaron Lu (hopefully, that will help people to
uderstand how the ACPI subsystem works) and one outdated doc is
updated by Hanjun Guo.
- Assorted ACPI updates
We finally nailed down the IA-64 issue that was the reason for
reverting commit 9f29ab11dd ("ACPI / scan: do not match drivers
against objects having scan handlers"), so we can fix it and move
the ACPI scan handler check added to the ACPI video driver back to
the core.
A mechanism for adding CMOS RTC address space handlers is
introduced by Lan Tianyu to allow some EC-related breakage to be
fixed on some systems.
A spec-compliant implementation of acpi_os_get_timer() is added by
Mika Westerberg.
The evaluation of _STA is added to do_acpi_find_child() to avoid
situations in which a pointer to a disabled device object is
returned instead of an enabled one with the same _ADR value. From
Jeff Wu.
Intel BayTrail PCH (Platform Controller Hub) support is added to
the ACPI driver for Intel Low-Power Subsystems (LPSS) and that
driver is modified to work around a couple of known BIOS issues.
Changes from Mika Westerberg and Heikki Krogerus.
The EC driver is fixed by Vasiliy Kulikov to use get_user() and
put_user() instead of dereferencing user space pointers blindly.
Code cleanups are made by Bjorn Helgaas, Nicholas Mazzuca and Toshi
Kani.
- Assorted power management updates
The "runtime idle" helper routine is changed to take the return
values of the callbacks executed by it into account and to call
rpm_suspend() if they return 0, which allows us to reduce the
overall code bloat a bit (by dropping some code that's not
necessary any more after that modification).
The runtime PM documentation is updated by Alan Stern (to reflect
the "runtime idle" behavior change).
New trace points for PM QoS are added by Sahara
(<keun-o.park@windriver.com>).
PM QoS documentation is updated by Lan Tianyu.
Code cleanups are made and minor issues are addressed by Bernie
Thompson, Bjorn Helgaas, Julius Werner, and Shuah Khan.
- devfreq updates
New driver for the Exynos5-bus device from Abhilash Kesavan.
Minor cleanups, fixes and MAINTAINERS update from MyungJoo Ham,
Abhilash Kesavan, Paul Bolle, Rajagopal Venkat, and Wei Yongjun.
- OMAP power management updates
Adaptive Voltage Scaling (AVS) SmartReflex voltage control driver
updates from Andrii Tseglytskyi and Nishanth Menon."
* tag 'pm+acpi-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (162 commits)
cpufreq: Fix cpufreq regression after suspend/resume
ACPI / PM: Fix possible NULL pointer deref in acpi_pm_device_sleep_state()
PM / Sleep: Warn about system time after resume with pm_trace
cpufreq: don't leave stale policy pointer in cdbs->cur_policy
acpi-cpufreq: Add new sysfs attribute freqdomain_cpus
cpufreq: make sure frequency transitions are serialized
ACPI: implement acpi_os_get_timer() according the spec
ACPI / EC: Add HP Folio 13 to ec_dmi_table in order to skip DSDT scan
ACPI: Add CMOS RTC Operation Region handler support
ACPI / processor: Drop unused variable from processor_perflib.c
cpufreq: tegra: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: s3c64xx: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: omap: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: imx6q: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: exynos: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: dbx500: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: davinci: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: arm-big-little: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: powernow-k8: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: pcc: call CPUFREQ_POSTCHANGE notfier in error cases
...
Pull security subsystem updates from James Morris:
"In this update, Smack learns to love IPv6 and to mount a filesystem
with a transmutable hierarchy (i.e. security labels are inherited
from parent directory upon creation rather than creating process).
The rest of the changes are maintenance"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (37 commits)
tpm/tpm_i2c_infineon: Remove unused header file
tpm: tpm_i2c_infinion: Don't modify i2c_client->driver
evm: audit integrity metadata failures
integrity: move integrity_audit_msg()
evm: calculate HMAC after initializing posix acl on tmpfs
maintainers: add Dmitry Kasatkin
Smack: Fix the bug smackcipso can't set CIPSO correctly
Smack: Fix possible NULL pointer dereference at smk_netlbl_mls()
Smack: Add smkfstransmute mount option
Smack: Improve access check performance
Smack: Local IPv6 port based controls
tpm: fix regression caused by section type conflict of tpm_dev_release() in ppc builds
maintainers: Remove Kent from maintainers
tpm: move TPM_DIGEST_SIZE defintion
tpm_tis: missing platform_driver_unregister() on error in init_tis()
security: clarify cap_inode_getsecctx description
apparmor: no need to delay vfree()
apparmor: fix fully qualified name parsing
apparmor: fix setprocattr arg processing for onexec
apparmor: localize getting the security context to a few macros
...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJRzhAdAAoJEKurIx+X31iByQEP/1eAEHNiSCrrKUm/ovGCiVBZ
w+ZRJkcKYBj+/3cQUeMds7OevmM8fSU5t+4zGpikobQ75xCjz7vJ6nQNLrSylqQG
Z4weVme6a529szRwv6H9an+psj5wP0yWAbCgBSw4/O3cogDS0eMD8T6ZhEi8mWJ9
PRSdqOLdxl7AZpJpJpP3GF7VQLJLDbVv56FcKgAZ3mtFM6qr7HFiTtnTc+Jl33V5
bM7v23HXyoyRk+JOUMtO+4BlrkKzbtL4Zh8aGw4s71QX3zZbP0BIw9sKffNMD+2k
oYXOcJjTwbtxQqXKoSKOtvLUQzf/SDqzlFf7yU38Spw+nRECPhVVRO0aAXKOmFKD
74qyNuW6ZXMw7xH4M7iG6MuM/cZIvueDtp0RSZVc6Gmb5CGb1aXz4GeotvKF5Bev
nVL65UVo6XwfB4m77ayPYqfD6KX0BsWHdCzWABNl8WbgyCY5dXRJJs/c9iLCl05I
EXRgWLJK/H/wOY9mj3NGcvRicYJigZP2luJEdwn8mVlpZl+3wstDcgq9Ijxr0LGi
bj2H9ro3vSaXpAnJw3FR9QbXgH/8Nbb8sV7yBddy9zntrnfEfvrYtrhOKR+fPrQL
C5zXeaXGEnqCEcYNESSjyhDCa1az49BlJC5ebEHHqdviHzRpVVi/EG920GViWw+D
z+z25N5hi4yoUF/+1vc/
=Dm06
-----END PGP SIGNATURE-----
Merge tag 'please-pull-mce-therm' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux
Pull thermal power-limit update from Tony Luck:
"Thermal limit warnings are too scary and cause unnecessary concern"
* tag 'please-pull-mce-therm' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
x86 thermal: Disable power limit notification interrupt by default
x86 thermal: Delete power-limit-notification console messages
Pull workqueue changes from Tejun Heo:
"Surprisingly, Lai and I didn't break too many things implementing
custom pools and stuff last time around and there aren't any follow-up
changes necessary at this point.
The only change in this pull request is Viresh's patches to make some
per-cpu workqueues to behave as unbound workqueues dependent on a boot
param whose default can be configured via a config option. This leads
to higher processing overhead / lower bandwidth as more work items are
bounced across CPUs; however, it can lead to noticeable powersave in
certain configurations - ~10% w/ idlish constant workload on a
big.LITTLE configuration according to Viresh.
This is because per-cpu workqueues interfere with how the scheduler
perceives whether or not each CPU is idle by forcing pinned tasks on
them, which makes the scheduler's power-aware scheduling decisions
less effective.
Its effectiveness is likely less pronounced on homogenous
configurations and this type of optimization can probably be made
automatic; however, the changes are pretty minimal and the affected
workqueues are clearly marked, so it's an easy gain for some
configurations for the time being with pretty unintrusive changes."
* 'for-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
fbcon: queue work on power efficient wq
block: queue work on power efficient wq
PHYLIB: queue work on system_power_efficient_wq
workqueue: Add system wide power_efficient workqueues
workqueues: Introduce new flag WQ_POWER_EFFICIENT for power oriented workqueues
Add description for video module's parameter brightness_switch_enabled
into kernel-parameters.txt.
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch moves the integrity_audit_msg() function and defintion to
security/integrity/, the parent directory, renames the 'ima_audit'
boot command line option to 'integrity_audit', and fixes the Kconfig
help text to reflect the actual code.
Changelog:
- Fixed ifdef inclusion of integrity_audit_msg() (Fengguang Wu)
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Add a traceoff_on_warning option in both the kernel command line as well
as a sysctl option. When set, any WARN*() function that is hit will cause
the tracing_on variable to be cleared, which disables writing to the
ring buffer.
This is useful especially when tracing a bug with function tracing. When
a warning is hit, the print caused by the warning can flood the trace with
the functions that producing the output for the warning. This can make the
resulting trace useless by either hiding where the bug happened, or worse,
by overflowing the buffer and losing the trace of the bug totally.
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The package power limit notification interrupt is primarily for
system diagnosis, and should not be blindly enabled on every
system by default -- particuarly since Linux does nothing in the
handler except count how many times it has been called...
Add a new kernel cmdline parameter "int_pln_enable" for situations where
users want to oberve these events via existing system counters:
$ grep TRM /proc/interrupts
$ grep . /sys/devices/system/cpu/cpu*/thermal_throttle/*
https://bugzilla.kernel.org/show_bug.cgi?id=36182
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Pull block layer fixes from Jens Axboe:
"Outside of bcache (which really isn't super big), these are all
few-liners. There are a few important fixes in here:
- Fix blk pm sleeping when holding the queue lock
- A small collection of bcache fixes that have been done and tested
since bcache was included in this merge window.
- A fix for a raid5 regression introduced with the bio changes.
- Two important fixes for mtip32xx, fixing an oops and potential data
corruption (or hang) due to wrong bio iteration on stacked devices."
* 'for-linus' of git://git.kernel.dk/linux-block:
scatterlist: sg_set_buf() argument must be in linear mapping
raid5: Initialize bi_vcnt
pktcdvd: silence static checker warning
block: remove refs to XD disks from documentation
blkpm: avoid sleep when holding queue lock
mtip32xx: Correctly handle bio->bi_idx != 0 conditions
mtip32xx: Fix NULL pointer dereference during module unload
bcache: Fix error handling in init code
bcache: clarify free/available/unused space
bcache: drop "select CLOSURES"
bcache: Fix incompatible pointer type warning
Some device require DMADIR to be enabled, but are not detected as such
by atapi_id_dmadir. One such example is "Asus Serillel 2"
SATA-host-to-PATA-device bridge: the bridge itself requires DMADIR,
even if the bridged device does not.
As atapi_dmadir module parameter can cause problems with some devices
(as per Tejun Heo's memory), enabling it globally may not be possible
depending on the hardware.
This patch adds atapi_dmadir in the form of a "force" horkage value,
allowing global, per-bus and per-device control.
Signed-off-by: Vincent Pelletier <plr.vincent@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Commit d1a6f4f197
"block: delete super ancient PC-XT driver for 1980's hardware"
deleted the XD disk driver, but there are still a few
references to it in the documentation directory. Delete
the remnants and thus also free up the major block device
13 for reuse.
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There is no point. We would just squeeze the guest to put more and
more pages in the swap disk without any purpose.
The only time it makes sense to use the selfballooning and shrinking
is when frontswap is being utilized.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
If tmem is built-in or a module, the user has the option on
the command line to influence it by doing: tmem.<some option>
instead of having a variety of "nocleancache", and
"nofrontswap". The others: "noselfballooning" and "selfballooning";
and "noselfshrink" are in a different driver xen-selfballoon.c
and the patches:
xen/tmem: Remove the usage of 'noselfshrink' and use 'tmem.selfshrink' bool instead.
xen/tmem: Remove the usage of 'noselfballoon','selfballoon' and use 'tmem.selfballon' bool instead.
remove them.
Also add documentation.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Workqueues can be performance or power-oriented. Currently, most workqueues are
bound to the CPU they were created on. This gives good performance (due to cache
effects) at the cost of potentially waking up otherwise idle cores (Idle from
scheduler's perspective. Which may or may not be physically idle) just to
process some work. To save power, we can allow the work to be rescheduled on a
core that is already awake.
Workqueues created with the WQ_UNBOUND flag will allow some power savings.
However, we don't change the default behaviour of the system. To enable
power-saving behaviour, a new config option CONFIG_WQ_POWER_EFFICIENT needs to
be turned on. This option can also be overridden by the
workqueue.power_efficient boot parameter.
tj: Updated config description and comments. Renamed
CONFIG_WQ_POWER_EFFICIENT to CONFIG_WQ_POWER_EFFICIENT_DEFAULT.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
The updates are mostly about the x86 IOMMUs this time. Exceptions are
the groundwork for the PAMU IOMMU from Freescale (for a PPC platform)
and an extension to the IOMMU group interface. On the x86 side this
includes a workaround for VT-d to disable interrupt remapping on broken
chipsets. On the AMD-Vi side the most important new feature is a kernel
command-line interface to override broken information in IVRS ACPI
tables and get interrupt remapping working this way. Besides that there
are small fixes all over the place.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJRh2vAAAoJECvwRC2XARrjbjkP/jzKzeffybUpQsIJF8rs/IEt
hSwqpGLr6WR5FdneEH9fiBIp4pyMDXmuAb/2ZNgB+DgPN3xgqmWVo4WLk7pMo3BS
/xIz/lu7hIX3AtKt807pL9+rPdhGYEJ43Vmr4bW9x0l1kuNXy6fmMLcN5FaPKjV4
p4hY4jOstEgtYQw4wi39/9b4FsYoipZizkOUSdtCzWwTv7jOHH7/Wra8iZyzL6Je
1VlF/efp0ytTcwLdHOfGwPCIlZrQRtQCM4SqdAUG9bOL3ARR9Yu/0iW1295nbLzo
CQX5CfKePvo/fGxki1jcBi+UCyxYKPosB5kCxmh4MAxCg/VzzMsaME/A73tLJa6W
Y29bbjwPoBPMq03HX8S9R5QWY8HpFujUUp+J4TXcKuTgYEV28WfLu1uaeKD716nM
LoXUojov7Cj8ZQZnhyu5l+XNaephBZLfw/8bM6bAxhlKXwAjmLiS5Z+srPl1GJee
5GCV+L94JifHLZaREWh3JFsh9O3W7Wno2++c4JU32uCWJHXH7tMgs2P8n5AY9rnT
Km1a9y6w2MF3Gg9j4y6u75m0XnFTNzYjeJMUtqVlwVhNHhgaXfuIWY63xOQCLJs1
ThTHOjoh0VqONGobR/ywn+0ouo9X07DnWpluyFd+zY3XK0UE0NOu9XMr4i6TWxOf
mlzWoEKxtw36XGHB/FtQ
=MVc/
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
"The updates are mostly about the x86 IOMMUs this time.
Exceptions are the groundwork for the PAMU IOMMU from Freescale (for a
PPC platform) and an extension to the IOMMU group interface.
On the x86 side this includes a workaround for VT-d to disable
interrupt remapping on broken chipsets. On the AMD-Vi side the most
important new feature is a kernel command-line interface to override
broken information in IVRS ACPI tables and get interrupt remapping
working this way.
Besides that there are small fixes all over the place."
* tag 'iommu-updates-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (24 commits)
iommu/tegra: Fix printk formats for dma_addr_t
iommu: Add a function to find an iommu group by id
iommu/vt-d: Remove warning for HPET scope type
iommu: Move swap_pci_ref function to drivers/iommu/pci.h.
iommu/vt-d: Disable translation if already enabled
iommu/amd: fix error return code in early_amd_iommu_init()
iommu/AMD: Per-thread IOMMU Interrupt Handling
iommu: Include linux/err.h
iommu/amd: Workaround for ERBT1312
iommu/amd: Document ivrs_ioapic and ivrs_hpet parameters
iommu/amd: Don't report firmware bugs with cmd-line ivrs overrides
iommu/amd: Add ioapic and hpet ivrs override
iommu/amd: Add early maps for ioapic and hpet
iommu/amd: Extend IVRS special device data structure
iommu/amd: Move add_special_device() to __init
iommu: Fix compile warnings with forward declarations
iommu/amd: Properly initialize irq-table lock
iommu/amd: Use AMD specific data structure for irq remapping
iommu/amd: Remove map_sg_no_iommu()
iommu/vt-d: add quirk for broken interrupt remapping on 55XX chipsets
...
Pull 'full dynticks' support from Ingo Molnar:
"This tree from Frederic Weisbecker adds a new, (exciting! :-) core
kernel feature to the timer and scheduler subsystems: 'full dynticks',
or CONFIG_NO_HZ_FULL=y.
This feature extends the nohz variable-size timer tick feature from
idle to busy CPUs (running at most one task) as well, potentially
reducing the number of timer interrupts significantly.
This feature got motivated by real-time folks and the -rt tree, but
the general utility and motivation of full-dynticks runs wider than
that:
- HPC workloads get faster: CPUs running a single task should be able
to utilize a maximum amount of CPU power. A periodic timer tick at
HZ=1000 can cause a constant overhead of up to 1.0%. This feature
removes that overhead - and speeds up the system by 0.5%-1.0% on
typical distro configs even on modern systems.
- Real-time workload latency reduction: CPUs running critical tasks
should experience as little jitter as possible. The last remaining
source of kernel-related jitter was the periodic timer tick.
- A single task executing on a CPU is a pretty common situation,
especially with an increasing number of cores/CPUs, so this feature
helps desktop and mobile workloads as well.
The cost of the feature is mainly related to increased timer
reprogramming overhead when a CPU switches its tick period, and thus
slightly longer to-idle and from-idle latency.
Configuration-wise a third mode of operation is added to the existing
two NOHZ kconfig modes:
- CONFIG_HZ_PERIODIC: [formerly !CONFIG_NO_HZ], now explicitly named
as a config option. This is the traditional Linux periodic tick
design: there's a HZ tick going on all the time, regardless of
whether a CPU is idle or not.
- CONFIG_NO_HZ_IDLE: [formerly CONFIG_NO_HZ=y], this turns off the
periodic tick when a CPU enters idle mode.
- CONFIG_NO_HZ_FULL: this new mode, in addition to turning off the
tick when a CPU is idle, also slows the tick down to 1 Hz (one
timer interrupt per second) when only a single task is running on a
CPU.
The .config behavior is compatible: existing !CONFIG_NO_HZ and
CONFIG_NO_HZ=y settings get translated to the new values, without the
user having to configure anything. CONFIG_NO_HZ_FULL is turned off by
default.
This feature is based on a lot of infrastructure work that has been
steadily going upstream in the last 2-3 cycles: related RCU support
and non-periodic cputime support in particular is upstream already.
This tree adds the final pieces and activates the feature. The pull
request is marked RFC because:
- it's marked 64-bit only at the moment - the 32-bit support patch is
small but did not get ready in time.
- it has a number of fresh commits that came in after the merge
window. The overwhelming majority of commits are from before the
merge window, but still some aspects of the tree are fresh and so I
marked it RFC.
- it's a pretty wide-reaching feature with lots of effects - and
while the components have been in testing for some time, the full
combination is still not very widely used. That it's default-off
should reduce its regression abilities and obviously there are no
known regressions with CONFIG_NO_HZ_FULL=y enabled either.
- the feature is not completely idempotent: there is no 100%
equivalent replacement for a periodic scheduler/timer tick. In
particular there's ongoing work to map out and reduce its effects
on scheduler load-balancing and statistics. This should not impact
correctness though, there are no known regressions related to this
feature at this point.
- it's a pretty ambitious feature that with time will likely be
enabled by most Linux distros, and we'd like you to make input on
its design/implementation, if you dislike some aspect we missed.
Without flaming us to crisp! :-)
Future plans:
- there's ongoing work to reduce 1Hz to 0Hz, to essentially shut off
the periodic tick altogether when there's a single busy task on a
CPU. We'd first like 1 Hz to be exposed more widely before we go
for the 0 Hz target though.
- once we reach 0 Hz we can remove the periodic tick assumption from
nr_running>=2 as well, by essentially interrupting busy tasks only
as frequently as the sched_latency constraints require us to do -
once every 4-40 msecs, depending on nr_running.
I am personally leaning towards biting the bullet and doing this in
v3.10, like the -rt tree this effort has been going on for too long -
but the final word is up to you as usual.
More technical details can be found in Documentation/timers/NO_HZ.txt"
* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
sched: Keep at least 1 tick per second for active dynticks tasks
rcu: Fix full dynticks' dependency on wide RCU nocb mode
nohz: Protect smp_processor_id() in tick_nohz_task_switch()
nohz_full: Add documentation.
cputime_nsecs: use math64.h for nsec resolution conversion helpers
nohz: Select VIRT_CPU_ACCOUNTING_GEN from full dynticks config
nohz: Reduce overhead under high-freq idling patterns
nohz: Remove full dynticks' superfluous dependency on RCU tree
nohz: Fix unavailable tick_stop tracepoint in dynticks idle
nohz: Add basic tracing
nohz: Select wide RCU nocb for full dynticks
nohz: Disable the tick when irq resume in full dynticks CPU
nohz: Re-evaluate the tick for the new task after a context switch
nohz: Prepare to stop the tick on irq exit
nohz: Implement full dynticks kick
nohz: Re-evaluate the tick from the scheduler IPI
sched: New helper to prevent from stopping the tick in full dynticks
sched: Kick full dynticks CPU that have more than one task enqueued.
perf: New helper to prevent full dynticks CPUs from stopping tick
perf: Kick full dynticks CPU if events rotation is needed
...
This branch contains platform updates for 3.10. Among the highlights:
- Support for the new Atmel Cortex-A5 based platforms (SAMA5D3)
- New support for CSR SiRFatlas6 SoCs
- A handful of updates for NVidia T114 (a.k.a. Tegra 4)
- A bunch of updates for the shmobile platforms
- A handful of updates for davinci
- A few updates for Qualcomm MSM
- Plus a handful of other patches, defconfig updates, etc.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJRgg+LAAoJEIwa5zzehBx3ePcP/3NUsSOTRQ2SZIVpyjnWOhkf
RMZiRaVsxrY0BPfDB9E2Vcb6lannKmACTujs/Ux7kJC22BreuFM1PnZoDfhkRuSE
n/nVB1981XJS82z2uONRSZGlUPSGWYzhTTUDJ0nHiBGmIGf5ctnC0iYWp3As3lv9
kNY14H7NkwQ4zBVNEMu7WfW8d2IJgqZJgR9xhZPv5fOZ+LlQmK6VaHWTmQtjyea1
bG1qoJ0dPbfJB4Vnr3a49rBkSJxZUiv8xQucw9+vo+ADRi64M4sZ1Jj2vVyDpqZp
F4fxBNMVvg7xM0TcBbItFFYJBXlUjeT4z+UI5iYjkbnE7EV9ndFeZXHCWX1qzOSy
X/nrJKuoe7ISQanBE9SHS9DpDGlkPDO0Mn0vb1f2VUQOY513pt/D1iFYEucZ6WCN
fWUYtvt5GayidUr55D1U8ssbE0oGt2rizd9x7GUk4KbRVAnUUNopIQAhXrefTrZm
jfdZNDckJ2F3aq8IPjsKuyJTpe61xD4Wvb3P/pEE3Q8fowPF5WIxXV+qjqHQ9vtt
Tz4LkP/YdynVFGmhOwz3QZmPaQItaabaYyCcZ5cVCvt5mdxx5VuHYppafhCPJz+V
KCQpKi1azuIv+sDR+nlGOl6+Ideea3s7TsRudfbmQFp5GsqkqOdJzR9gbbKmJauQ
4JPpRd+4W8wC8zXQnhVY
=HXX3
-----END PGP SIGNATURE-----
Merge tag 'soc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC platform updates from Olof Johansson:
"This branch contains part 1 of the platform updates for 3.10. Among
the highlights:
- Support for the new Atmel Cortex-A5 based platforms (SAMA5D3)
- New support for CSR SiRFatlas6 SoCs
- A handful of updates for NVidia T114 (a.k.a. Tegra 4)
- A bunch of updates for the shmobile platforms
- A handful of updates for davinci
- A few updates for Qualcomm MSM
- Plus a handful of other patches, defconfig updates, etc."
* tag 'soc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (135 commits)
ARM: tegra: pm: fix build error w/o PM_SLEEP
ARM: davinci: ensure global variables are declared
ARM: davinci: sram.c: fix incorrect type in assignment
ARM: davinci: da8xx dt: make file local symbols static
ARM: davinci: da8xx: add remoteproc support
ARM: socfpga: Upgrade clk driver for socfpga to make use of dts clock entries
ARM: socfpga: Add clock entries into device tree
ARM: socfpga: Enable soft reset
ARM: EXYNOS: replace cpumask by the corresponding macro
ARM: EXYNOS: handle properly the return values
ARM: EXYNOS: factor out the idle states
ARM: OMAP4: Enable fix for Cortex-A9 erratas
ARM: OMAP2+: Export SoC information to userspace
ARM: OMAP2+: SoC name and revision unification
ARM: OMAP2+: Move common part of late init into common function
ARM: tegra: pm: remove duplicated include from pm.c
ARM: davinci: da850: override mmc DT node device name
ARM: davinci: da850: add mmc DT entries
mmc: davinci_mmc: add DT support
ARM: SAMSUNG: check processor type before cache restoration in resume
...
The full dynticks tree needs the latest RCU and sched
upstream updates in order to fix some dependencies.
Merge a common upstream merge point that has these
updates.
Conflicts:
include/linux/perf_event.h
kernel/rcutree.h
kernel/rcutree_plugin.h
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Pull x86 debug update from Ingo Molnar:
"Two small changes: a documentation update and a constification"
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, early-printk: Update earlyprintk documentation (and kill x86 copy)
x86: Constify a few items
Pull RCU updates from Ingo Molnar:
"The main changes in this cycle are mostly related to preparatory work
for the full-dynticks work:
- Remove restrictions on no-CBs CPUs, make RCU_FAST_NO_HZ take
advantage of numbered callbacks, do callback accelerations based on
numbered callbacks. Posted to LKML at
https://lkml.org/lkml/2013/3/18/960
- RCU documentation updates. Posted to LKML at
https://lkml.org/lkml/2013/3/18/570
- Miscellaneous fixes. Posted to LKML at
https://lkml.org/lkml/2013/3/18/594"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
rcu: Make rcu_accelerate_cbs() note need for future grace periods
rcu: Abstract rcu_start_future_gp() from rcu_nocb_wait_gp()
rcu: Rename n_nocb_gp_requests to need_future_gp
rcu: Push lock release to rcu_start_gp()'s callers
rcu: Repurpose no-CBs event tracing to future-GP events
rcu: Rearrange locking in rcu_start_gp()
rcu: Make RCU_FAST_NO_HZ take advantage of numbered callbacks
rcu: Accelerate RCU callbacks at grace-period end
rcu: Export RCU_FAST_NO_HZ parameters to sysfs
rcu: Distinguish "rcuo" kthreads by RCU flavor
rcu: Add event tracing for no-CBs CPUs' grace periods
rcu: Add event tracing for no-CBs CPUs' callback registration
rcu: Introduce proper blocking to no-CBs kthreads GP waits
rcu: Provide compile-time control for no-CBs CPUs
rcu: Tone down debugging during boot-up and shutdown.
rcu: Add softirq-stall indications to stall-warning messages
rcu: Documentation update
rcu: Make bugginess of code sample more evident
rcu: Fix hlist_bl_set_first_rcu() annotation
rcu: Delete unused rcu_node "wakemask" field
...
Pull workqueue updates from Tejun Heo:
"A lot of activities on workqueue side this time. The changes achieve
the followings.
- WQ_UNBOUND workqueues - the workqueues which are per-cpu - are
updated to be able to interface with multiple backend worker pools.
This involved a lot of churning but the end result seems actually
neater as unbound workqueues are now a lot closer to per-cpu ones.
- The ability to interface with multiple backend worker pools are
used to implement unbound workqueues with custom attributes.
Currently the supported attributes are the nice level and CPU
affinity. It may be expanded to include cgroup association in
future. The attributes can be specified either by calling
apply_workqueue_attrs() or through /sys/bus/workqueue/WQ_NAME/* if
the workqueue in question is exported through sysfs.
The backend worker pools are keyed by the actual attributes and
shared by any workqueues which share the same attributes. When
attributes of a workqueue are changed, the workqueue binds to the
worker pool with the specified attributes while leaving the work
items which are already executing in its previous worker pools
alone.
This allows converting custom worker pool implementations which
want worker attribute tuning to use workqueues. The writeback pool
is already converted in block tree and there are a couple others
are likely to follow including btrfs io workers.
- WQ_UNBOUND's ability to bind to multiple worker pools is also used
to make it NUMA-aware. Because there's no association between work
item issuer and the specific worker assigned to execute it, before
this change, using unbound workqueue led to unnecessary cross-node
bouncing and it couldn't be helped by autonuma as it requires tasks
to have implicit node affinity and workers are assigned randomly.
After these changes, an unbound workqueue now binds to multiple
NUMA-affine worker pools so that queued work items are executed in
the same node. This is turned on by default but can be disabled
system-wide or for individual workqueues.
Crypto was requesting NUMA affinity as encrypting data across
different nodes can contribute noticeable overhead and doing it
per-cpu was too limiting for certain cases and IO throughput could
be bottlenecked by one CPU being fully occupied while others have
idle cycles.
While the new features required a lot of changes including
restructuring locking, it didn't complicate the execution paths much.
The unbound workqueue handling is now closer to per-cpu ones and the
new features are implemented by simply associating a workqueue with
different sets of backend worker pools without changing queue,
execution or flush paths.
As such, even though the amount of change is very high, I feel
relatively safe in that it isn't likely to cause subtle issues with
basic correctness of work item execution and handling. If something
is wrong, it's likely to show up as being associated with worker pools
with the wrong attributes or OOPS while workqueue attributes are being
changed or during CPU hotplug.
While this creates more backend worker pools, it doesn't add too many
more workers unless, of course, there are many workqueues with unique
combinations of attributes. Assuming everything else is the same,
NUMA awareness costs an extra worker pool per NUMA node with online
CPUs.
There are also a couple things which are being routed outside the
workqueue tree.
- block tree pulled in workqueue for-3.10 so that writeback worker
pool can be converted to unbound workqueue with sysfs control
exposed. This simplifies the code, makes writeback workers
NUMA-aware and allows tuning nice level and CPU affinity via sysfs.
- The conversion to workqueue means that there's no 1:1 association
between a specific worker, which makes writeback folks unhappy as
they want to be able to tell which filesystem caused a problem from
backtrace on systems with many filesystems mounted. This is
resolved by allowing work items to set debug info string which is
printed when the task is dumped. As this change involves unifying
implementations of dump_stack() and friends in arch codes, it's
being routed through Andrew's -mm tree."
* 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (84 commits)
workqueue: use kmem_cache_free() instead of kfree()
workqueue: avoid false negative WARN_ON() in destroy_workqueue()
workqueue: update sysfs interface to reflect NUMA awareness and a kernel param to disable NUMA affinity
workqueue: implement NUMA affinity for unbound workqueues
workqueue: introduce put_pwq_unlocked()
workqueue: introduce numa_pwq_tbl_install()
workqueue: use NUMA-aware allocation for pool_workqueues
workqueue: break init_and_link_pwq() into two functions and introduce alloc_unbound_pwq()
workqueue: map an unbound workqueues to multiple per-node pool_workqueues
workqueue: move hot fields of workqueue_struct to the end
workqueue: make workqueue->name[] fixed len
workqueue: add workqueue->unbound_attrs
workqueue: determine NUMA node of workers accourding to the allowed cpumask
workqueue: drop 'H' from kworker names of unbound worker pools
workqueue: add wq_numa_tbl_len and wq_numa_possible_cpumask[]
workqueue: move pwq_pool_locking outside of get/put_unbound_pool()
workqueue: fix memory leak in apply_workqueue_attrs()
workqueue: fix unbound workqueue attrs hashing / comparison
workqueue: fix race condition in unbound workqueue free path
workqueue: remove pwq_lock which is no longer used
...
existing platforms, as well as adoption of the framework by new
platforms and devices. Some long-needed fixes to the core framework are
here as well as new features such as improved initialization of clocks
from DT as well as framework reentrancy for nested clock operations.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJRfqtLAAoJEDqPOy9afJhJsxwP/RLvfeeMIU3804ahVNK2C59h
ehJ06ZP+b0u0A7+YSC7CX1pHXIFW+UoZgYLJiLdV2kEdpOIKMELZyUcEVB97u1Of
TVlsmHfTLv2zVAq/LYRVSKFYeMUd/6RRoq7Cm6hoj638IVeXG7C+8pei2aVZe++t
1ENmb4UGFJ7NLfpE5zQ3fEuIfHfuWA8Od6SmPaV/YG5Io8HgkDGF3/tCJURJGII6
xLN2Rh8qbFktJLVvKe6yLyvUEZiWh8A6HNPyNiFYYGX11wU76zK2wMN3BW6Nn/kW
3PubzISoKRaoCZvuVK+CoLWnhFl2LteFVVmL1TBc/jxJe6q+rLX33sXl1q9K+SLt
POnHf/7nDyO3zbZWgfRR1r3FdeZqdLYw8HVsLcOKFcv9n1UligzuUNml5PklKwNh
BDMmSo5ytS1QPV1e9ZtVrk6IyvDyrenwfDW1Mw43ST6D23FVrivywB4X9ur6WljI
d1/CBvQXQZ11Hd4OAvqRL8QYFJvc5WlERjSd1j6I6XS6xioKOTKMkUC/KpRcCid9
avA6mJ5k/a1jTojvh2wl37paI//OzY0VDlxRSeMZIu9Dsn29DnPlE5CLg535Ovu+
mn9OtLFEDNnlgWCMQYUehGd7ITgtwrB/fxxNeBbMYjDz4AIirR2BIvMR7I8CMTQz
M0rHu8NpwKH6eqC6kAup
=+LO3
-----END PGP SIGNATURE-----
Merge tag 'clk-for-linus-3.10' of git://git.linaro.org/people/mturquette/linux
Pull clock framework update from Michael Turquette:
"The common clock framework changes for 3.10 include many fixes for
existing platforms, as well as adoption of the framework by new
platforms and devices.
Some long-needed fixes to the core framework are here as well as new
features such as improved initialization of clocks from DT as well as
framework reentrancy for nested clock operations."
* tag 'clk-for-linus-3.10' of git://git.linaro.org/people/mturquette/linux: (44 commits)
clk: add clk_ignore_unused option to keep boot clocks on
clk: ux500: fix mismatched types
clk: vexpress: Add separate SP810 driver
clk: si5351: make clk-si5351 depend on CONFIG_OF
clk: export __clk_get_flags for modular clock providers
clk: vt8500: Missing breaks in vtwm_pll_round_rate/_set_rate.
clk: sunxi: Unify oscillator clock
clk: composite: allow fixed rates & fixed dividers
clk: composite: rename 'div' references to 'rate'
clk: add si5351 i2c common clock driver
clk: add device tree fixed-factor-clock binding support
clk: Properly handle notifier return values
clk: ux500: abx500: Define clock tree for ab850x
clk: ux500: Add support for sysctrl clocks
clk: mvebu: Fix valid value range checking for cpu_freq_select
clk: Fixup locking issues for clk_set_parent
clk: Fixup errorhandling for clk_set_parent
clk: Restructure code for __clk_reparent
clk: sunxi: drop an unnecesary kmalloc
clk: sunxi: drop CLK_IGNORE_UNUSED
...
Along with the usual minor fixes and clean ups there are a few major
changes with this pull request.
1) Multiple buffers for the ftrace facility
This feature has been requested by many people over the last few years.
I even heard that Google was about to implement it themselves. I finally
had time and cleaned up the code such that you can now create multiple
instances of the ftrace buffer and have different events go to different
buffers. This way, a low frequency event will not be lost in the noise
of a high frequency event.
Note, currently only events can go to different buffers, the tracers
(ie. function, function_graph and the latency tracers) still can only
be written to the main buffer.
2) The function tracer triggers have now been extended.
The function tracer had two triggers. One to enable tracing when a
function is hit, and one to disable tracing. Now you can record a
stack trace on a single (or many) function(s), take a snapshot of the
buffer (copy it to the snapshot buffer), and you can enable or disable
an event to be traced when a function is hit.
3) A perf clock has been added.
A "perf" clock can be chosen to be used when tracing. This will cause
ftrace to use the same clock as perf uses, and hopefully this will make
it easier to interleave the perf and ftrace data for analysis.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQEcBAABAgAGBQJRfnTPAAoJEOdOSU1xswtMqYYH/1WIdrwXmxHflErnYkCIr3sU
QtYae2K5A1HcgiqOvRJrdWMOt016iMx5CaQQyBFM1vvMiPY0sTWRmwNxDfZzz9LN
10jRvWEzZSLtzl+a9mkFWLEpr5nR/QODOxkWFCnRWscp46sp04LSTxGDYsOnPQZB
sam/AQ1h4xA+DqDBChm9BDEUEPorGleTlN54LBaCGgSFGvrbF+eAg2s4vHNAQAvQ
8d5xjSE9zC7J+FqbVxvJTbKI3+EqKL6hMsJKsKfi0SI+FuxBaFMSltXck5zKyTI4
HpNJzXCmw+v90Tju7oMkPHh6RTbESPCHoGU+wqE52fM6m7oScVeuI/kfc6USwU4=
=W1n+
-----END PGP SIGNATURE-----
Merge tag 'trace-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"Along with the usual minor fixes and clean ups there are a few major
changes with this pull request.
1) Multiple buffers for the ftrace facility
This feature has been requested by many people over the last few
years. I even heard that Google was about to implement it themselves.
I finally had time and cleaned up the code such that you can now
create multiple instances of the ftrace buffer and have different
events go to different buffers. This way, a low frequency event will
not be lost in the noise of a high frequency event.
Note, currently only events can go to different buffers, the tracers
(ie function, function_graph and the latency tracers) still can only
be written to the main buffer.
2) The function tracer triggers have now been extended.
The function tracer had two triggers. One to enable tracing when a
function is hit, and one to disable tracing. Now you can record a
stack trace on a single (or many) function(s), take a snapshot of the
buffer (copy it to the snapshot buffer), and you can enable or disable
an event to be traced when a function is hit.
3) A perf clock has been added.
A "perf" clock can be chosen to be used when tracing. This will cause
ftrace to use the same clock as perf uses, and hopefully this will
make it easier to interleave the perf and ftrace data for analysis."
* tag 'trace-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (82 commits)
tracepoints: Prevent null probe from being added
tracing: Compare to 1 instead of zero for is_signed_type()
tracing: Remove obsolete macro guard _TRACE_PROFILE_INIT
ftrace: Get rid of ftrace_profile_bits
tracing: Check return value of tracing_init_dentry()
tracing: Get rid of unneeded key calculation in ftrace_hash_move()
tracing: Reset ftrace_graph_filter_enabled if count is zero
tracing: Fix off-by-one on allocating stat->pages
kernel: tracing: Use strlcpy instead of strncpy
tracing: Update debugfs README file
tracing: Fix ftrace_dump()
tracing: Rename trace_event_mutex to trace_event_sem
tracing: Fix comment about prefix in arch_syscall_match_sym_name()
tracing: Convert trace_destroy_fields() to static
tracing: Move find_event_field() into trace_events.c
tracing: Use TRACE_MAX_PRINT instead of constant
tracing: Use pr_warn_once instead of open coded implementation
ring-buffer: Add ring buffer startup selftest
tracing: Bring Documentation/trace/ftrace.txt up to date
tracing: Add "perf" trace_clock
...
Conflicts:
kernel/trace/ftrace.c
kernel/trace/trace.c
This is primarily useful when there's a driver that doesn't claim clocks
properly, but the bootloader leaves them on. It's not expected to be used
in normal cases, but for bringup and debug it's very useful to have the
option to not gate unclaimed clocks that are still on.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Mike Turquette <mturquette@linaro.org>
[mturquette@linaro.org: fixed up trivial merge issue]
Pull kdump fixes from Peter Anvin:
"The kexec/kdump people have found several problems with the support
for loading over 4 GiB that was introduced in this merge cycle. This
is partly due to a number of design problems inherent in the way the
various pieces of kdump fit together (it is pretty horrifically manual
in many places.)
After a *lot* of iterations this is the patchset that was agreed upon,
but of course it is now very late in the cycle. However, because it
changes both the syntax and semantics of the crashkernel option, it
would be desirable to avoid a stable release with the broken
interfaces."
I'm not happy with the timing, since originally the plan was to release
the final 3.9 tomorrow. But apparently I'm doing an -rc8 instead...
* 'x86-kdump-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kexec: use Crash kernel for Crash kernel low
x86, kdump: Change crashkernel_high/low= to crashkernel=,high/low
x86, kdump: Retore crashkernel= to allocate under 896M
x86, kdump: Set crashkernel_low automatically
Document the new kernel commandline parameters in the
appropriate file.
Reviewed-by: Shuah Khan <shuahkhan@gmail.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
We need full dynticks CPU to also be RCU nocb so
that we don't have to keep the tick to handle RCU
callbacks.
Make sure the range passed to nohz_full= boot
parameter is a subset of rcu_nocbs=
The CPUs that fail to meet this requirement will be
excluded from the nohz_full range. This is checked
early in boot time, before any CPU has the opportunity
to stop its tick.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
The timekeeping job must be able to run early on boot
because there may be some pre-SMP (and thus pre-initcalls )
components that rely on it. The IO-APIC is one such users
as it tests the timer health by watching jiffies progression.
Given that it happens before we know the initial online
set, we can't rely on it to select a timekeeper. We need
one before SMP time otherwise we simply crash on boot.
To fix this and keep things simple for now, force the boot CPU
outside of the full dynticks range in any case and do this early
on kernel parameter parsing time.
We might want a trickier solution later, expecially for aSMP
architectures that need to assign housekeeping tasks to arbitrary
low power CPUs.
But it's still first pass KISS time for now.
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Per hpa, use crashkernel=X,high crashkernel=Y,low instead of
crashkernel_hign=X crashkernel_low=Y. As that could be extensible.
-v2: according to Vivek, change delimiter to ;
-v3: let hign and low only handle simple form and it conforms to
description in kernel-parameters.txt
still keep crashkernel=X override any crashkernel=X,high
crashkernel=Y,low
-v4: update get_last_crashkernel returning and add more strict
checking in parse_crashkernel_simple() found by HATAYAMA.
-v5: Change delimiter back to , according to HPA.
also separate parse_suffix from parse_simper according to vivek.
so we can avoid @pos in that path.
-v6: Tight the checking about crashkernel=X,highblahblah,high
found by HTYAYAMA.
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1366089828-19692-5-git-send-email-yinghai@kernel.org
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Vivek found old kexec-tools does not work new kernel anymore.
So change back crashkernel= back to old behavoir, and add crashkernel_high=
to let user decide if buffer could be above 4G, and also new kexec-tools will
be needed.
-v2: let crashkernel=X override crashkernel_high=
update description about _high will be ignored by crashkernel=X
-v3: update description about kernel-parameters.txt according to Vivek.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1366089828-19692-4-git-send-email-yinghai@kernel.org
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Chao said that kdump does does work well on his system on 3.8
without extra parameter, even iommu does not work with kdump.
And now have to append crashkernel_low=Y in first kernel to make
kdump work.
We have now modified crashkernel=X to allocate memory beyong 4G (if
available) and do not allocate low range for crashkernel if the user
does not specify that with crashkernel_low=Y. This causes regression
if iommu is not enabled. Without iommu, swiotlb needs to be setup in
first 4G and there is no low memory available to second kernel.
Set crashkernel_low automatically if the user does not specify that.
For system that does support IOMMU with kdump properly, user could
specify crashkernel_low=0 to save that 72M low ram.
-v3: add swiotlb_size() according to Konrad.
-v4: add comments what 8M is for according to hpa.
also update more crashkernel_low= in kernel-parameters.txt
-v5: update changelog according to Vivek.
-v6: Change description about swiotlb referring according to HATAYAMA.
Reported-by: WANG Chao <chaowang@redhat.com>
Tested-by: WANG Chao <chaowang@redhat.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1366089828-19692-2-git-send-email-yinghai@kernel.org
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Using this parameter one can disable the storage_size/2 check if
he is really sure that the UEFI does sane gc and fulfills the spec.
This parameter is useful if a devices uses more than 50% of the
storage by default.
The Intel DQSW67 desktop board is such a sucker for exmaple.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Add remoteproc platform device for controlling the DSP
on da8xx. The patch uses CMA-based reservation of physical
memory block for DSP use. A new kernel command-line parameter
has been added to allow boot-time specification of the physical
memory block.
Signed-off-by: Robert Tivy <rtivy@ti.com>
[nsekhar@ti.com: edit commit message for readability and
style improvements]
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
"Extended nohz" was used as a naming base for the full dynticks
API and Kconfig symbols. It reflects the fact the system tries
to stop the tick in more places than just idle.
But that "extended" name is a bit opaque and vague. Rename it to
"full" makes it clearer what the system tries to do under this
config: try to shutdown the tick anytime it can. The various
constraints that prevent that to happen shouldn't be considered
as fundamental properties of this feature but rather technical
issues that may be solved in the future.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Documentation/kernel-parameters.txt and
Documentation/x86/x86_64/boot-options.txt contain virtually
identical text describing earlyprintk.
This consolidates the two copies and updates the documentation a
bit. No one ever documented the:
earlyprintk=serial,0x1008,115200
syntax, nor mentioned that ARM is now a supported earlyprintk
arch.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Rob Landley <rob@landley.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20130410210338.E2930E98@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Unbound workqueues are now NUMA aware. Let's add some control knobs
and update sysfs interface accordingly.
* Add kernel param workqueue.numa_disable which disables NUMA affinity
globally.
* Replace sysfs file "pool_id" with "pool_ids" which contain
node:pool_id pairs. This change is userland-visible but "pool_id"
hasn't seen a release yet, so this is okay.
* Add a new sysf files "numa" which can toggle NUMA affinity on
individual workqueues. This is implemented as attrs->no_numa whichn
is special in that it isn't part of a pool's attributes. It only
affects how apply_workqueue_attrs() picks which pools to use.
After "pool_ids" change, first_pwq() doesn't have any user left.
Removed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
The PPC specific kernel parameter 'noresidual' was removed in v2.6.27,
though commit 917f0af9e5 ("powerpc: Remove
arch/ppc and include/asm-ppc").
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Because RCU callbacks are now associated with the number of the grace
period that they must wait for, CPUs can now take advance callbacks
corresponding to grace periods that ended while a given CPU was in
dyntick-idle mode. This eliminates the need to try forcing the RCU
state machine while entering idle, thus reducing the CPU intensiveness
of RCU_FAST_NO_HZ, which should increase its energy efficiency.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Currently, the per-no-CBs-CPU kthreads are named "rcuo" followed by
the CPU number, for example, "rcuo". This is problematic given that
there are either two or three RCU flavors, each of which gets a per-CPU
kthread with exactly the same name. This commit therefore introduces
a one-letter abbreviation for each RCU flavor, namely 'b' for RCU-bh,
'p' for RCU-preempt, and 's' for RCU-sched. This abbreviation is used
to distinguish the "rcuo" kthreads, for example, for CPU 0 we would have
"rcuob/0", "rcuop/0", and "rcuos/0".
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
For extreme usecases such as Real Time or HPC, having
the ability to shutdown the tick when a single task runs
on a CPU is a desired feature:
* Reducing the amount of interrupts improves throughput
for CPU-bound tasks. The CPU is less distracted from its
real job, from an execution time and from the cache point
of views.
* This also improve latency response as we have less critical
sections.
Start with introducing a very simple interface to define
full dynticks CPU: use a boot time option defined cpumask
through the "nohz_extended=" kernel parameter. CPUs that
are part of this range will have their tick shutdown
whenever possible: provided they run a single task and
they don't do kernel activity that require the periodic
tick. These details will be later documented in
Documentation/*
An online CPU must be kept outside this range to handle the
timekeeping.
Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
If debugging the kernel, and the developer wants to use
tracing_snapshot() in places where tracing_snapshot_alloc() may
be difficult (or more likely, the developer is lazy and doesn't
want to bother with tracing_snapshot_alloc() at all), then adding
alloc_snapshot
to the kernel command line parameter will tell ftrace to allocate
the snapshot buffer (if configured) when it allocates the main
tracing buffer.
I also noticed that ring_buffer_expanded and tracing_selftest_disabled
had inconsistent use of boolean "true" and "false" with "0" and "1".
I cleaned that up too.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
This adds core architecture support for Imagination's Meta processor
cores, followed by some later miscellaneous arch/metag cleanups and
fixes which I kept separate to ease review:
- Support for basic Meta 1 (ATP) and Meta 2 (HTP) core architecture
- A few fixes all over, particularly for symbol prefixes
- A few privilege protection fixes
- Several cleanups (setup.c includes, split out a lot of metag_ksyms.c)
- Fix some missing exports
- Convert hugetlb to use vm_unmapped_area()
- Copy device tree to non-init memory
- Provide dma_get_sgtable()
Signed-off-by: James Hogan <james.hogan@imgtec.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQIcBAABAgAGBQJRMmVXAAoJEKHZs+irPybfivgP/inEXqJyfw59omQdjwvYcU/a
/u0MJ3UKSNS3U+HknfaFCy/Nwk1dqPLjqqyVC1V6AbUPBXlaEwGcimlNRx2uRjdq
Uh96upMLHsNuF/xiiR477g3RwY0egIJdM1R1bGi3mZ3vVrNQGF+wbni6f61xCWGz
M/4rDglpQvE79oLhYdgj6tidZtHQT0YWtERA9W90zkQWXGYmpFPKBKbfZAi5+rKQ
U6Gpg26orUugzXNaxltJEYKE8gjLTppEabx8DARnItZ4zCMy4dw5RBJ35RFvQw6e
eSmfgTy9w9WqBMY2+QMSgU0KQt1IITCzX7OlOXC0jALQJXoU0WWbOELlBVQLCwF1
T0OcR/5ZP/hIlOk5Dh+e9U3AtbASXdMtqA0ZUe78woH1CBf7Nc/0c0vRg23EdMh8
lnHDJxT/UqskoOYLI4kgWbEdLDy4uTh19U2pVi7VCo7ksLB9Bj9Xc8VSKgscSXTl
OwzN+c4Jgtu8FDFTp+Af4AT8pYGJ08j8L2ErsV2sOv3Q44U5WXdrMz3GSgwXj8+4
wZk3HvdkQVkMD5sJCUZgAswaN6BnbB0pHdCz4wMQ8jR/Ogs015Ipk64Ecym9S/4n
uES7PnDtt/4lb5EyX2ScbvdnZTAFTaaP7OOhC77BOQvbQjIW1tkAcxWJqRry86uS
iM0BFgK6Ohx3geqa5Ft0
=65cR
-----END PGP SIGNATURE-----
Merge tag 'metag-v3.9-rc1-v4' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag
Pull new ImgTec Meta architecture from James Hogan:
"This adds core architecture support for Imagination's Meta processor
cores, followed by some later miscellaneous arch/metag cleanups and
fixes which I kept separate to ease review:
- Support for basic Meta 1 (ATP) and Meta 2 (HTP) core architecture
- A few fixes all over, particularly for symbol prefixes
- A few privilege protection fixes
- Several cleanups (setup.c includes, split out a lot of
metag_ksyms.c)
- Fix some missing exports
- Convert hugetlb to use vm_unmapped_area()
- Copy device tree to non-init memory
- Provide dma_get_sgtable()"
* tag 'metag-v3.9-rc1-v4' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag: (61 commits)
metag: Provide dma_get_sgtable()
metag: prom.h: remove declaration of metag_dt_memblock_reserve()
metag: copy devicetree to non-init memory
metag: cleanup metag_ksyms.c includes
metag: move mm/init.c exports out of metag_ksyms.c
metag: move usercopy.c exports out of metag_ksyms.c
metag: move setup.c exports out of metag_ksyms.c
metag: move kick.c exports out of metag_ksyms.c
metag: move traps.c exports out of metag_ksyms.c
metag: move irq enable out of irqflags.h on SMP
genksyms: fix metag symbol prefix on crc symbols
metag: hugetlb: convert to vm_unmapped_area()
metag: export clear_page and copy_page
metag: export metag_code_cache_flush_all
metag: protect more non-MMU memory regions
metag: make TXPRIVEXT bits explicit
metag: kernel/setup.c: sort includes
perf: Enable building perf tools for Meta
metag: add boot time LNKGET/LNKSET check
metag: add __init to metag_cache_probe()
...
Add basic metag documentation. This includes an outline description of
the ABIs (including syscall ABI) and calling conventions, similar to the
one in Documentation/frv/.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Rob Landley <rob@landley.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: linux-doc@vger.kernel.org
Tim found:
WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x6f/0x80()
Hardware name: S2600CP
sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
smpboot: Booting Node 1, Processors #1
Modules linked in:
Pid: 0, comm: swapper/1 Not tainted 3.9.0-0-generic #1
Call Trace:
set_cpu_sibling_map+0x279/0x449
start_secondary+0x11d/0x1e5
Don Morris reproduced on a HP z620 workstation, and bisected it to
commit e8d1955258 ("acpi, memory-hotplug: parse SRAT before memblock
is ready")
It turns out movable_map has some problems, and it breaks several things
1. numa_init is called several times, NOT just for srat. so those
nodes_clear(numa_nodes_parsed)
memset(&numa_meminfo, 0, sizeof(numa_meminfo))
can not be just removed. Need to consider sequence is: numaq, srat, amd, dummy.
and make fall back path working.
2. simply split acpi_numa_init to early_parse_srat.
a. that early_parse_srat is NOT called for ia64, so you break ia64.
b. for (i = 0; i < MAX_LOCAL_APIC; i++)
set_apicid_to_node(i, NUMA_NO_NODE)
still left in numa_init. So it will just clear result from early_parse_srat.
it should be moved before that....
c. it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved
early before override from INITRD is settled.
3. that patch TITLE is total misleading, there is NO x86 in the title,
but it changes critical x86 code. It caused x86 guys did not
pay attention to find the problem early. Those patches really should
be routed via tip/x86/mm.
4. after that commit, following range can not use movable ram:
a. real_mode code.... well..funny, legacy Node0 [0,1M) could be hot-removed?
b. initrd... it will be freed after booting, so it could be on movable...
c. crashkernel for kdump...: looks like we can not put kdump kernel above 4G
anymore.
d. init_mem_mapping: can not put page table high anymore.
e. initmem_init: vmemmap can not be high local node anymore. That is
not good.
If node is hotplugable, the mem related range like page table and
vmemmap could be on the that node without problem and should be on that
node.
We have workaround patch that could fix some problems, but some can not
be fixed.
So just remove that offending commit and related ones including:
f7210e6c4a ("mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to
protect movablecore_map in memblock_overlaps_region().")
01a178a94e ("acpi, memory-hotplug: support getting hotplug info from
SRAT")
27168d38fa ("acpi, memory-hotplug: extend movablemem_map ranges to
the end of node")
e8d1955258 ("acpi, memory-hotplug: parse SRAT before memblock is
ready")
fb06bc8e5f ("page_alloc: bootmem limit with movablecore_map")
42f47e27e7 ("page_alloc: make movablemem_map have higher priority")
6981ec3114 ("page_alloc: introduce zone_movable_limit[] to keep
movable limit for nodes")
34b71f1e04 ("page_alloc: add movable_memmap kernel parameter")
4d59a75125 ("x86: get pg_data_t's memory from other node")
Later we should have patches that will make sure kernel put page table
and vmemmap on local node ram instead of push them down to node0. Also
need to find way to put other kernel used ram to local node ram.
Reported-by: Tim Gardner <tim.gardner@canonical.com>
Reported-by: Don Morris <don.morris@hp.com>
Bisected-by: Don Morris <don.morris@hp.com>
Tested-by: Don Morris <don.morris@hp.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull more x86 fixes from Peter Anvin:
"Additional x86 fixes. Three of these patches are pure documentation,
two are pretty trivial; the remaining one fixes boot problems on some
non-BIOS machines."
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Make sure we can boot in the case the BDA contains pure garbage
x86, efi: Mark disable_runtime as __initdata
x86, doc: Fix incorrect comment about 64-bit code segment descriptors
doc, kernel-parameters: Document 'console=hvc<n>'
doc, xen: Mention 'earlyprintk=xen' in the documentation.
ACPI: Overriding ACPI tables via initrd only works with an initrd and on X86
Host bridge hotplug
- Major overhaul of ACPI host bridge add/start (Rafael Wysocki, Yinghai Lu)
- Major overhaul of PCI/ACPI binding (Rafael Wysocki, Yinghai Lu)
- Split out ACPI host bridge and ACPI PCI device hotplug (Yinghai Lu)
- Stop caching _PRT and make independent of bus numbers (Yinghai Lu)
PCI device hotplug
- Clean up cpqphp dead code (Sasha Levin)
- Disable ARI unless device and upstream bridge support it (Yijing Wang)
- Initialize all hot-added devices (not functions 0-7) (Yijing Wang)
Power management
- Don't touch ASPM if disabled (Joe Lawrence)
- Fix ASPM link state management (Myron Stowe)
Miscellaneous
- Fix PCI_EXP_FLAGS accessor (Alex Williamson)
- Disable Bus Master in pci_device_shutdown (Konstantin Khlebnikov)
- Document hotplug resource and MPS parameters (Yijing Wang)
- Add accessor for PCIe capabilities (Myron Stowe)
- Drop pciehp suspend/resume messages (Paul Bolle)
- Make pci_slot built-in only (not a module) (Jiang Liu)
- Remove unused PCI/ACPI bind ops (Jiang Liu)
- Removed used pci_root_bus (Bjorn Helgaas)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJRKS3hAAoJEFmIoMA60/r8xxoP/j1CS4oCZAnBIVT9fKBkis+/
CENcfHIUKj6J9iMfJEVvqBELvqaLqtpeNwAGMcGPxV7VuT3K1QumChfaTpRDP0HC
VDRmrjcmfenEK+YPOG7acsrTyqk2wjpLOyu9MKRxtC5u7tF6376KQpkEFpO4haL4
eUHTxfE76OkrPBSvx3+PUSf6jqrvrNbjX8K6HdDVVlm3sVAQKmYJU/Wphv2NPOqa
CAMyCzEGybFjr8hDRwvWgr+06c718GMwQUbnrPdHXAe7lMNMrN/XVBmU9ABN3Aas
icd3lrDs+yPObgcO/gT8+sAZErCtdJ9zuHGYHdYpRbIQj/5JT4TMk7tw/Bj7vKY9
Mqmho9GR5YmYTRN9f1r+2n5AQ/KYWXJDrRNOnt5/ys5BOM3vwJ7WJ902zpSwtFQp
nLX+oD/hLfzpnoIQGDuBAoAXp2Kam3XWRgVvG78buRNrPj+kUzimk14a8qQeY+CB
El6UKuwi5Uv/qgs1gAqqjmZmsAkon2DnsRZa6Fl8NTkDlis7LY4gp9OU38ySFpB+
PhCmRyCZmDDqTVtwj6XzR3nPQ5LBSbvsTfgMxYMIUSXHa06tyb2q5p4mEIas0OmU
RKaP5xQqZuTgD8fbdYrx0xgSrn7JHt/j/X//Qs6unlLCWhlpm3LjJZKxyw2FwBGr
o4Lci+PiBh3MowCrju9D
=ER3b
-----END PGP SIGNATURE-----
Merge tag 'pci-v3.9-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI changes from Bjorn Helgaas:
"Host bridge hotplug
- Major overhaul of ACPI host bridge add/start (Rafael Wysocki, Yinghai Lu)
- Major overhaul of PCI/ACPI binding (Rafael Wysocki, Yinghai Lu)
- Split out ACPI host bridge and ACPI PCI device hotplug (Yinghai Lu)
- Stop caching _PRT and make independent of bus numbers (Yinghai Lu)
PCI device hotplug
- Clean up cpqphp dead code (Sasha Levin)
- Disable ARI unless device and upstream bridge support it (Yijing Wang)
- Initialize all hot-added devices (not functions 0-7) (Yijing Wang)
Power management
- Don't touch ASPM if disabled (Joe Lawrence)
- Fix ASPM link state management (Myron Stowe)
Miscellaneous
- Fix PCI_EXP_FLAGS accessor (Alex Williamson)
- Disable Bus Master in pci_device_shutdown (Konstantin Khlebnikov)
- Document hotplug resource and MPS parameters (Yijing Wang)
- Add accessor for PCIe capabilities (Myron Stowe)
- Drop pciehp suspend/resume messages (Paul Bolle)
- Make pci_slot built-in only (not a module) (Jiang Liu)
- Remove unused PCI/ACPI bind ops (Jiang Liu)
- Removed used pci_root_bus (Bjorn Helgaas)"
* tag 'pci-v3.9-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (51 commits)
PCI/ACPI: Don't cache _PRT, and don't associate them with bus numbers
PCI: Fix PCI Express Capability accessors for PCI_EXP_FLAGS
ACPI / PCI: Make pci_slot built-in only, not a module
PCI/PM: Clear state_saved during suspend
PCI: Use atomic_inc_return() rather than atomic_add_return()
PCI: Catch attempts to disable already-disabled devices
PCI: Disable Bus Master unconditionally in pci_device_shutdown()
PCI: acpiphp: Remove dead code for PCI host bridge hotplug
PCI: acpiphp: Create companion ACPI devices before creating PCI devices
PCI: Remove unused "rc" in virtfn_add_bus()
PCI: pciehp: Drop suspend/resume ENTRY messages
PCI/ASPM: Don't touch ASPM if forcibly disabled
PCI/ASPM: Deallocate upstream link state even if device is not PCIe
PCI: Document MPS parameters pci=pcie_bus_safe, pci=pcie_bus_perf, etc
PCI: Document hpiosize= and hpmemsize= resource reservation parameters
PCI: Use PCI Express Capability accessor
PCI: Introduce accessor to retrieve PCIe Capabilities Register
PCI: Put pci_dev in device tree as early as possible
PCI: Skip attaching driver in device_add()
PCI: acpiphp: Keep driver loaded even if no slots found
...
Both the PowerPC hypervisor and Xen hypervisor can utilize the
hvc driver.
Cc: Greg KH <gregkh@linuxfoundation.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/1361825650-14031-3-git-send-email-konrad.wilk@oracle.com
Cc: <stable@vger.kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The earlyprintk for Xen PV guests utilizes a simple hypercall
(console_io) to provide output to Xen emergency console.
Note that the Xen hypervisor should be booted with 'loglevel=all'
to output said information.
Reported-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/1361825650-14031-2-git-send-email-konrad.wilk@oracle.com
Cc: <stable@vger.kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
We now provide an option for users who don't want to specify physical
memory address in kernel commandline.
/*
* For movablemem_map=acpi:
*
* SRAT: |_____| |_____| |_________| |_________| ......
* node id: 0 1 1 2
* hotpluggable: n y y n
* movablemem_map: |_____| |_________|
*
* Using movablemem_map, we can prevent memblock from allocating memory
* on ZONE_MOVABLE at boot time.
*/
So user just specify movablemem_map=acpi, and the kernel will use
hotpluggable info in SRAT to determine which memory ranges should be set
as ZONE_MOVABLE.
If all the memory ranges in SRAT is hotpluggable, then no memory can be
used by kernel. But before parsing SRAT, memblock has already reserve
some memory ranges for other purposes, such as for kernel image, and so
on. We cannot prevent kernel from using these memory. So we need to
exclude these ranges even if these memory is hotpluggable.
Furthermore, there could be several memory ranges in the single node
which the kernel resides in. We may skip one range that have memory
reserved by memblock, but if the rest of memory is too small, then the
kernel will fail to boot. So, make the whole node which the kernel
resides in un-hotpluggable. Then the kernel has enough memory to use.
NOTE: Using this way will cause NUMA performance down because the
whole node will be set as ZONE_MOVABLE, and kernel cannot use memory
on it. If users don't want to lose NUMA performance, just don't use
it.
[akpm@linux-foundation.org: fix warning]
[akpm@linux-foundation.org: use strcmp()]
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Wu Jianguo <wujianguo@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Len Brown <lenb@kernel.org>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add functions to parse movablemem_map boot option. Since the option
could be specified more then once, all the maps will be stored in the
global variable movablemem_map.map array.
And also, we keep the array in monotonic increasing order by start_pfn.
And merge all overlapped ranges.
[akpm@linux-foundation.org: improve comment]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: remove unneeded parens]
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
Tested-by: Lin Feng <linfeng@cn.fujitsu.com>
Cc: Wu Jianguo <wujianguo@huawei.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull x86 mm changes from Peter Anvin:
"This is a huge set of several partly interrelated (and concurrently
developed) changes, which is why the branch history is messier than
one would like.
The *really* big items are two humonguous patchsets mostly developed
by Yinghai Lu at my request, which completely revamps the way we
create initial page tables. In particular, rather than estimating how
much memory we will need for page tables and then build them into that
memory -- a calculation that has shown to be incredibly fragile -- we
now build them (on 64 bits) with the aid of a "pseudo-linear mode" --
a #PF handler which creates temporary page tables on demand.
This has several advantages:
1. It makes it much easier to support things that need access to data
very early (a followon patchset uses this to load microcode way
early in the kernel startup).
2. It allows the kernel and all the kernel data objects to be invoked
from above the 4 GB limit. This allows kdump to work on very large
systems.
3. It greatly reduces the difference between Xen and native (Xen's
equivalent of the #PF handler are the temporary page tables created
by the domain builder), eliminating a bunch of fragile hooks.
The patch series also gets us a bit closer to W^X.
Additional work in this pull is the 64-bit get_user() work which you
were also involved with, and a bunch of cleanups/speedups to
__phys_addr()/__pa()."
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (105 commits)
x86, mm: Move reserving low memory later in initialization
x86, doc: Clarify the use of asm("%edx") in uaccess.h
x86, mm: Redesign get_user with a __builtin_choose_expr hack
x86: Be consistent with data size in getuser.S
x86, mm: Use a bitfield to mask nuisance get_user() warnings
x86/kvm: Fix compile warning in kvm_register_steal_time()
x86-32: Add support for 64bit get_user()
x86-32, mm: Remove reference to alloc_remap()
x86-32, mm: Remove reference to resume_map_numa_kva()
x86-32, mm: Rip out x86_32 NUMA remapping code
x86/numa: Use __pa_nodebug() instead
x86: Don't panic if can not alloc buffer for swiotlb
mm: Add alloc_bootmem_low_pages_nopanic()
x86, 64bit, mm: hibernate use generic mapping_init
x86, 64bit, mm: Mark data/bss/brk to nx
x86: Merge early kernel reserve for 32bit and 64bit
x86: Add Crash kernel low reservation
x86, kdump: Remove crashkernel range find limit for 64bit
memblock: Add memblock_mem_size()
x86, boot: Not need to check setup_header version for setup_data
...
When intel_pstate is configured into the kernel it will become the
preferred scaling driver for processors that it supports. Allow the
user to override this by adding:
intel_pstate=disable
on the kernel command line.
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Remove 32-bit x86 a cmdline param "no-hlt",
and the cpuinfo_x86.hlt_works_ok that it sets.
If a user wants to avoid HLT, then "idle=poll"
is much more useful, as it avoids invocation of HLT
in idle, while "no-hlt" failed to do so.
Indeed, hlt_works_ok was consulted in only 3 places.
First, in /proc/cpuinfo where "hlt_bug yes"
would be printed if and only if the user booted
the system with "no-hlt" -- as there was no other code
to set that flag.
Second, check_hlt() would not invoke halt() if "no-hlt"
were on the cmdline.
Third, it was consulted in stop_this_cpu(), which is invoked
by native_machine_halt()/reboot_interrupt()/smp_stop_nmi_callback() --
all cases where the machine is being shutdown/reset.
The flag was not consulted in the more frequently invoked
play_dead()/hlt_play_dead() used in processor offline and suspend.
Since Linux-3.0 there has been a run-time notice upon "no-hlt" invocations
indicating that it would be removed in 2012.
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
mwait_idle() is a C1-only idle loop intended to be more efficient
than HLT, starting on Pentium-4 HT-enabled processors.
But mwait_idle() has been replaced by the more general
mwait_idle_with_hints(), which handles both C1 and deeper C-states.
ACPI processor_idle and intel_idle use only mwait_idle_with_hints(),
and no longer use mwait_idle().
Here we simplify the x86 native idle code by removing mwait_idle(),
and the "idle=mwait" bootparam used to invoke it.
Since Linux 3.0 there has been a boot-time warning when "idle=mwait"
was invoked saying it would be removed in 2012. This removal
was also noted in the (now removed:-) feature-removal-schedule.txt.
After this change, kernels configured with
(CONFIG_ACPI=n && CONFIG_INTEL_IDLE=n) when run on hardware
that supports MWAIT will simply use HLT. If MWAIT is desired
on those systems, cpuidle and the cpuidle drivers above
can be enabled.
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
Document PCIe bus MPS parameters pcie_bus_tune_off, pcie_bus_safe,
pcie_bus_peer2peer, pcie_bus_perf.
These parameters were introduced by Jon Mason <jdmason@kudzu.us> at commit
5f39e6705 and commit b03e7495a8.
[bhelgaas: mention hot-add for pcie_bus_peer2peer]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
During kdump kernel's booting stage, it need to find low ram for
swiotlb buffer when system does not support intel iommu/dmar remapping.
kexed-tools is appending memmap=exactmap and range from /proc/iomem
with "Crash kernel", and that range is above 4G for 64bit after boot
protocol 2.12.
We need to add another range in /proc/iomem like "Crash kernel low",
so kexec-tools could find that info and append to kdump kernel
command line.
Try to reserve some under 4G if the normal "Crash kernel" is above 4G.
User could specify the size with crashkernel_low=XX[KMG].
-v2: fix warning that is found by Fengguang's test robot.
-v3: move out get_mem_size change to another patch, to solve compiling
warning that is found by Borislav Petkov <bp@alien8.de>
-v4: user must specify crashkernel_low if system does not support
intel or amd iommu.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-31-git-send-email-yinghai@kernel.org
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Rob Landley <rob@landley.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The as-documented rcu_nocb_poll will fail to enable this feature
for two reasons. (1) there is an extra "s" in the documented
name which is not in the code, and (2) since it uses module_param,
it really is expecting a prefix, akin to "rcutree.fanout_leaf"
and the prefix isn't documented.
However, there are several reasons why we might not want to
simply fix the typo and add the prefix:
1) we'd end up with rcutree.rcu_nocb_poll, and rather probably make
a change to rcutree.nocb_poll
2) if we did #1, then the prefix wouldn't be consistent with the
rcu_nocbs=<cpumap> parameter (i.e. one with, one without prefix)
3) the use of module_param in a header file is less than desired,
since it isn't immediately obvious that it will get processed
via rcutree.c and get the prefix from that (although use of
module_param_named() could clarify that.)
4) the implied export of /sys/module/rcutree/parameters/rcu_nocb_poll
data to userspace via module_param() doesn't really buy us anything,
as it is read-only and we can tell if it is enabled already without
it, since there is a printk at early boot telling us so.
In light of all that, just change it from a module_param() to an
early_setup() call, and worry about adding it to /sys later on if
we decide to allow a dynamic setting of it.
Also change the variable to be tagged as read_mostly, since it
will only ever be fiddled with at most, once at boot.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Current mem= implementation seems buggy because the specification and
implementation don't match. The current mem= has been working for many
years and it's not buggy - it works as expected. So we should update the
specification.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Rob Landley <rob@landley.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iQIcBAABAgAGBQJQx0kQAAoJEHzG/DNEskfi4fQP/R5PRovayroZALBMLnVJDaLD
Ttr9p40VNXbiJ+MfRgatJjSSJZ4Jl+fC3NEqBhcwVZhckZZb9R2s0WtrSQo5+ZbB
vdRfiuKoCaKM4cSZ08C12uTvsF6xjhjd27CTUlMkyOcDoKxMEFKelv0hocSxe4Wo
xqlv3eF+VsY7kE1BNbgBP06SX4tDpIHRxXfqJPMHaSKQmre+cU0xG2GcEu3QGbHT
DEDTI788YSaWLmBfMC+kWoaQl1+bV/FYvavIAS8/o4K9IKvgR42VzrXmaFaqrbgb
72ksa6xfAi57yTmZHqyGmts06qYeBbPpKI+yIhCMInxA9CY3lPbvHppRf0RQOyzj
YOi4hovGEMJKE+BCILukhJcZ9jCTtS3zut6v1rdvR88f4y7uhR9RfmRfsxuW7PNj
3Rmh191+n0lVWDmhOs2psXuCLJr3LEiA0dFffN1z8REUTtTAZMsj8Rz+SvBNAZDR
hsJhERVeXB6X5uQ5rkLDzbn1Zic60LjVw7LIp6SF2OYf/YKaF8vhyWOA8dyCEu8W
CGo7AoG0BO8tIIr8+LvFe8CweypysZImx4AjCfIs4u9pu/v11zmBvO9NO5yfuObF
BreEERYgTes/UITxn1qdIW4/q+Nr0iKO3CTqsmu6L1GfCz3/XzPGs3U26fUhllqi
Ka0JKgnWvsa6ez6FSzKI
=ivQa
-----END PGP SIGNATURE-----
Merge tag 'balancenuma-v11' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma
Pull Automatic NUMA Balancing bare-bones from Mel Gorman:
"There are three implementations for NUMA balancing, this tree
(balancenuma), numacore which has been developed in tip/master and
autonuma which is in aa.git.
In almost all respects balancenuma is the dumbest of the three because
its main impact is on the VM side with no attempt to be smart about
scheduling. In the interest of getting the ball rolling, it would be
desirable to see this much merged for 3.8 with the view to building
scheduler smarts on top and adapting the VM where required for 3.9.
The most recent set of comparisons available from different people are
mel: https://lkml.org/lkml/2012/12/9/108
mingo: https://lkml.org/lkml/2012/12/7/331
tglx: https://lkml.org/lkml/2012/12/10/437
srikar: https://lkml.org/lkml/2012/12/10/397
The results are a mixed bag. In my own tests, balancenuma does
reasonably well. It's dumb as rocks and does not regress against
mainline. On the other hand, Ingo's tests shows that balancenuma is
incapable of converging for this workloads driven by perf which is bad
but is potentially explained by the lack of scheduler smarts. Thomas'
results show balancenuma improves on mainline but falls far short of
numacore or autonuma. Srikar's results indicate we all suffer on a
large machine with imbalanced node sizes.
My own testing showed that recent numacore results have improved
dramatically, particularly in the last week but not universally.
We've butted heads heavily on system CPU usage and high levels of
migration even when it shows that overall performance is better.
There are also cases where it regresses. Of interest is that for
specjbb in some configurations it will regress for lower numbers of
warehouses and show gains for higher numbers which is not reported by
the tool by default and sometimes missed in treports. Recently I
reported for numacore that the JVM was crashing with
NullPointerExceptions but currently it's unclear what the source of
this problem is. Initially I thought it was in how numacore batch
handles PTEs but I'm no longer think this is the case. It's possible
numacore is just able to trigger it due to higher rates of migration.
These reports were quite late in the cycle so I/we would like to start
with this tree as it contains much of the code we can agree on and has
not changed significantly over the last 2-3 weeks."
* tag 'balancenuma-v11' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma: (50 commits)
mm/rmap, migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable
mm/rmap: Convert the struct anon_vma::mutex to an rwsem
mm: migrate: Account a transhuge page properly when rate limiting
mm: numa: Account for failed allocations and isolations as migration failures
mm: numa: Add THP migration for the NUMA working set scanning fault case build fix
mm: numa: Add THP migration for the NUMA working set scanning fault case.
mm: sched: numa: Delay PTE scanning until a task is scheduled on a new node
mm: sched: numa: Control enabling and disabling of NUMA balancing if !SCHED_DEBUG
mm: sched: numa: Control enabling and disabling of NUMA balancing
mm: sched: Adapt the scanning rate if a NUMA hinting fault does not migrate
mm: numa: Use a two-stage filter to restrict pages being migrated for unlikely task<->node relationships
mm: numa: migrate: Set last_nid on newly allocated page
mm: numa: split_huge_page: Transfer last_nid on tail page
mm: numa: Introduce last_nid to the page frame
sched: numa: Slowly increase the scanning period as NUMA faults are handled
mm: numa: Rate limit setting of pte_numa if node is saturated
mm: numa: Rate limit the amount of memory that is migrated between nodes
mm: numa: Structures for Migrate On Fault per NUMA migration rate limiting
mm: numa: Migrate pages handled during a pmd_numa hinting fault
mm: numa: Migrate on reference policy
...
Pull networking changes from David Miller:
1) Allow to dump, monitor, and change the bridge multicast database
using netlink. From Cong Wang.
2) RFC 5961 TCP blind data injection attack mitigation, from Eric
Dumazet.
3) Networking user namespace support from Eric W. Biederman.
4) tuntap/virtio-net multiqueue support by Jason Wang.
5) Support for checksum offload of encapsulated packets (basically,
tunneled traffic can still be checksummed by HW). From Joseph
Gasparakis.
6) Allow BPF filter access to VLAN tags, from Eric Dumazet and
Daniel Borkmann.
7) Bridge port parameters over netlink and BPDU blocking support
from Stephen Hemminger.
8) Improve data access patterns during inet socket demux by rearranging
socket layout, from Eric Dumazet.
9) TIPC protocol updates and cleanups from Ying Xue, Paul Gortmaker, and
Jon Maloy.
10) Update TCP socket hash sizing to be more in line with current day
realities. The existing heurstics were choosen a decade ago.
From Eric Dumazet.
11) Fix races, queue bloat, and excessive wakeups in ATM and
associated drivers, from Krzysztof Mazur and David Woodhouse.
12) Support DOVE (Distributed Overlay Virtual Ethernet) extensions
in VXLAN driver, from David Stevens.
13) Add "oops_only" mode to netconsole, from Amerigo Wang.
14) Support set and query of VEB/VEPA bridge mode via PF_BRIDGE, also
allow DCB netlink to work on namespaces other than the initial
namespace. From John Fastabend.
15) Support PTP in the Tigon3 driver, from Matt Carlson.
16) tun/vhost zero copy fixes and improvements, plus turn it on
by default, from Michael S. Tsirkin.
17) Support per-association statistics in SCTP, from Michele
Baldessari.
And many, many, driver updates, cleanups, and improvements. Too
numerous to mention individually.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
net/mlx4_en: Add support for destination MAC in steering rules
net/mlx4_en: Use generic etherdevice.h functions.
net: ethtool: Add destination MAC address to flow steering API
bridge: add support of adding and deleting mdb entries
bridge: notify mdb changes via netlink
ndisc: Unexport ndisc_{build,send}_skb().
uapi: add missing netconf.h to export list
pkt_sched: avoid requeues if possible
solos-pci: fix double-free of TX skb in DMA mode
bnx2: Fix accidental reversions.
bna: Driver Version Updated to 3.1.2.1
bna: Firmware update
bna: Add RX State
bna: Rx Page Based Allocation
bna: TX Intr Coalescing Fix
bna: Tx and Rx Optimizations
bna: Code Cleanup and Enhancements
ath9k: check pdata variable before dereferencing it
ath5k: RX timestamp is reported at end of frame
ath9k_htc: RX timestamp is reported at end of frame
...
Pull x86 timer update from Ingo Molnar:
"This tree includes HPET fixes and also implements a calibration-free,
TSC match driven APIC timer interrupt mode: 'TSC deadline mode'
supported in SandyBridge and later CPUs."
* 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: hpet: Fix inverted return value check in arch_setup_hpet_msi()
x86: hpet: Fix masking of MSI interrupts
x86: apic: Use tsc deadline for oneshot when available
Pull x86 BSP hotplug changes from Ingo Molnar:
"This tree enables CPU#0 (the boot processor) to be onlined/offlined on
x86, just like any other CPU. Enabled on Intel CPUs for now.
Allowing this required the identification and fixing of latent CPU#0
assumptions (such as CPU#0 initializations, etc.) in the x86
architecture code, plus the identification of barriers to
BSP-offlining, such as active PIC interrupts which can only be
serviced on the BSP.
It's behind a default-off option, and there's a debug option that
allows the automatic testing of this feature.
The motivation of this feature is to allow and prepare for true
CPU-hotplug hardware support: recent changes to MCE support enable us
to detect a deteriorating but not yet hard-failing L1/L2 cache on a
CPU that could be soft-unplugged - or a failing L3 cache on a
multi-socket system.
Note that true hardware hot-plug is not yet fully enabled by this,
because that requires a special platform wakeup sequence to be sent to
the freshly powered up CPU#0. Future patches for this are planned,
once such a platform exists. Chicken and egg"
* 'x86-bsp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, topology: Debug CPU0 hotplug
x86/i387.c: Initialize thread xstate only on CPU0 only once
x86, hotplug: Handle retrigger irq by the first available CPU
x86, hotplug: The first online processor saves the MTRR state
x86, hotplug: During CPU0 online, enable x2apic, set_numa_node.
x86, hotplug: Wake up CPU0 via NMI instead of INIT, SIPI, SIPI
x86-32, hotplug: Add start_cpu0() entry point to head_32.S
x86-64, hotplug: Add start_cpu0() entry point to head_64.S
kernel/cpu.c: Add comment for priority in cpu_hotplug_pm_callback
x86, hotplug, suspend: Online CPU0 for suspend or hibernate
x86, hotplug: Support functions for CPU0 online/offline
x86, topology: Don't offline CPU0 if any PIC irq can not be migrated out of it
x86, Kconfig: Add config switch for CPU0 hotplug
doc: Add x86 CPU0 online/offline feature
Pull perf updates from Ingo Molnar:
"Lots of activity:
211 files changed, 8328 insertions(+), 4116 deletions(-)
most of it on the tooling side.
Main changes:
* ftrace enhancements and fixes from Steve Rostedt.
* uprobes fixes, cleanups and preparation for the ARM port from Oleg
Nesterov.
* UAPI fixes, from David Howels - prepares the arch/x86 UAPI
transition
* Separate perf tests into multiple objects, one per test, from Jiri
Olsa.
* Make hardware event translations available in sysfs, from Jiri
Olsa.
* Fixes to /proc/pid/maps parsing, preparatory to supporting data
maps, from Namhyung Kim
* Implement ui_progress for GTK, from Namhyung Kim
* Add framework for automated perf_event_attr tests, where tools with
different command line options will be run from a 'perf test', via
python glue, and the perf syscall will be intercepted to verify
that the perf_event_attr fields set by the tool are those expected,
from Jiri Olsa
* Add a 'link' method for hists, so that we can have the leader with
buckets for all the entries in all the hists. This new method is
now used in the default 'diff' output, making the sum of the
'baseline' column be 100%, eliminating blind spots.
* libtraceevent fixes for compiler warnings trying to make perf it
build on some distros, like fedora 14, 32-bit, some of the warnings
really pointed to real bugs.
* Add a browser for 'perf script' and make it available from the
report and annotate browsers. It does filtering to find the
scripts that handle events found in the perf.data file used. From
Feng Tang
* perf inject changes to allow showing where a task sleeps, from
Andrew Vagin.
* Makefile improvements from Namhyung Kim.
* Add --pre and --post command hooks in 'stat', from Peter Zijlstra.
* Don't stop synthesizing threads when one vanishes, this is for the
existing threads when we start a tool like trace.
* Use sched:sched_stat_runtime to provide a thread summary, this
produces the same output as the 'trace summary' subcommand of
tglx's original "trace" tool.
* Support interrupted syscalls in 'trace'
* Add an event duration column and filter in 'trace'.
* There are references to the man pages in some tools, so try to
build Documentation when installing, warning the user if that is
not possible, from Borislav Petkov.
* Give user better message if precise is not supported, from David
Ahern.
* Try to find cross-built objdump path by using the session
environment information in the perf.data file header, from Irina
Tirdea, original patch and idea by Namhyung Kim.
* Diplays more output on features check for make V=1, so that one can
figure out what is happening by looking at gcc output, etc. From
Jiri Olsa.
* Add on_exit implementation for systems without one, e.g. Android,
from Bernhard Rosenkraenzer.
* Only process events for vcpus of interest, helps handling large
number of events, from David Ahern.
* Cross compilation fixes for Android, from Irina Tirdea.
* Add documentation on compiling for Android, from Irina Tirdea.
* perf diff improvements from Jiri Olsa.
* Target (task/user/cpu/syswide) handling improvements, from Namhyung
Kim.
* Add support in 'trace' for tracing workload given by command line,
from Namhyung Kim.
* ... and much more."
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (194 commits)
uprobes: Use percpu_rw_semaphore to fix register/unregister vs dup_mmap() race
perf evsel: Introduce is_group_member method
perf powerpc: Use uapi/unistd.h to fix build error
tools: Pass the target in descend
tools: Honour the O= flag when tool build called from a higher Makefile
tools: Define a Makefile function to do subdir processing
perf ui: Always compile browser setup code
perf ui: Add ui_progress__finish()
perf ui gtk: Implement ui_progress functions
perf ui: Introduce generic ui_progress helper
perf ui tui: Move progress.c under ui/tui directory
perf tools: Add basic event modifier sanity check
perf tools: Omit group members from perf_evlist__disable/enable
perf tools: Ensure single disable call per event in record comand
perf tools: Fix 'disabled' attribute config for record command
perf tools: Fix attributes for '{}' defined event groups
perf tools: Use sscanf for parsing /proc/pid/maps
perf tools: Add gtk.<command> config option for launching GTK browser
perf tools: Fix compile error on NO_NEWT=1 build
perf hists: Initialize all of he->stat with zeroes
...
This patch adds Kconfig options and kernel parameters to allow the
enabling and disabling of automatic NUMA balancing. The existance
of such a switch was and is very important when debugging problems
related to transparent hugepages and we should have the same for
automatic NUMA placement.
Signed-off-by: Mel Gorman <mgorman@suse.de>
RCU callback execution can add significant OS jitter and also can
degrade both scheduling latency and, in asymmetric multiprocessors,
energy efficiency. This commit therefore adds the ability for selected
CPUs ("rcu_nocbs=" boot parameter) to have their callbacks offloaded
to kthreads. If the "rcu_nocb_poll" boot parameter is also specified,
these kthreads will do polling, removing the need for the offloaded
CPUs to do wakeups. At least one CPU must be doing normal callback
processing: currently CPU 0 cannot be selected as a no-CBs CPU.
In addition, attempts to offline the last normal-CBs CPU will fail.
This feature was inspired by Jim Houston's and Joe Korty's JRCU, and
this commit includes fixes to problems located by Fengguang Wu's
kbuild test robot.
[ paulmck: Added gfp.h include file as suggested by Fengguang Wu. ]
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This driver supports GRCAN and CRHCAN CAN controllers available in the GRLIB
VHDL IP core library.
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
Acked-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
If CONFIG_BOOTPARAM_HOTPLUG_CPU0 is turned on, CPU0 is hotpluggable. Otherwise,
by default CPU0 is not hotpluggable and kernel parameter cpu0_hotplug enables
CPU0 online/offline feature.
The documentations point out two known CPU0 dependencies. First, resume from
hibernate or suspend always starts from CPU0. So hibernate and suspend are
prevented if CPU0 is offline. Another dependency is PIC interrupts always go
to CPU0.
It's said that some machines may depend on CPU0 to poweroff/reboot. But I
haven't seen such dependency on a few tested machines.
Please let me know if you see any CPU0 dependencies on your machine.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1352835171-3958-2-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Add trace_options to the kernel command line parameter to be able to
set options at early boot. For example, to enable stack dumps of
events, add the following:
trace_options=stacktrace
This along with the trace_event option, you can get not only
traces of the events but also the stack dumps with them.
Requested-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
If the TSC deadline mode is supported, LAPIC timer one-shot mode can be
implemented using IA32_TSC_DEADLINE MSR. An interrupt will be generated
when the TSC value equals or exceeds the value in the IA32_TSC_DEADLINE
MSR.
This enables us to skip the APIC calibration during boot. Also, in
xapic mode, this enables us to skip the uncached apic access to re-arm
the APIC timer.
As this timer ticks at the high frequency TSC rate, we use the
TSC_DIVISOR (32) to work with the 32-bit restrictions in the
clockevent API's to avoid 64-bit divides etc (frequency is u32 and
"unsigned long" in the set_next_event(), max_delta limits the next
event to 32-bit for 32-bit kernel).
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: venki@google.com
Cc: len.brown@intel.com
Link: http://lkml.kernel.org/r/1350941878.6017.31.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull module signing support from Rusty Russell:
"module signing is the highlight, but it's an all-over David Howells frenzy..."
Hmm "Magrathea: Glacier signing key". Somebody has been reading too much HHGTTG.
* 'modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (37 commits)
X.509: Fix indefinite length element skip error handling
X.509: Convert some printk calls to pr_devel
asymmetric keys: fix printk format warning
MODSIGN: Fix 32-bit overflow in X.509 certificate validity date checking
MODSIGN: Make mrproper should remove generated files.
MODSIGN: Use utf8 strings in signer's name in autogenerated X.509 certs
MODSIGN: Use the same digest for the autogen key sig as for the module sig
MODSIGN: Sign modules during the build process
MODSIGN: Provide a script for generating a key ID from an X.509 cert
MODSIGN: Implement module signature checking
MODSIGN: Provide module signing public keys to the kernel
MODSIGN: Automatically generate module signing keys if missing
MODSIGN: Provide Kconfig options
MODSIGN: Provide gitignore and make clean rules for extra files
MODSIGN: Add FIPS policy
module: signature checking hook
X.509: Add a crypto key parser for binary (DER) X.509 certificates
MPILIB: Provide a function to read raw data into an MPI
X.509: Add an ASN.1 decoder
X.509: Add simple ASN.1 grammar compiler
...
Features include:
- Remove CONFIG_EXPERIMENTAL dependency from NFSv4.1
Aside from the issues discussed at the LKS, distros are shipping
NFSv4.1 with all the trimmings.
- Fix fdatasync()/fsync() for the corner case of a server reboot.
- NFSv4 OPEN access fix: finally distinguish correctly between
open-for-read and open-for-execute permissions in all situations.
- Ensure that the TCP socket is closed when we're in CLOSE_WAIT
- More idmapper bugfixes
- Lots of pNFS bugfixes and cleanups to remove unnecessary state and
make the code easier to read.
- In cases where a pNFS read or write fails, allow the client to
resume trying layoutgets after two minutes of read/write-through-mds.
- More net namespace fixes to the NFSv4 callback code.
- More net namespace fixes to the NFSv3 locking code.
- More NFSv4 migration preparatory patches.
Including patches to detect network trunking in both NFSv4 and NFSv4.1
- pNFS block updates to optimise LAYOUTGET calls.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJQdMvBAAoJEGcL54qWCgDyV84P/0XvcEXj6kdMv9EiWfRczo7r
iAwAIhiEmG1agtZa6v+Gso2MYRQbkGyJi0LKIwzGqNUi0BLQGQCoV93kB0ITVpiN
g7poDTnPyoItW1oJCtC48/Mx0G5C1yrHSwFAJrXmtzDF1mwd/BIQReafYp6x+/TU
Mvwm7au3Y2ySRBEDmY4zyBERHXGt//JmsZ9Ays6jewQg5ZOyjDQKoeHVYaaeJoF0
A0tQGcBSNdySagI5dt4SlkuO7AClhzVHlilep2dsBu/TLS0F2pEdHXvM2W0koZmM
uazaIpzd2F7TfokTYExgsyKsqpkzpDf1kebN4Y1+Ioi7Yy30dQrX6lNaUNcOmOJQ
xx694HDHV90KdRBVSFhOIHMTBRcls68hBcWib3MXWHTKX6HVgnFMwhwxGH0MRezf
3rmXoqn+CO1j5WeQmA3BqdVbHSZHi913TKEwE/qoW4pmOFhv5I2flXWQS/Rwvdng
2xDCe6TlvhMS92IpyvNEIicXLRSm+DUAmoAfSqqlifZIAEM5R29e/wCAsmVprO3B
LPHyUoIMO6SZ1PL6Rk20+6qQfvCK7U/ChULsUL/zb7R88Pc3sFE2BeAvZVATsvH3
+FJWTz43fwUBoMhPsn8xSBLn/fq6az5C19syz6Fpu3DZ4X0EwyVWifiFk6HgcxZD
J8ajEl+dNZeFE8rkwykX
=uBk7
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Features include:
- Remove CONFIG_EXPERIMENTAL dependency from NFSv4.1
Aside from the issues discussed at the LKS, distros are shipping
NFSv4.1 with all the trimmings.
- Fix fdatasync()/fsync() for the corner case of a server reboot.
- NFSv4 OPEN access fix: finally distinguish correctly between
open-for-read and open-for-execute permissions in all situations.
- Ensure that the TCP socket is closed when we're in CLOSE_WAIT
- More idmapper bugfixes
- Lots of pNFS bugfixes and cleanups to remove unnecessary state and
make the code easier to read.
- In cases where a pNFS read or write fails, allow the client to
resume trying layoutgets after two minutes of read/write-
through-mds.
- More net namespace fixes to the NFSv4 callback code.
- More net namespace fixes to the NFSv3 locking code.
- More NFSv4 migration preparatory patches.
Including patches to detect network trunking in both NFSv4 and
NFSv4.1
- pNFS block updates to optimise LAYOUTGET calls."
* tag 'nfs-for-3.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (113 commits)
pnfsblock: cleanup nfs4_blkdev_get
NFS41: send real read size in layoutget
NFS41: send real write size in layoutget
NFS: track direct IO left bytes
NFSv4.1: Cleanup ugliness in pnfs_layoutgets_blocked()
NFSv4.1: Ensure that the layout sequence id stays 'close' to the current
NFSv4.1: Deal with seqid wraparound in the pNFS return-on-close code
NFSv4 set open access operation call flag in nfs4_init_opendata_res
NFSv4.1: Remove the dependency on CONFIG_EXPERIMENTAL
NFSv4 reduce attribute requests for open reclaim
NFSv4: nfs4_open_done first must check that GETATTR decoded a file type
NFSv4.1: Deal with wraparound when updating the layout "barrier" seqid
NFSv4.1: Deal with wraparound issues when updating the layout stateid
NFSv4.1: Always set the layout stateid if this is the first layoutget
NFSv4.1: Fix another refcount issue in pnfs_find_alloc_layout
NFSv4: don't put ACCESS in OPEN compound if O_EXCL
NFSv4: don't check MAY_WRITE access bit in OPEN
NFS: Set key construction data for the legacy upcall
NFSv4.1: don't do two EXCHANGE_IDs on mount
NFS: nfs41_walk_client_list(): re-lock before iterating
...
We do a very simple search for a particular string appended to the module
(which is cache-hot and about to be SHA'd anyway). There's both a config
option and a boot parameter which control whether we accept or fail with
unsigned modules and modules that are signed with an unknown key.
If module signing is enabled, the kernel will be tainted if a module is
loaded that is unsigned or has a signature for which we don't have the
key.
(Useful feedback and tweaks by David Howells <dhowells@redhat.com>)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Pull security subsystem updates from James Morris:
"Highlights:
- Integrity: add local fs integrity verification to detect offline
attacks
- Integrity: add digital signature verification
- Simple stacking of Yama with other LSMs (per LSS discussions)
- IBM vTPM support on ppc64
- Add new driver for Infineon I2C TIS TPM
- Smack: add rule revocation for subject labels"
Fixed conflicts with the user namespace support in kernel/auditsc.c and
security/integrity/ima/ima_policy.c.
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (39 commits)
Documentation: Update git repository URL for Smack userland tools
ima: change flags container data type
Smack: setprocattr memory leak fix
Smack: implement revoking all rules for a subject label
Smack: remove task_wait() hook.
ima: audit log hashes
ima: generic IMA action flag handling
ima: rename ima_must_appraise_or_measure
audit: export audit_log_task_info
tpm: fix tpm_acpi sparse warning on different address spaces
samples/seccomp: fix 31 bit build on s390
ima: digital signature verification support
ima: add support for different security.ima data types
ima: add ima_inode_setxattr/removexattr function and calls
ima: add inode_post_setattr call
ima: replace iint spinblock with rwlock/read_lock
ima: allocating iint improvements
ima: add appraise action keywords and default rules
ima: integrity appraisal extension
vfs: move ima_file_free before releasing the file
...
This is a large set of updates, mostly for drivers (qla2xxx [including support
for new 83xx based card], qla4xxx, mpt2sas, bfa, zfcp, hpsa, be2iscsi, isci,
lpfc, ipr, ibmvfc, ibmvscsi, megaraid_sas). There's also a rework for tape
adding virtually unlimited numbers of tape drives plus a set of dif fixes for
sd and a fix for a live lock on hot remove of SCSI devices.
This round includes a signed tag pull of isci-for-3.6
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQEcBAABAgAGBQJQaqCFAAoJEDeqqVYsXL0MKJ4IALg/Obnk0/fNvBUNIrh5zRmj
r9UlXFJnlEDT03qRGdn8okgWMChbgaD1ZrwDTQnjNsabVQoTXI6oO6/uL2c8crpY
BFBwJvkNJS99nbcZv10CpJ3K7ykmRnKlkYon12iknhGwdtU+XJ14Z4PUcZkI9jmg
sBQQ6uNVWyosaONNE+k6o+dw6OTttJkzRX8e9in3thstxNTcG+h9iB1zZ/ETkSEj
tD4MyOgDiPf8kPV2awQThQGpni9Tu3SQr5dEn/iUUktUjiYsDNQuyaAk+QzyhUU7
D35iIJnIHlXTSTMQkrG4qpJHBvqPkWlYJzaOmheQryQ3vzp2C5Ly/hS9il45uIQ=
=49u9
-----END PGP SIGNATURE-----
Merge tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull first round of SCSI updates from James Bottomley:
"This is a large set of updates, mostly for drivers (qla2xxx [including
support for new 83xx based card], qla4xxx, mpt2sas, bfa, zfcp, hpsa,
be2iscsi, isci, lpfc, ipr, ibmvfc, ibmvscsi, megaraid_sas).
There's also a rework for tape adding virtually unlimited numbers of
tape drives plus a set of dif fixes for sd and a fix for a live lock
on hot remove of SCSI devices.
This round includes a signed tag pull of isci-for-3.6
Signed-off-by: James Bottomley <JBottomley@Parallels.com>"
Fix up trivial conflict in drivers/scsi/qla2xxx/qla_nx.c due to new PCI
helper function use in a function that was removed by this pull.
* tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (198 commits)
[SCSI] st: remove st_mutex
[SCSI] sd: Ensure we correctly disable devices with unknown protection type
[SCSI] hpsa: gen8plus Smart Array IDs
[SCSI] qla4xxx: Update driver version to 5.03.00-k1
[SCSI] qla4xxx: Disable generating pause frames for ISP83XX
[SCSI] qla4xxx: Fix double clearing of risc_intr for ISP83XX
[SCSI] qla4xxx: IDC implementation for Loopback
[SCSI] qla4xxx: update copyrights in LICENSE.qla4xxx
[SCSI] qla4xxx: Fix panic while rmmod
[SCSI] qla4xxx: Fail probe_adapter if IRQ allocation fails
[SCSI] qla4xxx: Prevent MSI/MSI-X falling back to INTx for ISP82XX
[SCSI] qla4xxx: Update idc reg in case of PCI AER
[SCSI] qla4xxx: Fix double IDC locking in qla4_8xxx_error_recovery
[SCSI] qla4xxx: Clear interrupt while unloading driver for ISP83XX
[SCSI] qla4xxx: Print correct IDC version
[SCSI] qla4xxx: Added new mbox cmd to pass driver version to FW
[SCSI] scsi_dh_alua: Enable STPG for unavailable ports
[SCSI] scsi_remove_target: fix softlockup regression on hot remove
[SCSI] ibmvscsi: Fix host config length field overflow
[SCSI] ibmvscsi: Remove backend abstraction
...
An optional boot parameter is introduced to allow client
administrators to specify a string that the Linux NFS client can
insert into its nfs_client_id4 id string, to make it both more
globally unique, and to ensure that it doesn't change even if the
client's nodename changes.
If this boot parameter is not specified, the client's nodename is
used, as before.
Client installation procedures can create a unique string (typically,
a UUID) which remains unchanged during the lifetime of that client
instance. This works just like creating a UUID for the label of the
system's root and boot volumes.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Pull x86/smap support from Ingo Molnar:
"This adds support for the SMAP (Supervisor Mode Access Prevention) CPU
feature on Intel CPUs: a hardware feature that prevents unintended
user-space data access from kernel privileged code.
It's turned on automatically when possible.
This, in combination with SMEP, makes it even harder to exploit kernel
bugs such as NULL pointer dereferences."
Fix up trivial conflict in arch/x86/kernel/entry_64.S due to newly added
includes right next to each other.
* 'x86-smap-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, smep, smap: Make the switching functions one-way
x86, suspend: On wakeup always initialize cr4 and EFER
x86-32: Start out eflags and cr4 clean
x86, smap: Do not abuse the [f][x]rstor_checking() functions for user space
x86-32, smap: Add STAC/CLAC instructions to 32-bit kernel entry
x86, smap: Reduce the SMAP overhead for signal handling
x86, smap: A page fault due to SMAP is an oops
x86, smap: Turn on Supervisor Mode Access Prevention
x86, smap: Add STAC and CLAC instructions to control user space access
x86, uaccess: Merge prototypes for clear_user/__clear_user
x86, smap: Add a header file with macros for STAC/CLAC
x86, alternative: Add header guards to <asm/alternative-asm.h>
x86, alternative: Use .pushsection/.popsection
x86, smap: Add CR4 bit for SMAP
x86-32, mm: The WP test should be done on a kernel page
Pull x86/fpu update from Ingo Molnar:
"The biggest change is the addition of the non-lazy (eager) FPU saving
support model and enabling it on CPUs with optimized xsaveopt/xrstor
FPU state saving instructions.
There are also various Sparse fixes"
Fix up trivial add-add conflict in arch/x86/kernel/traps.c
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, kvm: fix kvm's usage of kernel_fpu_begin/end()
x86, fpu: remove cpu_has_xmm check in the fx_finit()
x86, fpu: make eagerfpu= boot param tri-state
x86, fpu: enable eagerfpu by default for xsaveopt
x86, fpu: decouple non-lazy/eager fpu restore from xsave
x86, fpu: use non-lazy fpu restore for processors supporting xsave
lguest, x86: handle guest TS bit for lazy/non-lazy fpu host models
x86, fpu: always use kernel_fpu_begin/end() for in-kernel FPU usage
x86, kvm: use kernel_fpu_begin/end() in kvm_load/put_guest_fpu()
x86, fpu: remove unnecessary user_fpu_end() in save_xstate_sig()
x86, fpu: drop_fpu() before restoring new state from sigframe
x86, fpu: Unify signal handling code paths for x86 and x86_64 kernels
x86, fpu: Consolidate inline asm routines for saving/restoring fpu state
x86, signal: Cleanup ifdefs and is_ia32, is_x32
Pull x86/asm changes from Ingo Molnar:
"The one change that stands out is the alternatives patching change
that prevents us from ever patching back instructions from SMP to UP:
this simplifies things and speeds up CPU hotplug.
Other than that it's smaller fixes, cleanups and improvements."
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Unspaghettize do_trap()
x86_64: Work around old GAS bug
x86: Use REP BSF unconditionally
x86: Prefer TZCNT over BFS
x86/64: Adjust types of temporaries used by ffs()/fls()/fls64()
x86: Drop unnecessary kernel_eflags variable on 64-bit
x86/smp: Don't ever patch back to UP if we unplug cpus
Although almost everyone is well-served by the defaults, some uses of RCU
benefit from shorter grace periods, while others benefit more from the
greater efficiency provided by longer grace periods. Situations requiring
a large number of grace periods to elapse (and wireshark startup has
been called out as an example of this) are helped by lower-latency
grace periods. Furthermore, in some embedded applications, people are
willing to accept a small degradation in update efficiency (due to there
being more of the shorter grace-period operations) in order to gain the
lower latency.
In contrast, those few systems with thousands of CPUs need longer grace
periods because the CPU overhead of a grace period rises roughly
linearly with the number of CPUs. Such systems normally do not make
much use of facilities that require large numbers of grace periods to
elapse, so this is a good tradeoff.
Therefore, this commit allows the durations to be controlled from sysfs.
There are two sysfs parameters, one named "jiffies_till_first_fqs" that
specifies the delay in jiffies from the end of grace-period initialization
until the first attempt to force quiescent states, and the other named
"jiffies_till_next_fqs" that specifies the delay (again in jiffies)
between subsequent attempts to force quiescent states. They both default
to three jiffies, which is compatible with the old hard-coded behavior.
At some future time, it may be possible to automatically increase the
grace-period length with the number of CPUs, but we do not yet have
sufficient data to do a good job. Preliminary data indicates that we
should add an addiitonal jiffy to each of the delays for every 200 CPUs
in the system, but more experimentation is needed. For now, the number
of systems with more than 1,000 CPUs is small enough that this can be
relegated to boot-time hand tuning.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Reason for merge:
x86/fpu changed the structure of some of the code that x86/smap
changes; mostly fpu-internal.h but also minor changes to the
signal code.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Resolved Conflicts:
arch/x86/ia32/ia32_signal.c
arch/x86/include/asm/fpu-internal.h
arch/x86/kernel/signal.c
Add the "eagerfpu=auto" (that selects the default scheme in
enabling eagerfpu) which can override compiled-in boot parameters
like "eagerfpu=on/off" (that force enable/disable eagerfpu).
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1347300665-6209-5-git-send-email-suresh.b.siddha@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
xsaveopt/xrstor support optimized state save/restore by tracking the
INIT state and MODIFIED state during context-switch.
Enable eagerfpu by default for processors supporting xsaveopt.
Can be disabled by passing "eagerfpu=off" boot parameter.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1347300665-6209-3-git-send-email-suresh.b.siddha@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Decouple non-lazy/eager fpu restore policy from the existence of the xsave
feature. Introduce a synthetic CPUID flag to represent the eagerfpu
policy. "eagerfpu=on" boot paramter will enable the policy.
Requested-by: H. Peter Anvin <hpa@zytor.com>
Requested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1347300665-6209-2-git-send-email-suresh.b.siddha@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Unlike the IMA measurement policy, the appraise policy can not be dependent
on runtime process information, such as the task uid, as the 'security.ima'
xattr is written on file close and must be updated each time the file changes,
regardless of the current task uid.
This patch extends the policy language with 'fowner', defines an appraise
policy, which appraises all files owned by root, and defines 'ima_appraise_tcb',
a new boot command line option, to enable the appraise policy.
Changelog v3:
- separate the measure from the appraise rules in order to support measuring
without appraising and appraising without measuring.
- change appraisal default for filesystems without xattr support to fail
- update default appraise policy for cgroups
Changelog v1:
- don't appraise RAMFS (Dmitry Kasatkin)
- merged rest of "ima: ima_must_appraise_or_measure API change" commit
(Dmtiry Kasatkin)
ima_must_appraise_or_measure() called ima_match_policy twice, which
searched the policy for a matching rule. Once for a matching measurement
rule and subsequently for an appraisal rule. Searching the policy twice
is unnecessary overhead, which could be noticeable with a large policy.
The new version of ima_must_appraise_or_measure() does everything in a
single iteration using a new version of ima_match_policy(). It returns
IMA_MEASURE, IMA_APPRAISE mask.
With the use of action mask only one efficient matching function
is enough. Removed other specific versions of matching functions.
Changelog:
- change 'owner' to 'fowner' to conform to the new LSM conditions posted by
Roberto Sassu.
- fix calls to ima_log_string()
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>