Replace the clockevents_notify() call with an explicit function call.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1528188.S1pjqkSL1P@vostro.rjw.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We write a stack pointer to MSR_IA32_SYSENTER_ESP exactly once,
and we unnecessarily cache the value in tss.sp1. We never
read the cached value.
Remove all of the caching. It serves no purpose.
Suggested-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/05a0163eb33ef5208363f0015496855da7cebadd.1428002830.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
At Denys' request, clean up the comment describing stack padding
in the 32-bit sysenter path.
No code changes.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/41fee7bb8490ae840fe7ef2699f9c2feb932e729.1428002830.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Change alpha_rtc_set_mmss() and remote_set_mmss() to use
rtc_class_ops's set_mmss64(), to be y2038 safe.
Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Alessandro Zummo <a.zummo@towertech.it>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Henderson <rth@twiddle.net>
Link: http://lkml.kernel.org/r/1427945681-29972-15-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As part of addressing "y2038 problem" for in-kernel uses, this
patch converts read_boot_clock() to read_boot_clock64() and
read_persistent_clock() to read_persistent_clock64() using
timespec64 by converting clock_access_fn to use timespec64.
Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Thierry Reding <treding@nvidia.com> (for tegra part)
Cc: Russell King <rmk@dyn-67.arm.linux.org.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1427945681-29972-7-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As part of addressing "y2038 problem" for in-kernel uses, this
patch adds the y2038-safe omap_read_persistent_clock64() using
timespec64.
Because we rely on some subsequent changes to convert arm
multiarch support, omap_read_persistent_clock() will be removed
then.
Also remove the needless spinlock, because
read_persistent_clock() doesn't run simultaneously.
Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Tony Lindgren <tony@atomide.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1427945681-29972-5-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As part of addressing in-kernel y2038 issues, this patch adds
read_persistent_clock64() and replaces all the call sites of
read_persistent_clock() with this function. This is a __weak
implementation, which simply calls the existing y2038 unsafe
read_persistent_clock().
This allows architecture specific implementations to be
converted independently, and eventually the y2038 unsafe
read_persistent_clock() can be removed after all its
architecture specific implementations have been converted to
read_persistent_clock64().
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1427945681-29972-3-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add support for the new CLWB (cache line write back)
instruction. This instruction was announced in the document
"Intel Architecture Instruction Set Extensions Programming
Reference" with reference number 319433-022.
https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
The CLWB instruction is used to write back the contents of
dirtied cache lines to memory without evicting the cache lines
from the processor's cache hierarchy. This should be used in
favor of clflushopt or clflush in cases where you require the
cache line to be written to memory but plan to access the data
again in the near future.
One of the main use cases for this is with persistent memory
where CLWB can be used with PCOMMIT to ensure that data has been
accepted to memory and is durable on the DIMM.
This function shows how to properly use CLWB/CLFLUSHOPT/CLFLUSH
and PCOMMIT with appropriate fencing:
void flush_and_commit_buffer(void *vaddr, unsigned int size)
{
void *vend = vaddr + size - 1;
for (; vaddr < vend; vaddr += boot_cpu_data.x86_clflush_size)
clwb(vaddr);
/* Flush any possible final partial cacheline */
clwb(vend);
/*
* Use SFENCE to order CLWB/CLFLUSHOPT/CLFLUSH cache flushes.
* (MFENCE via mb() also works)
*/
wmb();
/* PCOMMIT and the required SFENCE for ordering */
pcommit_sfence();
}
After this function completes the data pointed to by vaddr is
has been accepted to memory and will be durable if the vaddr
points to persistent memory.
Regarding the details of how the alternatives assembly is set
up, we need one additional byte at the beginning of the CLFLUSH
so that we can flip it into a CLFLUSHOPT by changing that byte
into a 0x66 prefix. Two options are to either insert a 1 byte
ASM_NOP1, or to add a 1 byte NOP_DS_PREFIX. Both have no
functional effect with the plain CLFLUSH, but I've been told
that executing a CLFLUSH + prefix should be faster than
executing a CLFLUSH + NOP.
We had to hard code the assembly for CLWB because, lacking the
ability to assemble the CLWB instruction itself, the next
closest thing is to have an xsaveopt instruction with a 0x66
prefix. Unfortunately XSAVEOPT itself is also relatively new,
and isn't included by all the GCC versions that the kernel needs
to support.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1422377631-8986-3-git-send-email-ross.zwisler@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
- Fix two regressions in the balloon driver's use of memory hotplug
when used in a PV guest.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQEcBAABAgAGBQJVHWimAAoJEFxbo/MsZsTRSFkH/A1SYwcqUADPVZYDSc2T10Km
1kF7UifizlJ8/gCbEbhzLMDSPTwEENzN5ONWXQ+a3EXuK0UwLBnic9uKWAg2Q8cp
+uIpU1P15yM7rJFRaf6PZ2BqFdvWpXUJgNnof4+r0uqqa57hdmVOmVMKFenlf/h7
2jj0b19wS6jhuwl9PzgKTKQyjnpWrmkhfg5tDQzn1qUCHoKDIhM7ROUP7wsYfIu8
iAT3JV9lmNwdGT5jwf4MgRmBpqGZzQs03xaAz2E9/fnWWv7u4EuZcmbDcooUmi7H
luHQ2Dds1ajhjgtRwPoxT0tHFuRl7s0jsyJ8GOLsdmoMkJTAQnQA21egI7Bpdfo=
=HHbI
-----END PGP SIGNATURE-----
Merge tag 'stable/for-linus-4.0-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen regression fixes from David Vrabel:
"Fix two regressions in the balloon driver's use of memory hotplug when
used in a PV guest"
* tag 'stable/for-linus-4.0-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/balloon: before adding hotplugged memory, set frames to invalid
x86/xen: prepare p2m list for memory hotplug
Conflicts:
drivers/net/usb/asix_common.c
drivers/net/usb/sr9800.c
drivers/net/usb/usbnet.c
include/linux/usb/usbnet.h
net/ipv4/tcp_ipv4.c
net/ipv6/tcp_ipv6.c
The TCP conflicts were overlapping changes. In 'net' we added a
READ_ONCE() to the socket cached RX route read, whilst in 'net-next'
Eric Dumazet touched the surrounding code dealing with how mini
sockets are handled.
With USB, it's a case of the same bug fix first going into net-next
and then I cherry picked it back into net.
Signed-off-by: David S. Miller <davem@davemloft.net>
On a 32-bit build I got:
arch/x86/kernel/cpu/perf_event_intel_pt.c:413:5: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
arch/x86/kernel/cpu/perf_event_intel_bts.c:162:24: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
Fix it. The code should probably be (re-)tested on 32-bit systems to make
sure all is fine.
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kaixu Xia <kaixu.xia@linaro.org>
Cc: linux-kernel@vger.kernel.org
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: kan.liang@intel.com
Cc: markus.t.metzger@intel.com
Cc: mathieu.poirier@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
perf with LBRs on has a tendency to rewrite the DEBUGCTL MSR with
the same value. Add a little optimization to skip the unnecessary
write.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1426871484-21285-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The perf PMI currently does unnecessary MSR accesses when
LBRs are enabled. We use LBR freezing, or when in callstack
mode force the LBRs to only filter on ring 3.
So there is no need to disable the LBRs explicitely in the
PMI handler.
Also we always unnecessarily rewrite LBR_SELECT in the LBR
handler, even though it can never change.
5) | /* write_msr: MSR_LBR_SELECT(1c8), value 0 */
5) | /* read_msr: MSR_IA32_DEBUGCTLMSR(1d9), value 1801 */
5) | /* write_msr: MSR_IA32_DEBUGCTLMSR(1d9), value 1801 */
5) | /* write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 70000000f */
5) | /* write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0 */
5) | /* write_msr: MSR_LBR_SELECT(1c8), value 0 */
5) | /* read_msr: MSR_IA32_DEBUGCTLMSR(1d9), value 1801 */
5) | /* write_msr: MSR_IA32_DEBUGCTLMSR(1d9), value 1801 */
This patch:
- Avoids disabling already frozen LBRs unnecessarily in the PMI
- Avoids changing LBR_SELECT in the PMI
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1426871484-21285-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Technically PEBS_ENABLED is only guaranteed to exist when we
detected PEBS. So add a check for this to the PMU dump function.
I don't think it can happen on a real CPU, but could in a VM.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1425059312-18217-4-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
LBRs and LBR freezing are controlled through the DEBUGCTL MSR. So
dump the state of DEBUGCTL too when dumping the PMU state.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1425059312-18217-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The PMU reset code didn't quite keep up with newer PMU features.
Improve it a bit to really reset a modern PMU:
- Clear all overflow status
- Clear LBRs and freezing state
- Disable fixed counters too
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1425059312-18217-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch disables the PMU HT bug when Hyperthreading (HT)
is disabled. We cannot do this test immediately when perf_events
is initialized. We need to wait until the topology information
is setup properly. As such, we register a later initcall, check
the topology and potentially disable the workaround. To do this,
we need to ensure there is no user of the PMU. At this point of
the boot, the only user is the NMI watchdog, thus we disable
it during the switch and re-enable it right after.
Having the workaround disabled when it is not needed provides
some benefits by limiting the overhead is time and space.
The workaround still ensures correct scheduling of the corrupting
memory events (0xd0, 0xd1, 0xd2) when HT is off. Those events
can only be measured on counters 0-3. Something else the current
kernel did not handle correctly.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: bp@alien8.de
Cc: jolsa@redhat.com
Cc: kan.liang@intel.com
Cc: maria.n.dimakopoulou@gmail.com
Link: http://lkml.kernel.org/r/1416251225-17721-13-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch limits the number of counters available to each CPU when
the HT bug workaround is enabled.
This is necessary to avoid situation of counter starvation. Such can
arise from configuration where one HT thread, HT0, is using all 4 counters
with corrupting events which require exclusion the the sibling HT, HT1.
In such case, HT1 would not be able to schedule any event until HT0
is done. To mitigate this problem, this patch artificially limits
the number of counters to 2.
That way, we can gurantee that at least 2 counters are not in exclusive
mode and therefore allow the sibling thread to schedule events of the
same type (system vs. per-thread). The 2 counters are not determined
in advance. We simply set the limit to two events per HT.
This helps mitigate starvation in case of events with specific counter
constraints such a PREC_DIST.
Note that this does not elimintate the starvation is all cases. But
it is better than not having it.
(Solution suggested by Peter Zjilstra.)
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: bp@alien8.de
Cc: jolsa@redhat.com
Cc: kan.liang@intel.com
Cc: maria.n.dimakopoulou@gmail.com
Link: http://lkml.kernel.org/r/1416251225-17721-11-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patches activates the HT bug workaround for the
SNB/IVB/HSW processors. This covers non-PEBS mode.
Activation is done thru the constraint tables.
Both client and server processors needs this workaround.
Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Stephane Eranian <eranian@google.com>
Cc: bp@alien8.de
Cc: jolsa@redhat.com
Cc: kan.liang@intel.com
Link: http://lkml.kernel.org/r/1416251225-17721-8-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch implements a software workaround for a HW erratum
on Intel SandyBridge, IvyBridge and Haswell processors
with Hyperthreading enabled. The errata are documented for
each processor in their respective specification update
documents:
- SandyBridge: BJ122
- IvyBridge: BV98
- Haswell: HSD29
The bug causes silent counter corruption across hyperthreads only
when measuring certain memory events (0xd0, 0xd1, 0xd2, 0xd3).
Counters measuring those events may leak counts to the sibling
counter. For instance, counter 0, thread 0 measuring event 0xd0,
may leak to counter 0, thread 1, regardless of the event measured
there. The size of the leak is not predictible. It all depends on
the workload and the state of each sibling hyper-thread. The
corrupting events do undercount as a consequence of the leak. The
leak is compensated automatically only when the sibling counter measures
the exact same corrupting event AND the workload is on the two threads
is the same. Given, there is no way to guarantee this, a work-around
is necessary. Furthermore, there is a serious problem if the leaked count
is added to a low-occurrence event. In that case the corruption on
the low occurrence event can be very large, e.g., orders of magnitude.
There is no HW or FW workaround for this problem.
The bug is very easy to reproduce on a loaded system.
Here is an example on a Haswell client, where CPU0, CPU4
are siblings. We load the CPUs with a simple triad app
streaming large floating-point vector. We use 0x81d0
corrupting event (MEM_UOPS_RETIRED:ALL_LOADS) and
0x20cc (ROB_MISC_EVENTS:LBR_INSERTS). Given we are not
using the LBR, the 0x20cc event should be zero.
$ taskset -c 0 triad &
$ taskset -c 4 triad &
$ perf stat -a -C 0 -e r81d0 sleep 100 &
$ perf stat -a -C 4 -r20cc sleep 10
Performance counter stats for 'system wide':
139 277 291 r20cc
10,000969126 seconds time elapsed
In this example, 0x81d0 and r20cc ar eusing sinling counters
on CPU0 and CPU4. 0x81d0 leaks into 0x20cc and corrupts it
from 0 to 139 millions occurrences.
This patch provides a software workaround to this problem by modifying the
way events are scheduled onto counters by the kernel. The patch forces
cross-thread mutual exclusion between counters in case a corrupting event
is measured by one of the hyper-threads. If thread 0, counter 0 is measuring
event 0xd0, then nothing can be measured on counter 0, thread 1. If no corrupting
event is measured on any hyper-thread, event scheduling proceeds as before.
The same example run with the workaround enabled, yield the correct answer:
$ taskset -c 0 triad &
$ taskset -c 4 triad &
$ perf stat -a -C 0 -e r81d0 sleep 100 &
$ perf stat -a -C 4 -r20cc sleep 10
Performance counter stats for 'system wide':
0 r20cc
10,000969126 seconds time elapsed
The patch does provide correctness for all non-corrupting events. It does not
"repatriate" the leaked counts back to the leaking counter. This is planned
for a second patch series. This patch series makes this repatriation more
easy by guaranteeing the sibling counter is not measuring any useful event.
The patch introduces dynamic constraints for events. That means that events which
did not have constraints, i.e., could be measured on any counters, may now be
constrained to a subset of the counters depending on what is going on the sibling
thread. The algorithm is similar to a cache coherency protocol. We call it XSU
in reference to Exclusive, Shared, Unused, the 3 possible states of a PMU
counter.
As a consequence of the workaround, users may see an increased amount of event
multiplexing, even in situtations where there are fewer events than counters
measured on a CPU.
Patch has been tested on all three impacted processors. Note that when
HT is off, there is no corruption. However, the workaround is still enabled,
yet not costing too much. Adding a dynamic detection of HT on turned out to
be complex are requiring too much to code to be justified.
This patch addresses the issue when PEBS is not used. A subsequent patch
fixes the problem when PEBS is used.
Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
[spinlock_t -> raw_spinlock_t]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Stephane Eranian <eranian@google.com>
Cc: bp@alien8.de
Cc: jolsa@redhat.com
Cc: kan.liang@intel.com
Link: http://lkml.kernel.org/r/1416251225-17721-7-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch adds a new shared_regs style structure to the
per-cpu x86 state (cpuc). It is used to coordinate access
between counters which must be used with exclusion across
HyperThreads on Intel processors. This new struct is not
needed on each PMU, thus is is allocated on demand.
Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
[peterz: spinlock_t -> raw_spinlock_t]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Stephane Eranian <eranian@google.com>
Cc: bp@alien8.de
Cc: jolsa@redhat.com
Cc: kan.liang@intel.com
Link: http://lkml.kernel.org/r/1416251225-17721-6-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch adds an index parameter to the get_event_constraint()
x86_pmu callback. It is expected to represent the index of the
event in the cpuc->event_list[] array. When the callback is used
for fake_cpuc (evnet validation), then the index must be -1. The
motivation for passing the index is to use it to index into another
cpuc array.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: bp@alien8.de
Cc: jolsa@redhat.com
Cc: kan.liang@intel.com
Cc: maria.n.dimakopoulou@gmail.com
Link: http://lkml.kernel.org/r/1416251225-17721-5-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch adds 3 new PMU model specific callbacks
during the event scheduling done by x86_schedule_events().
->start_scheduling(): invoked when entering the schedule routine.
->stop_scheduling(): invoked at the end of the schedule routine
->commit_scheduling(): invoked for each committed event
To be used optionally by model-specific code.
Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Stephane Eranian <eranian@google.com>
Cc: bp@alien8.de
Cc: jolsa@redhat.com
Cc: kan.liang@intel.com
Link: http://lkml.kernel.org/r/1416251225-17721-4-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Make the cpuc->kfree_on_online a vector to accommodate
more than one entry and add the second entry to be
used by a later patch.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
Cc: bp@alien8.de
Cc: jolsa@redhat.com
Cc: kan.liang@intel.com
Link: http://lkml.kernel.org/r/1416251225-17721-3-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add support for Branch Trace Store (BTS) via kernel perf event infrastructure.
The difference with the existing implementation of BTS support is that this
one is a separate PMU that exports events' trace buffers to userspace by means
of AUX area of the perf buffer, which is zero-copy mapped into userspace.
The immediate benefit is that the buffer size can be much bigger, resulting in
fewer interrupts and no kernel side copying is involved and little to no trace
data loss. Also, kernel code can be traced with this driver.
The old way of collecting BTS traces still works.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kaixu Xia <kaixu.xia@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: kan.liang@intel.com
Cc: markus.t.metzger@intel.com
Cc: mathieu.poirier@linaro.org
Link: http://lkml.kernel.org/r/1422614435-114702-1-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add support for Intel Processor Trace (PT) to kernel's perf events.
PT is an extension of Intel Architecture that collects information about
software execuction such as control flow, execution modes and timings and
formats it into highly compressed binary packets. Even being compressed,
these packets are generated at hundreds of megabytes per second per core,
which makes it impractical to decode them on the fly in the kernel.
This driver exports trace data by through AUX space in the perf ring
buffer, which is zero-copy mapped into userspace for faster data retrieval.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kaixu Xia <kaixu.xia@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: kan.liang@intel.com
Cc: markus.t.metzger@intel.com
Cc: mathieu.poirier@linaro.org
Link: http://lkml.kernel.org/r/1422614392-114498-1-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Intel PT cannot be used at the same time as LBR or BTS and will cause a
general protection fault if they are used together. In order to avoid
fixing up GPs in the fast path, instead we disallow creating LBR/BTS
events when PT events are present and vice versa.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kaixu Xia <kaixu.xia@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: kan.liang@intel.com
Cc: markus.t.metzger@intel.com
Cc: mathieu.poirier@linaro.org
Link: http://lkml.kernel.org/r/1421237903-181015-12-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Some of the CYCLE_ACTIVITY.* events can only be scheduled on
counter 2. Due to a typo Haswell matched those with
INTEL_EVENT_CONSTRAINT, which lead to the events never
matching as the comparison does not expect anything
in the umask too. Fix the typo.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1425925222-32361-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
For supporting Intel LBR branches filtering, Intel LBR sharing logic
mechanism is introduced from commit b36817e886 ("perf/x86: Add Intel
LBR sharing logic"). It modifies __intel_shared_reg_get_constraints() to
config lbr_sel, which is finally used to set LBR_SELECT.
However, the intel_shared_regs_constraints() function is called after
intel_pebs_constraints(). The PEBS event will return immediately after
intel_pebs_constraints(). So it's impossible to filter branches for PEBS
events.
This patch moves intel_shared_regs_constraints() ahead of
intel_pebs_constraints().
We can safely do that because the intel_shared_regs_constraints() function
only returns empty constraint if its rejecting the event, otherwise it
returns NULL such that we continue calling intel_pebs_constraints() and
x86_get_event_constraint().
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1427467105-9260-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Prior to this commit, it was impossible to use relative path to
include Makefiles from the top level Makefile because the option
"--include-dir=$(srctree)" becomes effective when Make enters into
sub Makefiles.
To use relative path in any places, this commit moves the option
above the "sub-make" target.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Michal Marek <mmarek@suse.cz>
The "MAKEFLAGS += --include-dir=$(srctree)" line in the top Makefile
allows us to do this.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Michal Marek <mmarek@suse.cz>
Some of x86 bare-metal and Xen CPU initialization code is common
between the two and therefore can be factored out to avoid code
duplication.
As a side effect, doing so will also extend the fix provided by
commit a7fcf28d43 ("x86/asm/entry: Replace this_cpu_sp0() with
current_top_of_stack() to x86_32") to 32-bit Xen PV guests.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: konrad.wilk@oracle.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1427897534-5086-1-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch fixes an error in kgdb for x86_64 which would report
the value of dx when asked to give the value of si.
Signed-off-by: Steffen Liebergeld <steffen.liebergeld@kernkonzept.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When I wrote the opportunistic SYSRET code, I missed an important difference
between SYSRET and IRET.
Both instructions are capable of setting EFLAGS.TF, but they behave differently
when doing so:
- IRET will not issue a #DB trap after execution when it sets TF.
This is critical -- otherwise you'd never be able to make forward progress when
returning to userspace.
- SYSRET, on the other hand, will trap with #DB immediately after
returning to CPL3, and the next instruction will never execute.
This breaks anything that opportunistically SYSRETs to a user
context with TF set. For example, running this code with TF set
and a SIGTRAP handler loaded never gets past 'post_nop':
extern unsigned char post_nop[];
asm volatile ("pushfq\n\t"
"popq %%r11\n\t"
"nop\n\t"
"post_nop:"
: : "c" (post_nop) : "r11");
In my defense, I can't find this documented in the AMD or Intel manual.
Fix it by using IRET to restore TF.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 2a23c6b8a9 ("x86_64, entry: Use sysret to return to userspace when possible")
Link: http://lkml.kernel.org/r/9472f1ca4c19a38ecda45bba9c91b7168135fcfa.1427923514.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Enabling CPU_DCACHE_DISABLE on a SMP capable system will prevent the
kernel from booting because of the following ldrex instruction in
arch_spin_lock:
(gdb) x/10i $pc
=> 0xc053cfa8 <_raw_spin_lock+4>: ldrex r3, [r0]
0xc053cfac <_raw_spin_lock+8>: add r2, r3, #65536 ; 0x10000
which is taken by the very first printk call:
at /home/fainelli/work/linux/arch/arm/include/asm/spinlock.h:65
fmt=0xc0637650 " 01 66Booting Linux on physical CPU 0x%xn", args=<incomplete type>)
at kernel/printk/printk.c:1525
fmt=0xc05370f4 <printk+52> " 24320215342 04340235344 20320215342 36377/341 17") at kernel/printk/printk.c:1688
ldrex requires exclusive monitor(s) (local or global) which are no longer
working when the Data cache is disabled in CP15 and will just hang the CPU
there.
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Several interrupt controllers support both edge and level interrupts, so
it's useful to provide that information in /proc/interrupts.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
When trying to kexec into a new kernel on a platform where multiple CPU
cores are present, but no SMP bringup code is available yet, the
kexec_load system call fails with:
kexec_load failed: Invalid argument
The SMP test added to machine_kexec_prepare() in commit 2103f6cba6
("ARM: 7807/1: kexec: validate CPU hotplug support") wants to prohibit
kexec on SMP platforms where it cannot disable secondary CPUs.
However, this test is too strict: if the secondary CPUs couldn't be
enabled in the first place, there's no need to disable them later at
kexec time. Hence skip the test in the absence of SMP bringup code.
This allows to add all CPU cores to the DTS from the beginning, without
having to implement SMP bringup first, improving DT compatibility.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
IOMMU should be able to use single pages as well as bigger blocks, so if
higher order allocations fail, we should not affect state of the system,
with events such as OOM killer, but rather fall back to order 0
allocations.
This patch changes the behavior of ARM IOMMU DMA allocator to use
__GFP_NORETRY, which bypasses OOM invocation, for orders higher than
zero and, only if that fails, fall back to normal order 0 allocation
which might invoke OOM killer.
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Doug Anderson <dianders@chromium.org>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Move shutdown and reboot related code to a separate file, out of
process.c. This helps to avoid polluting process.c with non-process
related code.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Normally, when a CPU wants to clear a cache line to zero in the external
L2 cache, it would generate bus cycles to write each word as it would do
with any other data access.
However, a Cortex A9 connected to a L2C-310 has a specific feature where
the CPU can detect this operation, and signal that it wants to zero an
entire cache line. This feature, known as Full Line of Zeros (FLZ),
involves a non-standard AXI signalling mechanism which only the L2C-310
can properly interpret.
There are separate enable bits in both the L2C-310 and the Cortex A9 -
the L2C-310 needs to be enabled and have the FLZ enable bit set in the
auxiliary control register before the Cortex A9 has this feature
enabled.
Unfortunately, the suspend code was not respecting this - it's not
obvious from the code:
swsusp_arch_suspend()
cpu_suspend() /* saves the Cortex A9 auxiliary control register */
arch_save_image()
soft_restart() /* turns off FLZ in Cortex A9, and disables L2C */
cpu_resume() /* restores the Cortex A9 registers, inc auxcr */
At this point, we end up with the L2C disabled, but the Cortex A9 with
FLZ enabled - which means any memset() or zeroing of a full cache line
will fail to take effect.
A similar issue exists in the resume path, but it's slightly more
complex:
swsusp_arch_suspend()
cpu_suspend() /* saves the Cortex A9 auxiliary control register */
arch_save_image() /* image with A9 auxcr saved */
...
swsusp_arch_resume()
call_with_stack()
arch_restore_image() /* restores image with A9 auxcr saved above */
soft_restart() /* turns off FLZ in Cortex A9, and disables L2C */
cpu_resume() /* restores the Cortex A9 registers, inc auxcr */
Again, here we end up with the L2C disabled, but Cortex A9 FLZ enabled.
There's no need to turn off the L2C in either of these two paths; there
are benefits from not doing so - for example, the page copies will be
faster with the L2C enabled.
Hence, fix this by providing a variant of soft_restart() which can be
used without turning the L2 cache controller off, and use it in both
of these paths to keep the L2C enabled across the respective resume
transitions.
Fixes: 8ef418c717 ("ARM: l2c: trial at enabling some Cortex-A9 optimisations")
Reported-by: Sean Cross <xobs@kosagi.com>
Tested-by: Sean Cross <xobs@kosagi.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
* A new efi=debug command line option that enables debug output in the
EFI boot stub and results in less verbose EFI memory map output by
default - Borislav Petkov
* Disable interrupts around EFI calls and use a more standard page
table saving and restoring idiom when making EFI calls - Ingo Molnar
* Reduce the number of memory allocations performed when allocating the
FDT in EFI boot stub by retrieving size from the FDT header in the
EFI config table - Ard Biesheuvel
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVG9qYAAoJEC84WcCNIz1V8+sP/iKFXQIIXRdlLVSrHHqUPn4K
f32qYdfFfJvG5RMF3Y9B4+lUYi5Svr9SHgg9ZxkVW+GcuI5GUdjU9LjaVtDL9kZ0
YepHp7hdrV+mqX/zDC+NaKqOjbF4jR+5JK8cYnzMDt22jCLBV96aREbH75rN43v1
55VJUplDd6JM4h4XuF/LxyKXJf+LOIFLS4p8c0XPVd3ict7ACAi+JgxPl25fRbe4
bGx9D+LvTvQ0am5C1s8dDcpEd53jbIdKiMM+vhVGmjcvtfA2L01i1aA9pw1zVhyn
FKZXSKOwWjxDzWa/oTLAUzawcLPS3i/0FsDH5TVBLM57OI7bSP1kqzdgFOfR/X5L
KQmuY1TeiYZCeS/JtNHqV1/vap8jucGJEYXcQe/neaD9VvJYGYFEBXFvi9c/68Lk
yLJu4NAYmAp5GnkM+AxXO0aKOVvfNJ6YeGvH+Js7jBPlSdCwa93DzUJgGQxIQD3n
mGfjNgu8dyK3fHIrXFEH7mzokfNHE3cE/FI+1hGS7TGLGxvfXsatZEX813Wjc9+Q
9cL2jAnWf1kZLkbDSSJ/6XJ2a121MgmaXqLrzmLznpIkUgEuhWmbL7/gyZy4Q4/+
ZKA/PNykRWolnz/DZNbF5XnUnRdGTB/kJ4pVVuZc9U/3QjmIb0BiqPa7ZiOSVLmS
P24V+d3LjK2BU8JX3CnE
=eapi
-----END PGP SIGNATURE-----
Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into core/efi
Pull EFI updates from Matt Fleming:
- Fixes and cleanups for SMBIOS 3.0 DMI code. (Ivan Khoronzhuk)
- A new efi=debug command line option that enables debug output in the
EFI boot stub and results in less verbose EFI memory map output by
default. (Borislav Petkov)
- Disable interrupts around EFI calls and use a more standard page
table saving and restoring idiom when making EFI calls. (Ingo Molnar)
- Reduce the number of memory allocations performed when allocating the
FDT in EFI boot stub by retrieving size from the FDT header in the
EFI config table. (Ard Biesheuvel)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Use the new tick_suspend/resume_local() and get rid of the
homebrewn implementation of these in the ARM bL switcher. The
check for the cpumask is completely pointless. There is no harm
to suspend a per cpu tick device unconditionally. If that's a
real issue then we fix it proper at the core level and not with
some completely undocumented hacks in some random core code.
Move the tick internals to the core code, now that this nuisance
is gone.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ rjw: Rebase, changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Link: http://lkml.kernel.org/r/1655112.Ws17YsMfN7@vostro.rjw.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Xen calls on every cpu into tick_resume() which is just wrong.
tick_resume() is for the syscore global suspend/resume
invocation. What XEN really wants is a per cpu local resume
function.
Provide a tick_resume_local() function and use it in XEN.
Also provide a complementary tick_suspend_local() and modify
tick_unfreeze() and tick_freeze(), respectively, to use the
new local tick resume/suspend functions.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ Combined two patches, rebased, modified subject/changelog. ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1698741.eezk9tnXtG@vostro.rjw.lan
[ Merged to latest timers/core. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
clockevents_notify() is a leftover from the early design of the
clockevents facility. It's really not a notification mechanism,
it's a multiplex call.
We are way better off to have explicit calls instead of this
monstrosity. Split out the suspend/resume() calls and invoke
them directly from the call sites.
No locking required at this point because these calls happen
with interrupts disabled and a single cpu online.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ Rebased on top of 4.0-rc5. ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/713674030.jVm1qaHuPf@vostro.rjw.lan
[ Rebased on top of latest timers/core. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The ASRock Q1900DC-ITX mainboard (Baytrail-D) hangs randomly in
both BIOS and UEFI mode while rebooting unless reboot=pci is
used. Add a quirk to reboot via the pci method.
The problem is very intermittent and hard to debug, it might succeed
rebooting just fine 40 times in a row - but fails half a dozen times
the next day. It seems to be slightly less common in BIOS CSM mode
than native UEFI (with the CSM disabled), but it does happen in either
mode. Since I've started testing this patch in late january, rebooting
has been 100% reliable.
Most of the time it already hangs during POST, but occasionally it
might even make it through the bootloader and the kernel might even
start booting, but then hangs before the mode switch. The same symptoms
occur with grub-efi, gummiboot and grub-pc, just as well as (at least)
kernel 3.16-3.19 and 4.0-rc6 (I haven't tried older kernels than 3.16).
Upgrading to the most current mainboard firmware of the ASRock
Q1900DC-ITX, version 1.20, does not improve the situation.
( Searching the web seems to suggest that other Bay Trail-D mainboards
might be affected as well. )
--
Signed-off-by: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Cc: <stable@vger.kernel.org>
Cc: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/20150330224427.0fb58e42@mir
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently x86-64 efi_call_phys_prolog() saves into a global variable (save_pgd),
and efi_call_phys_epilog() restores the kernel pagetables from that global
variable.
Change this to a cleaner save/restore pattern where the saving function returns
the saved object and the restore function restores that.
Apply the same concept to the 32-bit code as well.
Plus this approach, as an added bonus, allows us to express the
!efi_enabled(EFI_OLD_MEMMAP) situation in a clean fashion as well,
via a 'NULL' return value.
Cc: Tapasweni Pathak <tapaswenipathak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Tapasweni Pathak reported that we do a kmalloc() in efi_call_phys_prolog()
on x86-64 while having interrupts disabled, which is a big no-no, as
kmalloc() can sleep.
Solve this by removing the irq disabling from the prolog/epilog calls
around EFI calls: it's unnecessary, as in this stage we are single
threaded in the boot thread, and we don't ever execute this from
interrupt contexts.
Reported-by: Tapasweni Pathak <tapaswenipathak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
... and hide the memory regions dump behind it. Make it default-off.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20141209095843.GA3990@pd.tnic
Acked-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Logically, we just want to jump around the following instruction
and its prologue/epilogue:
call *sys_call_table(,%rax,8)
if the syscall number is too big - we do not specifically target
the "int_ret_from_sys_call" label.
Use a local, numerical label for this jump, for more clarity.
This also makes the code smaller:
-ffffffff8187756b: 0f 87 0f 00 00 00 ja ffffffff81877580 <int_ret_from_sys_call>
+ffffffff8187756b: 77 0f ja ffffffff8187757c <int_ret_from_sys_call>
because jumps to global labels are never translated to short jump
instructions by GAS.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427821211-25099-9-git-send-email-dvlasenk@redhat.com
[ Improved the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There is no reason to use MOVQ to load a non-negative immediate
constant value into a 64-bit register. MOVL does the same, since
the upper 32 bits are zero-extended by the CPU.
This makes the code a bit smaller, while leaving functionality
unchanged.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427821211-25099-8-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
At the 'exit_intr' label we test whether interrupt/exception was in
kernel. If it did, we jump to the preemption check. If preemption
does happen (IOW if we call preempt_schedule_irq()), we go back to
'exit_intr'.
But it's pointless, we already know that the test succeeded last
time, preemption doesn't change the fact that interrupt/exception
was in the kernel.
We can go back directly to checking PER_CPU_VAR(__preempt_count) instead.
This makes the 'exit_intr' label unused, drop it.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427821211-25099-5-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Get rid of #define obfuscation of retint_kernel in
CONFIG_PREEMPT case by defining retint_kernel label always, not
only for CONFIG_PREEMPT.
Strip retint_kernel of .global-ness (ENTRY macro) - it has no
users outside of this file.
This looks like cosmetics, but it is not:
"je LABEL" can be optimized to short jump by assember
only if LABEL is not global, for global labels jump is always
a near one with relocation.
Convert retint_restore_args to a local numeric label, making it
clearer that it is not used elsewhere in the file.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427821211-25099-3-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This mimics the recent similar 64-bit change.
Saves ~110 bytes of code.
Patch was run-tested on 32 and 64 bits, Intel and AMD CPU.
I also looked at the diff of entry_64.o disassembly, to have
a different view of the changes.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427821211-25099-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
SYSRET code path has a small irq-off block.
On this code path, TRACE_IRQS_ON can't be called right before
interrupts are enabled for real, we can't clobber registers
there. So current code does it earlier, in a safe place.
But with this, TRACE_IRQS_OFF/ON frames just two fast
instructions, which is ridiculous: now most of irq-off block is
_outside_ of the framing.
Do the same thing that we do on SYSCALL entry: do not track this
irq-off block, it is very small to ever cause noticeable irq
latency.
Be careful: make sure that "jnz int_ret_from_sys_call_irqs_off"
now does invoke TRACE_IRQS_OFF - move
int_ret_from_sys_call_irqs_off label before TRACE_IRQS_OFF.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427821211-25099-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
__verify_local_APIC() is detritus from the early APIC days.
Its return value isn't used anywhere and the information it
prints when debug is enabled is already part of APIC
initialization messages printed to syslog. Off with it!
Signed-off-by: Bandan Das <bsd@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/jpgy4mcsxsq.fsf@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Space allocated for paca is based off nr_cpu_ids,
but pnv_alloc_idle_core_states() iterates paca with
cpu_nr_cores()*threads_per_core, which is using NR_CPUS.
This causes pnv_alloc_idle_core_states() to write over memory,
which is outside of paca array and may later lead to various panics.
Fixes: 7cba160ad7 (powernv/cpuidle: Redesign idle states management)
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch adds support to migrate vcpu interrupts. Two new vcpu ioctls
are added which get/set the complete status of pending interrupts in one
go. The ioctls are marked as available with the new capability
KVM_CAP_S390_IRQ_STATE.
We can not use a ONEREG, as the number of pending local interrupts is not
constant and depends on the number of CPUs.
To retrieve the interrupt state we add an ioctl KVM_S390_GET_IRQ_STATE.
Its input parameter is a pointer to a struct kvm_s390_irq_state which
has a buffer and length. For all currently pending interrupts, we copy
a struct kvm_s390_irq into the buffer and pass it to userspace.
To store interrupt state into a buffer provided by userspace, we add an
ioctl KVM_S390_SET_IRQ_STATE. It passes a struct kvm_s390_irq_state into
the kernel and injects all interrupts contained in the buffer.
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Let's provide a version of kvm_s390_inject_vcpu() that
does not acquire the local-interrupt lock and skips
waking up the vcpu.
To be used in a later patch for vcpu-local interrupt migration,
where we are already holding the lock.
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
We have introduced struct kvm_s390_irq a while ago which allows to
inject all kinds of interrupts as defined in the Principles of
Operation.
Add ioctl to inject interrupts with the extended struct kvm_s390_irq
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
We now have a mechanism for delivering interrupts according to their priority.
Let's inject them using our new infrastructure (instead of letting only hardware
handle them), so we can be sure that the irq priorities are satisfied.
For s390, the cpu timer and the clock comparator are to be checked for common
code kvm_cpu_has_pending_timer(), although the cpu timer is only stepped when
the guest is being executed.
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
This patch makes interrupt handling compliant to the z/Architecture
Principles of Operation with regard to interrupt priorities.
Add a bitmap for pending floating interrupts. Each bit relates to a
interrupt type and its list. A turned on bit indicates that a list
contains items (interrupts) which need to be delivered. When delivering
interrupts on a cpu we can merge the existing bitmap for cpu-local
interrupts and floating interrupts and have a single mechanism for
delivery.
Currently we have one list for all kinds of floating interrupts and a
corresponding spin lock. This patch adds a separate list per
interrupt type. An exception to this are service signal and machine check
interrupts, as there can be only one pending interrupt at a time.
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
This fixes a bug introduced with commit c05c4186bb ("KVM: s390:
add floating irq controller").
get_all_floating_irqs() does copy_to_user() while holding
a spin lock. Let's fix this by filling a temporary buffer
first and copy it to userspace after giving up the lock.
Cc: <stable@vger.kernel.org> # 3.18+: 69a8d45626 KVM: s390: no need to hold...
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Use the normal return values for bool functions
Signed-off-by: Joe Perches <joe@perches.com>
Message-Id: <9f593eb2f43b456851cd73f7ed09654ca58fb570.1427759009.git.joe@perches.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This driver is 64 bit only, and so this driver and device are ready
for 2038. This patch changes the driver to the new PHC and also
carries the timespec64 parameter on out to the gxio_mpipe_get-
set_timestamp functions, making explicit the fact that the tv_sec
field is 64 bits wide.
Not even compile tested.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Chris Metcalf <cmetcalf@ezchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Flag all Multi buffer SHA1 helper ciphers as internal ciphers
to prevent them from being called by normal users.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Flag all 64 bit ARMv8 AES helper ciphers as internal ciphers to
prevent them from being called by normal users.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Flag all ARMv8 AES helper ciphers as internal ciphers to prevent
them from being called by normal users.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Flag all NEON bit sliced AES helper ciphers as internal ciphers to
prevent them from being called by normal users.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Flag all Twofish AVX helper ciphers as internal ciphers to prevent
them from being called by normal users.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Flag all Serpent SSE2 helper ciphers as internal ciphers to prevent
them from being called by normal users.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Flag all Serpent AVX helper ciphers as internal ciphers to prevent
them from being called by normal users.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Flag all Serpent AVX2 helper ciphers as internal ciphers to prevent
them from being called by normal users.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Flag all CAST6 helper ciphers as internal ciphers to prevent them
from being called by normal users.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Flag all AVX Camellia helper ciphers as internal ciphers to prevent
them from being called by normal users.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Flag all CAST5 helper ciphers as internal ciphers to prevent them
from being called by normal users.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Flag all AES-NI Camellia helper ciphers as internal ciphers to
prevent them from being called by normal users.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Flag all GHASH ARMv8 vmull.p64 helper ciphers as internal ciphers
to prevent them from being called by normal users.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Flag all ash clmulni helper ciphers as internal ciphers to prevent them
from being called by normal users.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Flag all AES-NI helper ciphers as internal ciphers to prevent them from
being called by normal users.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
After some review about what these facilities do, the following
facilities will work under KVM and can, therefore, be reported
to the guest if the cpu model and the host cpu provide this bit.
There are plans underway to make the whole bit thing more readable,
but its not yet finished. So here are some last bit changes and
we enhance the KVM mask with:
9 The sense-running-status facility is installed in the
z/Architecture architectural mode.
---> handled by SIE or KVM
10 The conditional-SSKE facility is installed in the
z/Architecture architectural mode.
---> handled by SIE. KVM will retry SIE
13 The IPTE-range facility is installed in the
z/Architecture architectural mode.
---> handled by SIE. KVM will retry SIE
36 The enhanced-monitor facility is installed in the
z/Architecture architectural mode.
---> handled by SIE
47 The CMPSC-enhancement facility is installed in the
z/Architecture architectural mode.
---> handled by SIE
48 The decimal-floating-point zoned-conversion facility
is installed in the z/Architecture architectural mode.
---> handled by SIE
49 The execution-hint, load-and-trap, miscellaneous-
instruction-extensions and processor-assist
---> handled by SIE
51 The local-TLB-clearing facility is installed in the
z/Architecture architectural mode.
---> handled by SIE
52 The interlocked-access facility 2 is installed.
---> handled by SIE
53 The load/store-on-condition facility 2 and load-and-
zero-rightmost-byte facility are installed in the
z/Architecture architectural mode.
---> handled by SIE
57 The message-security-assist-extension-5 facility is
installed in the z/Architecture architectural mode.
---> handled by SIE
66 The reset-reference-bits-multiple facility is installed
in the z/Architecture architectural mode.
---> handled by SIE. KVM will retry SIE
80 The decimal-floating-point packed-conversion
facility is installed in the z/Architecture architectural
mode.
---> handled by SIE
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
If the PER-3 facility is installed, the breaking-event address is to be
stored in the low core.
There is no facility bit for PER-3 in stfl(e) and Linux always uses the
value at address 272 no matter if PER-3 is available or not.
We can't hide its existence from the guest. All program interrupts
injected via the SIE automatically store this information if the PER-3
facility is available in the hypervisor. Also the itdb contains the
address automatically.
As there is no switch to turn this mechanism off, let's simply make it
consistent and also store the breaking event address in case of manual
program interrupt injection.
Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
user_mode_ignore_vm86() can be used instead of user_mode(), in
places where we have already done a v8086_mode() security
check of ptregs.
But doing this check in the wrong place would be a bug that
could result in security problems, and also the naming still
isn't very clear.
Furthermore, it only affects 32-bit kernels, while most
development happens on 64-bit kernels.
If we replace them with user_mode() checks then the cost is only
a very minor increase in various slowpaths:
text data bss dec hex filename
10573391 703562 1753042 13029995 c6d26b vmlinux.o.before
10573423 703562 1753042 13030027 c6d28b vmlinux.o.after
So lets get rid of this distinction once and for all.
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150329090233.GA1963@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The existing clean-files rule was missing vdsox32.so and
vdsox32.so.dbg. We should really rename the intermediates to
allow a single rule to get them all.
Also-reported-by: Magnus Damm <damm+renesas@opensource.se>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Link: http://lkml.kernel.org/r/7fa2ad4a63bc6f52e214125900d54165ef06cc10.1427482099.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
After 'make clean' the following files were left in arch/x86/vdso/:
vdso-image-32-int80.c
vdso-image-32-syscall.c
vdso-image-32-sysenter.c
These file are generated during the build process and are present
in .gitignore, so remove them.
Signed-off-by: Andrey Skvortsov <andrej.skvortzov@gmail.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Link: http://lkml.kernel.org/r/f85bb7ef6f8c6f6aa4bf422348018c84321454f8.1427482099.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This vDSO code only gets used by 64-bit kernels, not 32-bit ones.
On 64-bit kernels, the data segment is the same for 32-bit and
64-bit userspace, and the SYSRET instruction loads %ss with its
selector.
So there's no need to repeat it by hand. Segment loads are somewhat
expensive: tens of cycles.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
[ Removed unnecessary comment. ]
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/63da6d778f69fd0f1345d9287f6764d58be519fa.1427482099.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The ASLR implementation needs to special-case AMD F15h processors by
clearing out bits [14:12] of the virtual address in order to avoid I$
cross invalidations and thus performance penalty for certain workloads.
For details, see:
dfb09f9b7a ("x86, amd: Avoid cache aliasing penalties on AMD family 15h")
This special case reduces the mmapped file's entropy by 3 bits.
The following output is the run on an AMD Opteron 62xx class CPU
processor under x86_64 Linux 4.0.0:
$ for i in `seq 1 10`; do cat /proc/self/maps | grep "r-xp.*libc" ; done
b7588000-b7736000 r-xp 00000000 00:01 4924 /lib/i386-linux-gnu/libc.so.6
b7570000-b771e000 r-xp 00000000 00:01 4924 /lib/i386-linux-gnu/libc.so.6
b75d0000-b777e000 r-xp 00000000 00:01 4924 /lib/i386-linux-gnu/libc.so.6
b75b0000-b775e000 r-xp 00000000 00:01 4924 /lib/i386-linux-gnu/libc.so.6
b7578000-b7726000 r-xp 00000000 00:01 4924 /lib/i386-linux-gnu/libc.so.6
...
Bits [12:14] are always 0, i.e. the address always ends in 0x8000 or
0x0000.
32-bit systems, as in the example above, are especially sensitive
to this issue because 32-bit randomness for VA space is 8 bits (see
mmap_rnd()). With the Bulldozer special case, this diminishes to only 32
different slots of mmap virtual addresses.
This patch randomizes per boot the three affected bits rather than
setting them to zero. Since all the shared pages have the same value
at bits [12..14], there is no cache aliasing problems. This value gets
generated during system boot and it is thus not known to a potential
remote attacker. Therefore, the impact from the Bulldozer workaround
gets diminished and ASLR randomness increased.
More details at:
http://hmarco.org/bugs/AMD-Bulldozer-linux-ASLR-weakness-reducing-mmaped-files-by-eight.html
Original white paper by AMD dealing with the issue:
http://developer.amd.com/wordpress/media/2012/10/SharedL1InstructionCacheonAMD15hCPU.pdf
Mentored-by: Ismael Ripoll <iripoll@disca.upv.es>
Signed-off-by: Hector Marco-Gisbert <hecmargi@upv.es>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan-Simon <dl9pf@gmx.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-fsdevel@vger.kernel.org
Link: http://lkml.kernel.org/r/1427456301-3764-1-git-send-email-hecmargi@upv.es
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This file doesn't use any macros from pci_ids.h anymore, drop the include.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andreas Herrmann <herrmann.der.user@googlemail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1427635734-24786-80-git-send-email-mst@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
At exit_intr, we GET_THREAD_INFO(%rcx) and then jump to
retint_kernel if saved CS was from kernel. But the code at
retint_kernel doesn't need %rcx.
Move GET_THREAD_INFO(%rcx) down, after CS check and branch.
While at it, remove "has a correct top of stack" comment.
After recent changes which eliminated FIXUP_TOP_OF_STACK,
we always have a correct pt_regs layout.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427738975-7391-5-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The "retint_kernel" code block is misplaced. Since its logical
continuation is "retint_restore_args", it is more natural to
place it above that label. This also makes two jumps "short".
This change only moves code block around, without changing
logic.
This enables the next simplification: making
"retint_restore_args" label a local numeric one.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427738975-7391-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As the infrastructure for eventfd has now been merged, report the
ioeventfd capability as being supported.
Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
[maz: grouped the case entry with the others, fixed commit log]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Currently we have struct kvm_exit_mmio for encapsulating MMIO abort
data to be passed on from syndrome decoding all the way down to the
VGIC register handlers. Now as we switch the MMIO handling to be
routed through the KVM MMIO bus, it does not make sense anymore to
use that structure already from the beginning. So we keep the data in
local variables until we put them into the kvm_io_bus framework.
Then we fill kvm_exit_mmio in the VGIC only, making it a VGIC private
structure. On that way we replace the data buffer in that structure
with a pointer pointing to a single location in a local variable, so
we get rid of some copying on the way.
With all of the virtual GIC emulation code now being registered with
the kvm_io_bus, we can remove all of the old MMIO handling code and
its dispatching functionality.
I didn't bother to rename kvm_exit_mmio (to vgic_mmio or something),
because that touches a lot of code lines without any good reason.
This is based on an original patch by Nikolay.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Cc: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
A trivial code cleanup. This `if` is redundant.
Signed-off-by: Eugene Korenevsky <ekorenevsky@gmail.com>
Message-Id: <20150328222717.GA6508@gnote>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Some constants are redfined in emulate.c. Avoid it.
s/SELECTOR_RPL_MASK/SEGMENT_RPL_MASK
s/SELECTOR_TI_MASK/SEGMENT_TI_MASK
No functional change.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Message-Id: <1427635984-8113-3-git-send-email-namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The eflags are redefined (using other defines) in emulate.c.
Use the definition from processor-flags.h as some mess already started.
No functional change.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Message-Id: <1427635984-8113-2-git-send-email-namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If the source of BSF and BSR is zero, the destination register should not
change. That is how real hardware behaves. If we set the destination even with
the same value that we had before, we may clear bits [63:32] unnecassarily.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Message-Id: <1427719163-5429-4-git-send-email-namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
POPA should assign the values to the registers as usual registers are assigned.
In other words, 32-bits register assignments should clear bits [63:32] of the
register.
Split the code of register assignments that will be used by future changes as
well.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Message-Id: <1427719163-5429-3-git-send-email-namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
On legacy mode CMOV emulation should still clear bits [63:32] even if the
assignment is not done. The previous fix 140bad89fd ("KVM: x86: emulation of
dword cmov on long-mode should clear [63:32]") was incomplete.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Message-Id: <1427719163-5429-2-git-send-email-namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If data is read from PIC with invalid access size, the return data stays
uninitialized even though success is returned.
Fix this by always initializing the data.
Signed-off-by: Petr Matousek <pmatouse@redhat.com>
Reported-by: Nadav Amit <nadav.amit@gmail.com>
Message-Id: <20150311111609.GG8544@dhcp-25-225.brq.redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Given that sizeof(long) is now always 8, we can simplify syscall_get_arch()
a bit.
Just another piece I didn't find when removing 31 bit support.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This code calls cpu_resume() using a straight branch (b), so
now that we have moved cpu_resume() back to .text, this should
be moved there as well.
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This code calls cpu_resume() using a straight branch (b), so
now that we have moved cpu_resume() back to .text, this should
be moved there as well. Any direct references to symbols that will
remain in the .data section are replaced with explicit PC-relative
references.
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Move cpu_resume() to the .text section where it belongs. Change
the adr reference to sleep_save_sp to an explicit PC relative
reference so sleep_save_sp itself can remain in .data.
This helps prevent linker failure on large kernels, as the code
in the .data section may be too far away to be in range for normal
b/bl instructions.
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
When building a very large kernel, it is up to the linker to decide
when and where to insert stubs to allow calls to functions that are
out of range for the ordinary b/bl instructions.
However, since the kernel is built as a position dependent binary,
these stubs (aka veneers) may contain absolute addresses, which will
break far calls performed with the MMU off.
For instance, the call from __enable_mmu() in the .head.text section
to __turn_mmu_on() in the .idmap.text section may be turned into
something like this:
c0008168 <__enable_mmu>:
c0008168: f020 0002 bic.w r0, r0, #2
c000816c: f420 5080 bic.w r0, r0, #4096
c0008170: f000 b846 b.w c0008200 <____turn_mmu_on_veneer>
[...]
c0008200 <____turn_mmu_on_veneer>:
c0008200: 4778 bx pc
c0008202: 46c0 nop
c0008204: e59fc000 ldr ip, [pc]
c0008208: e12fff1c bx ip
c000820c: c13dfae1 teqgt sp, r1, ror #21
[...]
c13dfae0 <__turn_mmu_on>:
c13dfae0: 4600 mov r0, r0
[...]
After adding --pic-veneer to the LDFLAGS, the veneer is emitted like
this instead:
c0008200 <____turn_mmu_on_veneer>:
c0008200: 4778 bx pc
c0008202: 46c0 nop
c0008204: e59fc004 ldr ip, [pc, #4]
c0008208: e08fc00c add ip, pc, ip
c000820c: e12fff1c bx ip
c0008210: 013d7d31 teqeq sp, r1, lsr sp
c0008214: 00000000 andeq r0, r0, r0
Note that this particular example is best addressed by moving
.head.text and .idmap.text closer together, but this issue could
potentially affect any code that needs to execute with the
MMU off.
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This moves all fixup snippets to the .text.fixup section, which is
a special section that gets emitted along with the .text section
for each input object file, i.e., the snippets are kept much closer
to the code they refer to, which helps prevent linker failure on
large kernels.
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
arm64 builds with GCC 5 have caused the __asmeq assertions in the PSCI
calling code to fire, so move the ARM PSCI calls out of line into their
own assembly file for consistency and to safeguard against the same
issue occuring with the 32-bit toolchain.
[will: brought into line with arm64 implementation]
Reported-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The latest and greatest fixes for ARM platform code. Worth pointing out are:
- Lines-wise, largest is a PXA fix for dealing with interrupts on DT that was
quite broken. It's still newish code so while we could have held this off,
it seemed appropriate to include now
- Some GPIO fixes for OMAP platforms added a few lines. This was also fixes for
code recently added (this release).
- Small OMAP timer fix to behave better with partially upstreamed platforms,
which is quite welcome.
- Allwinner fixes about operating point control, reducing overclocking in some
cases for better stability.
+ a handful of other smaller fixes across the map.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVGGpHAAoJEIwa5zzehBx3vi8QAKRlHuJzKlEAGOglrmaYLNwr
/eSx6wQQlMa+DFr6YVgypYvImCpvbOVPRGMFpaAyriKDrWcu1HiEF4nvq+3Iq1Qt
yqPRbTh2xU1qZ9kmidcY5rhSX7IO7Bgeo0fV4q1Dyt5rgUKfIRhE6BnYO5C9TbY5
CoFri3sFSyeAwIUYcbvt7OElVTB2ro6xmqfc6ZspPqPLYnbc9wSSWSKLV8EFM+OS
6DD3UFiTPN1o0nb/Vo/fuOsryCE3oR2wfomWAr7XMeTbB2DOugocre0ttyxhhbBt
Dkc0u3/a75AtisIPkoHPGQZX3671SnUKmge9VYGeMWBHpGCvXJMMvPonbkr2yema
gxH5/KPBwmv4c/asL/nfKiQnHowxLGdCDVCQ+NKfhyvEKZ7WzwVzXekrWn3L8u1W
xUy3O0XU3YQJVXX/S2ec5RwsGe9K/5OanKTkjeJ8q5zLtJA0ax0fJGyQ+/iEcjFy
CnlW857EL1ChXQ2WItKvaLyxe/mea4HGonHe5nZSRVZtTVTYTSFASrJchmRR/eu2
NXnZPBtvAEn8MqTuNyVVjqwifojK8jdVox+mlNoQUoehLhyAYc4IHbC6+/2c5HU7
dW5tFMGPOlQNmYbXqe+oKSUaaSWcxUaPTzmD6IRUrozrhabyv/lb79GXDLb2c6QL
Vz7WtY1qTD7TOwPv+Xyd
=OclH
-----END PGP SIGNATURE-----
Merge tag 'armsoc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"The latest and greatest fixes for ARM platform code. Worth pointing
out are:
- Lines-wise, largest is a PXA fix for dealing with interrupts on DT
that was quite broken. It's still newish code so while we could
have held this off, it seemed appropriate to include now
- Some GPIO fixes for OMAP platforms added a few lines. This was
also fixes for code recently added (this release).
- Small OMAP timer fix to behave better with partially upstreamed
platforms, which is quite welcome.
- Allwinner fixes about operating point control, reducing
overclocking in some cases for better stability.
plus a handful of other smaller fixes across the map"
* tag 'armsoc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
arm64: juno: Fix misleading name of UART reference clock
ARM: dts: sunxi: Remove overclocked/overvoltaged OPP
ARM: dts: sun4i: a10-lime: Override and remove 1008MHz OPP setting
ARM: socfpga: dts: fix spi1 interrupt
ARM: dts: Fix gpio interrupts for dm816x
ARM: dts: dra7: remove ti,hwmod property from pcie phy
ARM: OMAP: dmtimer: disable pm runtime on remove
ARM: OMAP: dmtimer: check for pm_runtime_get_sync() failure
ARM: OMAP2+: Fix socbus family info for AM33xx devices
ARM: dts: omap3: Add missing dmas for crypto
ARM: dts: rockchip: disable gmac by default in rk3288.dtsi
MAINTAINERS: add rockchip regexp to the ARM/Rockchip entry
ARM: pxa: fix pxa interrupts handling in DT
ARM: pxa: Fix typo in zeus.c
ARM: sunxi: Have ARCH_SUNXI select RESET_CONTROLLER for clock driver usage
There's a few fixes to merge for 4.0, one to add a select in the machine
Kconfig option to fix a potential build failure, and two fixing cpufreq related
issues.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVEyVYAAoJEBx+YmzsjxAg54AQAIIZvr69IYXoGL4njoVRhKPq
Tm8RPb4B74mKHuAfRSblP7BV2cr3hCNBrLmyDnPUAw5KmYdNwfZVAEHxitBqin33
JQphd5NGD3gtI0YYJQ4DrPlydSDMCV+FvH/4ASgEXyCrimAvcvsASgGoTSZ4PsN+
/I7FmIVdGsYJ3V6uJv9mPlgAW6PiGkGFkJ/9ZpTbN2sRCLiE/SAIoSR4C9QvMbhI
khlmMlP8fF1TpOK9m7sXlTHvaf1kH5GK3XN6GxgvVPR66R2kEYKTBLSpxbB3zQmH
PPAx8UAQbIR6T+dBar5a7rDaSPVOQ2d+f+ytXFKeY5Sonb0iShmjD3EMRFj8PYUv
/JUdr/8o9HDBWBHNw7JeVx2texk/Gk5AGSGMuPjGsyJYkEoNK0vZS2mwB5doQggo
nAADHBPhJptPyUqA4ubnjdgc82To4eh7z5dYgNYCH8XlvFuIRHsHPlp9QIdaTzTJ
eHBMQx6pgr66QNjF940s0z2rH/UtGmWT60WpWToQA7EJau0E2VosQAnjLctiQogO
ZUhgMa5V3zCiK3831bwDG3we3ictudkZgvIFWeRkbI/StRbt/n8A2NRK7QWhqqh7
SXgNP5kisAq29X9grn5ekku6O4gJ1xobbjwDsdYw3z36uBjFkYUe85j3AJjhW+sG
v2chTh9PHTZ6hu/uyK1L
=e6O1
-----END PGP SIGNATURE-----
Merge tag 'sunxi-fixes-for-4.0' of https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux into fixes
Allwinner fixes for 4.0
There's a few fixes to merge for 4.0, one to add a select in the machine
Kconfig option to fix a potential build failure, and two fixing cpufreq related
issues.
* tag 'sunxi-fixes-for-4.0' of https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux:
ARM: dts: sunxi: Remove overclocked/overvoltaged OPP
ARM: dts: sun4i: a10-lime: Override and remove 1008MHz OPP setting
ARM: sunxi: Have ARCH_SUNXI select RESET_CONTROLLER for clock driver usage
Signed-off-by: Olof Johansson <olof@lixom.net>
- Fix a device tree based booting vs legacy booting regression for
omap3 crypto hardware by adding the missing DMA channels.
- Fix /sys/bus/soc/devices/soc0/family for am33xx devices.
- Fix two timer issues that can cause hangs if the timer related
hwmod data is missing like it often initially is for new SoCs.
- Remove pcie hwmods entry from dts as that causes runtime PM to
fail for the PHYs.
- A paper bag type dts configuration fix for dm816x GPIO
interrupts that I just noticed. This is most of the changes
diffstat wise, but as it's a basic feature for connecting
devices and things work otherwise, it should be fixed.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVCzVOAAoJEBvUPslcq6VzJKMP/ioZMQuxKWjMZDQhc4PsSZzF
3qtijSrN4aPnYphih23/7XuE0dws0pWdpH1toYPPlzB78gN18MuFOpHRHSKxy23e
dTRvgrvfDfVFyHYpvOTstaqNai90HbwpQnWx2IdEI9xhP0N3RNQQDTdE8mrISoVE
2GyT8hF++MxbBk3/dhx1tIoqSkWvSi2TH8rAQ6I/UR2O07HRz8tdcciHq0goYvA9
n16Ou7a3WpRiaGQZvN4wAvwRCklKL3vM/7ZpbRlJAW4M1Uzc1OGKSGHrqr9kbG77
U/QgJ7lQoHOn6zDRYZqCY5lHR5U8wW8WVCH5WNV6GZaVHzbOSDoV/Ygt/CB9eBzh
i8RAcKyvVOWqYACjQ3ogfaC5AD7Of7DONFDQVBr/4xaUOLVquYx8V4r89dfDTeyY
i95ay06NaS6esX3Y0lsgBSlvPObRxoVH4nFUUjUH+gLqhqHpqojWF8CMUl/zs8Cm
xHIb/4VYR/dnlhqZvhn9NGo9MlAo4D6ftNE5BooON8cKPuEnl25pb5KWmNBu1piM
knNIV7QA/aawQ3jnH9tXyCWcprcxt/2INwwp41DE/mxpn6xiw+5UwzINu6q5NAzs
DxEWYv08mM6Qrt0YsIe7cYDPO03my547LWhI3tUWf/KQvqc98hP5ZODy/QDQJmbA
IpP36kjKB5dJldTv2sWb
=eCH/
-----END PGP SIGNATURE-----
Merge tag 'fixes-v4.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes
Fixes for omaps for the -rc cycle:
- Fix a device tree based booting vs legacy booting regression for
omap3 crypto hardware by adding the missing DMA channels.
- Fix /sys/bus/soc/devices/soc0/family for am33xx devices.
- Fix two timer issues that can cause hangs if the timer related
hwmod data is missing like it often initially is for new SoCs.
- Remove pcie hwmods entry from dts as that causes runtime PM to
fail for the PHYs.
- A paper bag type dts configuration fix for dm816x GPIO
interrupts that I just noticed. This is most of the changes
diffstat wise, but as it's a basic feature for connecting
devices and things work otherwise, it should be fixed.
* tag 'fixes-v4.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: dts: Fix gpio interrupts for dm816x
ARM: dts: dra7: remove ti,hwmod property from pcie phy
ARM: OMAP: dmtimer: disable pm runtime on remove
ARM: OMAP: dmtimer: check for pm_runtime_get_sync() failure
ARM: OMAP2+: Fix socbus family info for AM33xx devices
ARM: dts: omap3: Add missing dmas for crypto
Signed-off-by: Olof Johansson <olof@lixom.net>
- Fix interrupt number for SPI1 interface
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJVCxZzAAoJEBmUBAuBoyj0V3YQAJN24cPBrxeyAG1atuHN2T9r
8QtJ2eVOU32IwzNoOl1PHITGaf1b3Ks03NC+GSk3WMAzMJ/fJMY2p6ABpCCrriBZ
xRZfkqqlJlB6/BNb62BkUK45yy0chgxoPoaDiX9CR0KqBeVciZM0Ag4Tw82oQDDF
mEBKJxSO7GIzRj5uHzQHaTh6Fi6QMRYLYDMyoRNom1tpdOLm7ufyWI++q6FLk2/W
YAxI8tj+AN48cgTB9oO+K83RjxVuR7YM4xQPUJ3K21AAAc1AHhXgOyzMd2FZhIhJ
qC5xODFXVcG5+m6/7vYxkBnLo9sexHF8EdD7ByRE2KMoRrsWU2s304ErUBfzp635
9LbSHZB2HD3rM7KEz/QiRGYkijACgVVBbIImHAaAbd4H13hz4FevKARPNvFz9u62
Bu3jYj94wuGJYHuiIa9p4QZcMaQVSThBsnWUfq18l1UljIoeRzK4jt0P/DCowRJO
J3gtmH1E/N/NKGbw1RqNwqSydUqdbeyDdOua/2oLrEaOthJATCj3eFCTpIK5lrQD
Ttc3EsdL7r2OVD7zOXZsbH9mZXxgDnIu0E4Ocm3F3Xs22ERRMVPiDb0xvwXbtq5+
uN0HdFN2P7wb02dv2r3NWcTpQEBVqvPe4VRIv+U4hgXXJJrYeEZqWYQ8Vd2d4bsb
ZkvK7CHsv9FiU3aCnT7B
=NEia
-----END PGP SIGNATURE-----
Merge tag 'socfpga_fix_for_v4.0_2' of git://git.rocketboards.org/linux-socfpga-next into fixes
Late fix for v4.0 on the SoCFPGA platform:
- Fix interrupt number for SPI1 interface
* tag 'socfpga_fix_for_v4.0_2' of git://git.rocketboards.org/linux-socfpga-next:
ARM: socfpga: dts: fix spi1 interrupt
Signed-off-by: Olof Johansson <olof@lixom.net>
The UART reference clock speed is 7273.8 kHz, not 72738 kHz.
Dots aren't usually used in node names even though ePAPR permits
them. However, this can easily be avoided by expressing the
frequency in Hz, not kHz.
This patch changes the name to refclk7273800hz, reflecting the
actual clock speed.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Liviu Dudau <Liviu.Dudau@arm.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
There are only 2 fixes, one for the zeus board about the regulator changes,
where a typo prevented the zeus board from having a working can regulator,
and one regression triggered by the interrupts IRQ shift of 16 affecting all
boards.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVBzpKAAoJEAP2et0duMsSVngP+wTxRklD/kyjRrB4edOGvnKj
yCJdSbw6dK4IQd4K5BbnJKRSxUTZs9TrPA11IB9IznWcoRjHwKkSr6uk5h/bg+vl
FwbH60iu7DJ4wyKjIzLCstZpONfFYB/bVOZti3lg+9u0A//YMg0/2PcfcfsUzfgi
/SAaYuJe+A0XZNZYGwtBEDyYo9hkDzJPbYMRneQnL8XwjLJ9BvksoqBZ/4aIQXlb
gTlUhlCo9wMngfS3Fbhpd+stw/zO1mt0xUXRXwWJYnUSS2x9mz0K/9zz8RWmodtx
R4NNz3ZhQzMLHxjj3NpUDcId7faiZNVY0yvSVJU7O5/TLgXluNdH+vRTNiOrpVGu
+s/+nxg0HKuK16Tz/0mlQH0uqWvtjJvtNTLS3yDgJKM3civ2Vl+i4DKYK0Ht8ynN
ZSBDmARc6Tq8M9JHgzZFAI8ypKACYuIHJzWNF919BuodJ1N2gcexwKo3m6kPoobG
WFDTzT+TDiFB0Z4VLOWtWKU9V/RrZqZnrF5rqqkpJ1NyOPdotj+9n2Q63JEdzhH7
eQKSK9P4FcLKtEmlMo/iRI9livkN81nIRRUTsxxrSTjHA8MJAQOBAU7JDs/ic3YX
j1vyC/Brd0fPbCqnePfS2BO9G14fwHNb2AxoFgM9JI9zb8sIVxscM4s8uczFCIOs
rd6bZsQ6mhKjrObS1iSe
=r3W0
-----END PGP SIGNATURE-----
Merge tag 'fixes-for-v4.0-rc5' of https://github.com/rjarzmik/linux into fixes
arm: pxa: fixes for v4.0-rc5
There are only 2 fixes, one for the zeus board about the regulator changes,
where a typo prevented the zeus board from having a working can regulator,
and one regression triggered by the interrupts IRQ shift of 16 affecting all
boards.
* tag 'fixes-for-v4.0-rc5' of https://github.com/rjarzmik/linux:
ARM: pxa: fix pxa interrupts handling in DT
ARM: pxa: Fix typo in zeus.c
Signed-off-by: Olof Johansson <olof@lixom.net>
Pull x86 fix from Ingo Molnar:
"Fix x86 syscall exit code bug that resulted in spurious non-execution
of TIF-driven user-return worklets, causing big trouble for things
like KVM that rely on user notifiers for correctness of their vcpu
model, causing crashes like double faults"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asm/entry: Check for syscall exit work with IRQs disabled
Pull parsic fixes from Helge Deller:
"One patch from Mikulas fixes a bug on parisc by artifically
incrementing the counter in pmd_free when the kernel tries to free
the preallocated pmd.
Other than that we now prevent that syscalls gets added without
incrementing __NR_Linux_syscalls and fix the initial pmd setup code
if a default page size greater than 4k has been selected"
* 'parisc-4.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix pmd code to depend on PT_NLEVELS value, not on CONFIG_64BIT
parisc: mm: don't count preallocated pmds
parisc: Add compile-time check when adding new syscalls
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVFTw+AAoJEGnX8d3iisJeojUP/12ANSkctP29kOjZFkZrav6v
M9qaSR0NqtuxqQ7o8SY4FeA61/vKvy0Sfg5aqfLq1b4YU7FJginyOxZv+d3NPNto
HCS4XVJwFiEmqYx6EgIrCnYmxNQsRTseJEI24/ImHmzvnjufx/JQ9wr0wQVcw/S3
ewxIyf0uyyVJ2UAOMjn2F3ukJYIpteItyT/59ndWluB/+JTHhhJUsIYowP2irU29
EsvEiExNv5d+n35gtxY7GdKEsX5bQc8bTT2UiladcWYIWdb9OoV4+L8MgFh7JHii
vV71D4kdH5bhJsD1D98nm9jFpP6GfPWCY2BMTIhWIYmg+2Zl+cfiaYgIIuBYNG+x
+ahrqB5Nbm+Ux4R7KnIYlXMgDiWMPetkAptUmbod293M/pXPyyQtFhiaB2X9l6Bl
48IUvWYjSp5p8mfhwBXcMItTazrv46HqarHfl3NG0sljRc9BwdPqlumhB1xxsQXg
iauvvD6s6GHRIGapYlINRww9mCaNu5bTcY9cD6AHEc++6Ov/ZYpWnnipmUINNCmL
4yLd+pUQ6QoPuvCHbpzdEvb/eb/LDSbo17nB3aPNn/pPGzICgAXnhY4ET6nzZ3LM
Dc/8FfrYmalYHmjeoWC8CPR/XmrmI7xCkNlF/kre+vCWRLs8Zewl/btxyU48+hho
81HBp3a0QCzCOcFKvzam
=zVXd
-----END PGP SIGNATURE-----
Merge tag 'arc-4.0-fixes-part-2' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
"We found some issues with signal handling taking down the system. I
know its late, but these are important and all marked for stable.
ARC signal handling related fixes uncovered during recent testing of
NPTL tools"
* tag 'arc-4.0-fixes-part-2' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: signal handling robustify
ARC: SA_SIGINFO ucontext regs off-by-one
When the patch for e16343c47e (ARM: 8160/1: drop warning about
return_address not using unwind tables) was created there was still more
code in said branch. Probably this simplification was just missed during
conflict resolution when the patch was applied.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch makes it possible to enter zImage in Thumb mode for ARMv7-M
(Cortex-M) CPUs that do not support ARM mode. The kernel entry is also
made in Thumb mode.
[ukl: fix spelling in commit log, return early in call_cache_fn]
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Tested-by: Stefan Agner <stefan@agner.ch>
Tested-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
Tested-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Usually ELF_ET_DYN_BASE is 2/3 of TASK_SIZE. With 3G/1G user/kernel
split this is not so, because 2*TASK_SIZE overflows 32 bits,
so the actual value of ELF_ET_DYN_BASE is:
(2 * TASK_SIZE / 3) = 0x2a000000
When ASLR is disabled PIE binaries will load at ELF_ET_DYN_BASE address.
On 32bit platforms AddressSanitzer uses addresses [0x20000000 - 0x40000000]
for shadow memory [1]. So ASan doesn't work for PIE binaries when ASLR disabled
as it fails to map shadow memory.
Also after Kees's 'split ET_DYN ASLR from mmap ASLR' patchset PIE binaries
has a high chance of loading somewhere in between [0x2a000000 - 0x40000000]
even if ASLR enabled. This makes ASan with PIE absolutely incompatible.
Fix overflow by dividing TASK_SIZE prior to multiplying.
After this patch ELF_ET_DYN_BASE equals to (for CONFIG_VMSPLIT_3G=y):
(TASK_SIZE / 3 * 2) = 0x7f555554
[1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerAlgorithm#Mapping
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Reported-by: Maria Guseva <m.guseva@samsung.com>
Cc: stable@vger.kernel.org
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
When running the 32-bit ARM kernel on ARMv8 capable bare metal (e.g.,
32-bit Android userland and kernel on a Cortex-A53), or as a KVM guest
on a 64-bit host, we should advertise the availability of the Crypto
instructions, so that userland libraries such as OpenSSL may use them.
(Support for the v8 Crypto instructions in the 32-bit build was added
to OpenSSL more than six months ago)
This adds the ID feature bit detection, and sets elf_hwcap2 accordingly.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The various CPU feature registers consist of 4-bit blocks that
represent signed quantities, whose positive values represent
incremental features, and whose negative values are reserved.
To improve forward compatibility, update the feature detection
code to take possible future higher values into account, but
ignore negative values.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This moves the .idmap.text section closer to .head.text, so that
relative branches are less likely to go out of range if the kernel
text gets bigger.
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch replaces the 'branch to setup()' instructions embedded
in the PROCINFO structs with the offset to that setup function
relative to the base of the struct. This preserves the position
independent nature of that field, but uses a data item rather
than an instruction.
This is mainly done to prevent linker failures on large kernels,
where the setup function is out of reach for the branch.
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Occasionally, there's a question about the method we use to find the
start of physical memory. Add some documentation so we don't have to
keep repeating outselves on the mailing list.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Allow users to enable the vdso in Kconfig; include the vdso in the
build if CONFIG_VDSO is enabled. Add 'vdso_install' target.
Signed-off-by: Nathan Lynch <nathan_lynch@mentor.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Initialize the VDSO page list at boot, install the VDSO mapping at
exec time, and update the data page during timer ticks. This code is
not built if CONFIG_VDSO is not enabled.
Account for the VDSO length when randomizing the offset from the
stack. The [vdso] and [vvar] pages are placed immediately following
the sigpage with separate _install_special_mapping calls.
We want to "penalize" systems lacking the arch timer as little
as possible. Previous versions of this code installed the VDSO
unconditionally and unmodified, making it a measurably slower way for
glibc to invoke the real syscalls on such systems. E.g. calling
gettimeofday via glibc goes from ~560ns to ~630ns on i.MX6Q.
If we can indicate to glibc that the time-related APIs in the VDSO are
not accelerated, glibc can continue to invoke the syscalls directly
instead of dispatching through the VDSO only to fall back to the slow
path.
Thus, if the architected timer is unusable for whatever reason, patch
the VDSO at boot time so that symbol lookups for gettimeofday and
clock_gettime return NULL. (This is similar to what powerpc does and
borrows code from there.) This allows glibc to perform the syscall
directly instead of passing control to the VDSO, which minimizes the
penalty. In my measurements the time taken for a gettimeofday call
via glibc goes from ~560ns to ~580ns (again on i.MX6Q), and this is
solely due to adding a test and branch to glibc's gettimeofday syscall
wrapper.
An alternative to patching the VDSO at boot would be to not install
the VDSO at all when the arch timer isn't usable. Another alternative
is to include a separate "dummy" vdso.so without gettimeofday and
clock_gettime, which would be selected at boot time. Either of these
would get cumbersome if the VDSO were to gain support for an API such
as getcpu which is unrelated to arch timer support.
Signed-off-by: Nathan Lynch <nathan_lynch@mentor.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Place VDSO-related user-space code in arch/arm/kernel/vdso/.
It is almost completely written in C with some assembly helpers to
load the data page address, sample the counter, and fall back to
system calls when necessary.
The VDSO can service gettimeofday and clock_gettime when
CONFIG_ARM_ARCH_TIMER is enabled and the architected timer is present
(and correctly configured). It reads the CP15-based virtual counter
to compute high-resolution timestamps.
Of particular note is that a post-processing step ("vdsomunge") is
necessary to produce a shared object which is architecturally allowed
to be used by both soft- and hard-float EABI programs.
The 2012 edition of the ARM ABI defines Tag_ABI_VFP_args = 3 "Code is
compatible with both the base and VFP variants; the user did not
permit non-variadic functions to pass FP parameters/results."
Unfortunately current toolchains do not support this tag, which is
ideally what we would use.
The best available option is to ensure that both EF_ARM_ABI_FLOAT_SOFT
and EF_ARM_ABI_FLOAT_HARD are unset in the ELF header's e_flags,
indicating that the shared object is "old" and should be accepted for
backward compatibility's sake. While binutils < 2.24 appear to
produce a vdso.so with both flags clear, 2.24 always sets
EF_ARM_ABI_FLOAT_SOFT, with no way to inhibit this behavior. So we
have to fix things up with a custom post-processing step.
In fact, the VDSO code in glibc does much less validation (including
checking these flags) than the code for handling conventional
file-backed shared libraries, so this is a bit moot unless glibc's
VDSO code becomes more strict.
Signed-off-by: Nathan Lynch <nathan_lynch@mentor.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Define the layout of the data structure shared between kernel and
userspace.
Track the vdso address in the mm_context; needed for communicating
AT_SYSINFO_EHDR to the ELF loader.
Add declarations for arm_install_vdso; implementation is in a
following patch.
Define AT_SYSINFO_EHDR, and, if CONFIG_VDSO=y, report the vdso shared
object address via the ELF auxiliary vector.
Note - this adds the AT_SYSINFO_EHDR in a new user-visible header
asm/auxvec.h; this is consistent with other architectures.
Signed-off-by: Nathan Lynch <nathan_lynch@mentor.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Now that the code is in place for KVM to support MIPS SIMD Architecutre
(MSA) in MIPS guests, wire up the new KVM_CAP_MIPS_MSA capability.
For backwards compatibility, the capability must be explicitly enabled
in order to detect or make use of MSA from the guest.
The capability is not supported if the hardware supports MSA vector
partitioning, since the extra support cannot be tested yet and it
extends the state that the userland program would have to save.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Cc: linux-api@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Add KVM register numbers for the MIPS SIMD Architecture (MSA) registers,
and implement access to them with the KVM_GET_ONE_REG / KVM_SET_ONE_REG
ioctls when the MSA capability is enabled (exposed in a later patch) and
present in the guest according to its Config3.MSAP bit.
The MSA vector registers use the same register numbers as the FPU
registers except with a different size (128bits). Since MSA depends on
Status.FR=1, these registers are inaccessible when Status.FR=0. These
registers are returned as a single native endian 128bit value, rather
than least significant half first with each 64-bit half native endian as
the kernel uses internally.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Cc: linux-api@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Add guest exception handling for MIPS SIMD Architecture (MSA) floating
point exceptions and MSA disabled exceptions.
MSA floating point exceptions from the guest need passing to the guest
kernel, so for these a guest MSAFPE is emulated.
MSA disabled exceptions are normally handled by passing a reserved
instruction exception to the guest (because no guest MSA was supported),
but the hypervisor can now handle them if the guest has MSA by passing
an MSA disabled exception to the guest, or if the guest has MSA enabled
by transparently restoring the guest MSA context and enabling MSA and
the FPU.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Emulate MSA related parts of COP0 interface so that the guest will be
able to enable/disable MSA (Config5.MSAEn) once the MSA capability has
been wired up.
As with the FPU (Status.CU1) setting Config5.MSAEn has no immediate
effect if the MSA state isn't live, as MSA state is restored lazily on
first use. Changes after the MSA state has been restored take immediate
effect, so that the guest can start getting MSA disabled exceptions
right away for guest MSA operations. The MSA state is saved lazily too,
as MSA may get re-enabled in the near future anyway.
A special case is also added for when Status.CU1 is set while FR=0 and
the MSA state is live. In this case we are at risk of getting reserved
instruction exceptions if we try and save the MSA state, so we lose the
MSA state sooner while MSA is still usable.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Add base code for supporting the MIPS SIMD Architecture (MSA) in MIPS
KVM guests. MSA cannot yet be enabled in the guest, we're just laying
the groundwork.
As with the FPU, whether the guest's MSA context is loaded is stored in
another bit in the fpu_inuse vcpu member. This allows MSA to be disabled
when the guest disables it, but keeping the MSA context loaded so it
doesn't have to be reloaded if the guest re-enables it.
New assembly code is added for saving and restoring the MSA context,
restoring only the upper half of the MSA context (for if the FPU context
is already loaded) and for saving/clearing and restoring MSACSR (which
can itself cause an MSA FP exception depending on the value). The MSACSR
is restored before returning to the guest if MSA is already enabled, and
the existing FP exception die notifier is extended to catch the possible
MSA FP exception and step over the ctcmsa instruction.
The helper function kvm_own_msa() is added to enable MSA and restore
the MSA context if it isn't already loaded, which will be used in a
later patch when the guest attempts to use MSA for the first time and
triggers an MSA disabled exception.
The existing FPU helpers are extended to handle MSA. kvm_lose_fpu()
saves the full MSA context if it is loaded (which includes the FPU
context) and both kvm_lose_fpu() and kvm_drop_fpu() disable MSA.
kvm_own_fpu() also needs to lose any MSA context if FR=0, since there
would be a risk of getting reserved instruction exceptions if CU1 is
enabled and we later try and save the MSA context. We shouldn't usually
hit this case since it will be handled when emulating CU1 changes,
however there's nothing to stop the guest modifying the Status register
directly via the comm page, which will cause this case to get hit.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Now that the code is in place for KVM to support FPU in MIPS KVM guests,
wire up the new KVM_CAP_MIPS_FPU capability.
For backwards compatibility, the capability must be explicitly enabled
in order to detect or make use of the FPU from the guest.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Cc: linux-api@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Add KVM register numbers for the MIPS FPU registers, and implement
access to them with the KVM_GET_ONE_REG / KVM_SET_ONE_REG ioctls when
the FPU capability is enabled (exposed in a later patch) and present in
the guest according to its Config1.FP bit.
The registers are accessible in the current mode of the guest, with each
sized access showing what the guest would see with an equivalent access,
and like the architecture they may become UNPREDICTABLE if the FR mode
is changed. When FR=0, odd doubles are inaccessible as they do not exist
in that mode.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Cc: linux-api@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Add guest exception handling for floating point exceptions and
coprocessor 1 unusable exceptions.
Floating point exceptions from the guest need passing to the guest
kernel, so for these a guest FPE is emulated.
Also, coprocessor 1 unusable exceptions are normally passed straight
through to the guest (because no guest FPU was supported), but the
hypervisor can now handle them if the guest has its FPU enabled by
restoring the guest FPU context and enabling the FPU.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Emulate FPU related parts of COP0 interface so that the guest will be
able to enable/disable the following once the FPU capability has been
wired up:
- The FPU (Status.CU1)
- 64-bit FP register mode (Status.FR)
- Hybrid FP register mode (Config5.FRE)
Changing Status.CU1 has no immediate effect if the FPU state isn't live,
as the FPU state is restored lazily on first use. After that, changes
take place immediately in the host Status.CU1, so that the guest can
start getting coprocessor unusable exceptions right away for guest FPU
operations if it is disabled. The FPU state is saved lazily too, as the
FPU may get re-enabled in the near future anyway.
Any change to Status.FR causes the FPU state to be discarded and FPU
disabled, as the register state is architecturally UNPREDICTABLE after
such a change. This should also ensure that the FPU state is fully
initialised (with stale state, but that's fine) when it is next used in
the new FP mode.
Any change to the Config5.FRE bit is immediately updated in the host
state so that the guest can get the relevant exceptions right away for
single-precision FPU operations.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Add base code for supporting FPU in MIPS KVM guests. The FPU cannot yet
be enabled in the guest, we're just laying the groundwork.
Whether the guest's FPU context is loaded is stored in a bit in the
fpu_inuse vcpu member. This allows the FPU to be disabled when the guest
disables it, but keeping the FPU context loaded so it doesn't have to be
reloaded if the guest re-enables it.
An fpu_enabled vcpu member stores whether userland has enabled the FPU
capability (which will be wired up in a later patch).
New assembly code is added for saving and restoring the FPU context, and
for saving/clearing and restoring FCSR (which can itself cause an FP
exception depending on the value). The FCSR is restored before returning
to the guest if the FPU is already enabled, and a die notifier is
registered to catch the possible FP exception and step over the ctc1
instruction.
The helper function kvm_lose_fpu() is added to save FPU context and
disable the FPU, which is used when saving hardware state before a
context switch or KVM exit (the vcpu_get_regs() callback).
The helper function kvm_own_fpu() is added to enable the FPU and restore
the FPU context if it isn't already loaded, which will be used in a
later patch when the guest attempts to use the FPU for the first time
and triggers a co-processor unusable exception.
The helper function kvm_drop_fpu() is added to discard the FPU context
and disable the FPU, which will be used in a later patch when the FPU
state will become architecturally UNPREDICTABLE (change of FR mode) to
force a reload of [stale] context in the new FR mode.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Add a vcpu_get_regs() and vcpu_set_regs() callbacks for loading and
restoring context which may be in hardware registers. This may include
floating point and MIPS SIMD Architecture (MSA) state which may be
accessed directly by the guest (but restored lazily by the hypervisor),
and also dedicated guest registers as provided by the VZ ASE.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Add Config4 and Config5 co-processor 0 registers, and add capability to
write the Config1, Config3, Config4, and Config5 registers using the KVM
API.
Only supported bits can be written, to minimise the chances of the guest
being given a configuration from e.g. QEMU that is inconsistent with
that being emulated, and as such the handling is in trap_emul.c as it
may need to be different for VZ. Currently the only modification
permitted is to make Config4 and Config5 exist via the M bits, but other
bits will be added for FPU and MSA support in future patches.
Care should be taken by userland not to change bits without fully
handling the possible extra state that may then exist and which the
guest may begin to use and depend on.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Various semi-used definitions exist in kvm_host.h for the default guest
config registers. Remove them and use the appropriate values directly
when initialising the Config registers.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Clean up KVM_GET_ONE_REG / KVM_SET_ONE_REG register definitions for
MIPS, to prepare for adding a new group for FPU & MSA vector registers.
Definitions are added for common bits in each group of registers, e.g.
KVM_REG_MIPS_CP0 = KVM_REG_MIPS | 0x10000, for the coprocessor 0
registers.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
The information messages when the KVM module is loaded and unloaded are
a bit pointless and out of line with other architectures, so lets drop
them.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Sort the registers in the kvm_mips_get_reg() switch by register number,
which puts ERROREPC after the CONFIG registers.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Implement access to the guest Processor Identification CP0 register
using the KVM_GET_ONE_REG and KVM_SET_ONE_REG ioctls. This allows the
owning process to modify and read back the value that is exposed to the
guest in this register.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Trap instructions are used by Linux to implement BUG_ON(), however KVM
doesn't pass trap exceptions on to the guest if they occur in guest
kernel mode, instead triggering an internal error "Exception Code: 13,
not yet handled". The guest kernel then doesn't get a chance to print
the usual BUG message and stack trace.
Implement handling of the trap exception so that it gets passed to the
guest and the user is left with a more useful log message.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: linux-mips@linux-mips.org
When handling floating point exceptions (FPEs) and MSA FPEs the Cause
bits of the appropriate control and status register (FCSR for FPEs and
MSACSR for MSA FPEs) are read and cleared before enabling interrupts,
presumably so that it doesn't have to go through the pain of restoring
those bits if the process is pre-empted, since writing those bits would
cause another immediate exception while still in the kernel.
The bits aren't normally ever restored again, since userland never
expects to see them set.
However for virtualisation it is necessary for the kernel to be able to
restore these Cause bits, as the guest may have been interrupted in an
FP exception handler but before it could read the Cause bits. This can
be done by registering a die notifier, to get notified of the exception
when such a value is restored, and if the PC was at the instruction
which is used to restore the guest state, the handler can step over it
and continue execution. The Cause bits can then remain set without
causing further exceptions.
For this to work safely a few changes are made:
- __build_clear_fpe and __build_clear_msa_fpe no longer clear the Cause
bits, and now return from exception level with interrupts disabled
instead of enabled.
- do_fpe() now clears the Cause bits and enables interrupts after
notify_die() is called, so that the notifier can chose to return from
exception without this happening.
- do_msa_fpe() acts similarly, but now actually makes use of the second
argument (msacsr) and calls notify_die() with the new DIE_MSAFP,
allowing die notifiers to be informed of MSA FPEs too.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Guest user mode can generate a guest MSA Disabled exception on an MSA
capable core by simply trying to execute an MSA instruction. Since this
exception is unknown to KVM it will be passed on to the guest kernel.
However guest Linux kernels prior to v3.15 do not set up an exception
handler for the MSA Disabled exception as they don't support any MSA
capable cores. This results in a guest OS panic.
Since an older processor ID may be being emulated, and MSA support is
not advertised to the guest, the correct behaviour is to generate a
Reserved Instruction exception in the guest kernel so it can send the
guest process an illegal instruction signal (SIGILL), as would happen
with a non-MSA-capable core.
Fix this as minimally as reasonably possible by preventing
kvm_mips_check_privilege() from relaying MSA Disabled exceptions from
guest user mode to the guest kernel, and handling the MSA Disabled
exception by emulating a Reserved Instruction exception in the guest,
via a new handle_msa_disabled() KVM callback.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Cc: <stable@vger.kernel.org> # v3.15+
MIPS FP/MSA fixes from the MIPS tree. Includes a fix to ensure that the
FPU is properly disabled by lose_fpu() when MSA is in use, and Paul
Burton's "FP/MSA fixes" patchset which is required for FP/MSA support in
KVM:
> This series fixes a bunch of bugs, both build & runtime, with FP & MSA
> support. Most of them only affect systems with the new FP modes & MSA
> support enabled but patch 6 in particular is more general, fixing
> problems for mips64 systems.
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Keyur Chudgar <kchudgar@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The maximum word size is 64-bits since MSA state is saved using st.d
which stores two 64-bit words, therefore reimplement FPR_IDX using xor,
and only within each 64-bit word.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/9169/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
This reverts commit 02987633df.
The basic premise of the patch was incorrect since MSA context
(including FP state) is saved using st.d which stores two consecutive
64-bit words in memory rather than a single 128-bit word. This means
that even with big endian MSA, the FP state is still in the first 64-bit
word.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/9168/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The expected semantics of __enable_fpu are for the FPU to be enabled
in the given mode if possible, otherwise for the FPU to be left
disabled and SIGFPE returned. The FPU was incorrectly being left
enabled in cases where the desired value for FR was unavailable.
Without ensuring the FPU is disabled in this case, it would be
possible for userland to go on to execute further FP instructions
natively in the incorrect mode, rather than those instructions being
trapped & emulated as they need to be.
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/9167/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
If a ptracee has not used the FPU and the ptracer sets its FP context
using PTRACE_POKEUSR, PTRACE_SETFPREGS or PTRACE_SETREGSET then that
context will be discarded upon either the ptracee using the FPU or a
further write to the context via ptrace. Prevent this loss by recording
that the task has "used" math once its FP context has been written to.
The context initialisation code that was present for the PTRACE_POKEUSR
case is reused for the other 2 cases to provide consistent behaviour
for the different ptrace requests.
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/9166/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
When running the emulator to handle an instruction that raised an FP
unimplemented operation exception, the FCSR cause bits were being
cleared. This is done to ensure that the kernel does not take an FP
exception when later restoring FP context to registers. However, this
was not being done when the emulator is invoked in response to a
coprocessor unusable exception. This happens in 2 cases:
- There is no FPU present in the system. In this case things were
OK, since the FP context is never restored to hardware registers
and thus no FP exception may be raised when restoring FCSR.
- The FPU could not be configured to the mode required by the task.
In this case it would be possible for the emulator to set cause
bits which are later restored to hardware if the task migrates
to a CPU whose associated FPU does support its mode requirements,
or if the tasks FP mode requirements change.
Consistently clear the cause bits after invoking the emulator, by moving
the clearing to process_fpemu_return and ensuring this is always called
before the tasks FP context is restored. This will make it easier to
catch further paths invoking the emulator in future, as will be
introduced in further patches.
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/9165/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Much like for traditional scalar FP exceptions, the cause bits in the
MSACSR register need to be cleared following an MSA FP exception.
Without doing so the exception will simply be raised again whenever
the kernel restores MSACSR from a tasks saved context, leading to
undesirable spurious exceptions. Clear the cause bits from the
handle_msa_fpe function, mirroring the way handle_fpe clears the
cause bits in FCSR.
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/9164/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Uses of the cfcmsa & ctcmsa instructions were not being wrapped by a
macro in the case where the toolchain supports MSA, since the arguments
exactly match a typical use of the instructions. However using current
toolchains this leads to errors such as:
arch/mips/kernel/genex.S:437: Error: opcode not supported on this processor: mips32r2 (mips32r2) `cfcmsa $5,1'
Thus uses of the instructions must be in the context of a ".set msa"
directive, however doing that from the users of the instructions would
be messy due to the possibility that the toolchain does not support
MSA. Fix this by renaming the macros (prepending an underscore) in order
to avoid recursion when attempting to emit the instructions, and provide
implementations for the TOOLCHAIN_SUPPORTS_MSA case which ".set msa" as
appropriate.
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/9163/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Recursive macros made the code more concise & worked great for the
case where the toolchain doesn't support MSA. However, with toolchains
which do support MSA they lead to build failures such as:
arch/mips/kernel/r4k_switch.S: Assembler messages:
arch/mips/kernel/r4k_switch.S:148: Error: invalid operands `insert.w $w(0+1)[2],$1'
arch/mips/kernel/r4k_switch.S:148: Error: invalid operands `insert.w $w(0+1)[3],$1'
arch/mips/kernel/r4k_switch.S:148: Error: invalid operands `insert.w $w((0+1)+1)[2],$1'
arch/mips/kernel/r4k_switch.S:148: Error: invalid operands `insert.w $w((0+1)+1)[3],$1'
...
Drop the recursion from msa_init_all_upper invoking the msa_init_upper
macro explicitly for each vector register.
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/9162/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Assuming at ($1) as the source or destination register of copy or
insert instructions:
- Simplifies the macros providing those instructions for toolchains
without MSA support.
- Avoids an unnecessary move instruction when at is used as the source
or destination register anyway.
- Is sufficient for the uses to be introduced in the kernel by a
subsequent patch.
Note that due to a patch ordering snafu on my part this also fixes the
currently broken build with MSA support enabled. The build has been
broken since commit c9017757c5 "MIPS: init upper 64b of vector
registers when MSA is first used", which this patch should have
preceeded.
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/9161/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The {save,restore}_fp_context{,32} functions require that the assembler
allows the use of sdc instructions on any FP register, and this is
acomplished by setting the arch to mips64r2 or mips64r6
(using MIPS_ISA_ARCH_LEVEL_RAW).
However this has the effect of enabling the assembler to use mips64
instructions in the expansion of pseudo-instructions. This was done in
the (now-reverted) commit eec43a224c "MIPS: Save/restore MSA context
around signals" which led to my mistakenly believing that there was an
assembler bug, when in reality the assembler was just emitting mips64
instructions. Avoid the issue for future commits which will add code to
r4k_fpu.S by pushing the .set MIPS_ISA_ARCH_LEVEL_RAW directives into
the functions that require it, and remove the spurious assertion
declaring the assembler bug.
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
[james.hogan@imgtec.com: Rebase on v4.0-rc1 and reword commit message to
reflect use of MIPS_ISA_ARCH_LEVEL_RAW]
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/9612/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The lose_fpu() function only disables the FPU in CP0_Status.CU1 if the
FPU is in use and MSA isn't enabled.
This isn't necessarily a problem because KSTK_STATUS(current), the
version of CP0_Status stored on the kernel stack on entry from user
mode, does always get updated and gets restored when returning to user
mode, but I don't think it was intended, and it is inconsistent with the
case of only the FPU being in use. Sometimes leaving the FPU enabled may
also mask kernel bugs where FPU operations are executed when the FPU
might not be enabled.
So lets disable the FPU in the MSA case too.
Fixes: 33c771ba5c ("MIPS: save/disable MSA in lose_fpu")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/9323/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
This is a trivial port from kGraft. Module relocations are not supported yet.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
There are a couple of syscall argument zero-extension instructions in
the 32-bit compat entry code, and it was mentioned that people keep
trying to optimize them out, introducing bugs.
Make them more visible, and add a "do not remove" comment.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427452582-21624-3-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The existing comment has proven to be not very clear.
Replace it with a comment similar to the one we now have in the 64-bit
syscall entry point. (Three instances, one per 32-bit syscall entry).
In the INT80 entry point's CFI annotations, replace mysterious
expressions with numric constants. In this case, raw numbers
look more understandable.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427452582-21624-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The comment is ancient, it dates to the time when only AMD's
x86_64 implementation existed. AMD wasn't (and still isn't)
supporting SYSENTER, so these writes were "just in case" back
then.
This has changed: Intel's x86_64 appeared, and Intel does
support SYSENTER in long mode. "Some future 64-bit CPU" is here
already.
The code may appear "buggy" for AMD as it stands, since
MSR_IA32_SYSENTER_EIP is only 32-bit for AMD CPUs. Writing a
kernel function's address to it would drop high bits. Subsequent
use of this MSR for branch via SYSENTER seem to allow user to
transition to CPL0 while executing his code. Scary, eh?
Explain why that is not a bug: because SYSENTER insn would not
work on AMD CPU.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427453956-21931-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
board-rx51 has no card detect pin in the mmc slot, but can detect that
the (cell-phone) cover has been removed and the card is accessible.
The semantics between cover/card detect differ, the gpio on the slot
informs you after the card has been removed, cover removal does not
necessarily mean that the card has been removed.
This means different code paths are necessary. To complete this we
also want different fields in the platform data for cover and card
detect. This separation is not pushed all the way down into struct
omap2_hsmmc_info which is used to initialize the platform data.
If we did that we had to go over all board files and set the new
gpio_cod pin to -EINVAL. If we forget one board or some out-of-tree
archicture forgets that the default '0' is used which is a valid pin
number.
Signed-off-by: Andreas Fenkart <afenkart@gmail.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
'enum clock_event_mode' is used for two purposes today:
- to pass mode to the driver of clockevent device::set_mode().
- for managing state of the device for clockevents core.
For supporting new modes/states we have moved away from the
legacy set_mode() callback to new per-mode/state callbacks. New
modes/states shouldn't be exposed to the legacy (now OBSOLOTE)
callbacks and so we shouldn't add new states to 'enum
clock_event_mode'.
Lets have separate enums for the two use cases mentioned above.
Keep using the earlier enum for legacy set_mode() callback and
mark it OBSOLETE. And add another enum to clearly specify the
possible states of a clockevent device.
This also renames the newly added per-mode callbacks to reflect
state changes.
We haven't got rid of 'mode' member of 'struct
clock_event_device' as it is used by some of the clockevent
drivers and it would automatically die down once we migrate
those drivers to the new interface. It ('mode') is only updated
now for the drivers using the legacy interface.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: linaro-kernel@lists.linaro.org
Cc: linaro-networking@linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/b6b0143a8a57bd58352ad35e08c25424c879c0cb.1425037853.git.viresh.kumar@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
While thinking on the whole clock discussion it occurred to me we have
two distinct uses of time:
1) the tracking of event/ctx/cgroup enabled/running/stopped times
which includes the self-monitoring support in struct
perf_event_mmap_page.
2) the actual timestamps visible in the data records.
And we've been conflating them.
The first is all about tracking time deltas, nobody should really care
in what time base that happens, its all relative information, as long
as its internally consistent it works.
The second however is what people are worried about when having to
merge their data with external sources. And here we have the
discussion on MONOTONIC vs MONOTONIC_RAW etc..
Where MONOTONIC is good for correlating between machines (static
offset), MONOTNIC_RAW is required for correlating against a fixed rate
hardware clock.
This means configurability; now 1) makes that hard because it needs to
be internally consistent across groups of unrelated events; which is
why we had to have a global perf_clock().
However, for 2) it doesn't really matter, perf itself doesn't care
what it writes into the buffer.
The below patch makes the distinction between these two cases by
adding perf_event_clock() which is used for the second case. It
further makes this configurable on a per-event basis, but adds a few
sanity checks such that we cannot combine events with different clocks
in confusing ways.
And since we then have per-event configurability we might as well
retain the 'legacy' behaviour as a default.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
An upcoming patch will depend on tai_ns() and NMI-safe ktime_get_raw_fast(),
so merge timers/core here in a separate topic branch until it's all cooked
and timers/core is merged upstream.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull RCU updates from Paul E. McKenney:
- Documentation updates.
- Changes permitting use of call_rcu() and friends very early in
boot, for example, before rcu_init() is invoked.
- Miscellaneous fixes.
- Add in-kernel API to enable and disable expediting of normal RCU
grace periods.
- Improve RCU's handling of (hotplug-) outgoing CPUs.
Note: ARM support is lagging a bit here, and these improved
diagnostics might generate (harmless) splats.
- NO_HZ_FULL_SYSIDLE fixes.
- Tiny RCU updates to make it more tiny.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Internally, lockdep_sys_exit_thunk saves callee-clobbered
registers, and calls a C function, lockdep_sys_exit. Thus,
callee-preserved registers won't be mangled, there is no need to
save them.
Patch was run-tested.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427314468-12763-4-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There is no need to have an extra level of macro indirection
here.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427314468-12763-3-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This change simply moves defines around (even if it's not
obvious in a patch form). Nothing is changed.
This is a preparation for folding ARCH_LOCKDEP_SYS_EXIT defines
into their users.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427314468-12763-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A named label "ret_from_sys_call" implies that there are jumps
to this location from elsewhere, as happens with many other
labels in this file.
But this label is used only by the JMP a few insns above.
To make that obvious, use local numeric label instead.
Improve comments:
"and return regs->ax" isn't too informative. We always return
regs->ax.
The comment suggesting that it'd be cool to use rip relative
addressing for CALL is deleted. It's unclear why that would be
an improvement - we aren't striving to use position-independent
code here. PIC code here would require something like LEA
sys_call_table(%rip),reg + CALL *(reg,%rax*8)...
"iret frame is also incomplete" is no longer true, fix that too.
Also fix typo in comment.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427303896-24023-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
perf_pmu_disable() is called before pmu->add() and perf_pmu_enable() is called
afterwards. No need to call these inside of x86_pmu_add() as well.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1424281543-67335-1-git-send-email-dsahern@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In preparation of adding another tkr field, rename this one to
tkr_mono. Also rename tk_read_base::base_mono to tk_read_base::base,
since the structure is not specific to CLOCK_MONOTONIC and the mono
name got added to the tk_read_base instance.
Lots of trivial churn.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150319093400.344679419@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On Broadwell INST_RETIRED.ALL cannot be used with any period
that doesn't have the lowest 6 bits cleared. And the period
should not be smaller than 128.
This is erratum BDM11 and BDM55:
http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/5th-gen-core-family-spec-update.pdf
BDM11: When using a period < 100; we may get incorrect PEBS/PMI
interrupts and/or an invalid counter state.
BDM55: When bit0-5 of the period are !0 we may get redundant PEBS
records on overflow.
Add a new callback to enforce this, and set it for Broadwell.
How does this handle the case when an app requests a specific
period with some of the bottom bits set?
Short answer:
Any useful instruction sampling period needs to be 4-6 orders
of magnitude larger than 128, as an PMI every 128 instructions
would instantly overwhelm the system and be throttled.
So the +-64 error from this is really small compared to the
period, much smaller than normal system jitter.
Long answer (by Peterz):
IFF we guarantee perf_event_attr::sample_period >= 128.
Suppose we start out with sample_period=192; then we'll set period_left
to 192, we'll end up with left = 128 (we truncate the lower bits). We
get an interrupt, find that period_left = 64 (>0 so we return 0 and
don't get an overflow handler), up that to 128. Then we trigger again,
at n=256. Then we find period_left = -64 (<=0 so we return 1 and do get
an overflow). We increment with sample_period so we get left = 128. We
fire again, at n=384, period_left = 0 (<=0 so we return 1 and get an
overflow). And on and on.
So while the individual interrupts are 'wrong' we get then with
interval=256,128 in exactly the right ratio to average out at 192. And
this works for everything >=128.
So the num_samples*fixed_period thing is still entirely correct +- 127,
which is good enough I'd say, as you already have that error anyhow.
So no need to 'fix' the tools, al we need to do is refuse to create
INST_RETIRED:ALL events with sample_period < 128.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
[ Updated comments and changelog a bit. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1424225886-18652-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add Broadwell support for Broadwell to perf.
The basic support is very similar to Haswell. We use the new cache
event list added for Haswell earlier. The only differences
are a few bits related to remote nodes. To avoid an extra,
mostly identical, table these are patched up in the initialization code.
The constraint list has one new event that needs to be handled over Haswell.
Includes code and testing from Kan Liang.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1424225886-18652-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Haswell offcore events are quite different from Sandy Bridge.
Add a new table to handle Haswell properly.
Note that the offcore bits listed in the SDM are not quite correct
(this is currently being fixed). An uptodate list of bits is
in the patch.
The basic setup is similar to Sandy Bridge. The prefetch columns
have been removed, as prefetch counting is not very reliable
on Haswell. One L1 event that is not in the event list anymore
has been also removed.
- data reads do not include code reads (comparable to earlier Sandy Bridge tables)
- data counts include speculative execution (except L1 write, dtlb, bpu)
- remote node access includes both remote memory, remote cache, remote mmio.
- prefetches are not included in the counts for consistency
(different from Sandy Bridge, which includes prefetches in the remote node)
Signed-off-by: Andi Kleen <ak@linux.intel.com>
[ Removed the HSM30 comments; we don't have them for SNB/IVB either. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1424225886-18652-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When operating in left-justfied mode both the frame-clock and the
bit-clock need to be inverted to be standards compliant.
This means that the exta clock inversion setting in the armadillo800eva
machine driver for CPU component should now be removed.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Acked-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Mark Brown <broonie@kernel.org>
The DAI link format should be specified for the whole link rather than just
one component on the link. So move the format specification for the HDMI
audio link from the CPU component to the link itself.
Since the sh-mobile-hdmi DAI driver doesn't implement the set_fmt() callback
in this case there is no functional difference between only specifying the
the format for the CPU side or for the whole link, but the later it will
allow us to remove support for just specifying the format for one component.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Acked-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Mark Brown <broonie@kernel.org>
If the guest CPU is supposed to support rdtscp and the host has rdtscp
enabled in the secondary execution controls, we can also expose this
feature to L1. Just extend nested_vmx_exit_handled to properly route
EXIT_REASON_RDTSCP.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
virt/kvm was never really a good include directory for anything else
than locally included headers.
With the move of iodev.h there is no need anymore to add this
directory the compiler's include path, so remove it from the x86 kvm
Makefile.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
virt/kvm was never really a good include directory for anything else
than locally included headers.
With the move of iodev.h there is no need anymore to add this
directory the compiler's include path, so remove it from the arm and
arm64 kvm Makefile.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
iodev.h contains definitions for the kvm_io_bus framework. This is
needed both by the generic KVM code in virt/kvm as well as by
architecture specific code under arch/. Putting the header file in
virt/kvm and using local includes in the architecture part seems at
least dodgy to me, so let's move the file into include/kvm, so that a
more natural "#include <kvm/iodev.h>" can be used by all of the code.
This also solves a problem later when using struct kvm_io_device
in arm_vgic.h.
Fixing up the FSF address in the GPL header and a wrong include path
on the way.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
This is needed in e.g. ARM vGIC emulation, where the MMIO handling
depends on the VCPU that does the access.
Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Pull s390 fixes from Martin Schwidefsky:
"A couple of bug fixes for s390.
The ftrace comile fix is quite large for a -rc6 release, but it would
be nice to have it in 4.0"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/smp: reenable smt after resume
s390/mm: limit STACK_RND_MASK for compat tasks
s390/ftrace: fix compile error if CONFIG_KPROBES is disabled
s390/cpum_sf: add diagnostic sampling event only if it is authorized
A malicious signal handler / restorer can DOS the system by fudging the
user regs saved on stack, causing weird things such as sigreturn returning
to user mode PC but cpu state still being kernel mode....
Ensure that in sigreturn path status32 always has U bit; any other bogosity
(gargbage PC etc) will be taken care of by normal user mode exceptions mechanisms.
Reproducer signal handler:
void handle_sig(int signo, siginfo_t *info, void *context)
{
ucontext_t *uc = context;
struct user_regs_struct *regs = &(uc->uc_mcontext.regs);
regs->scratch.status32 = 0;
}
Before the fix, kernel would go off to weeds like below:
--------->8-----------
[ARCLinux]$ ./signal-test
Path: /signal-test
CPU: 0 PID: 61 Comm: signal-test Not tainted 4.0.0-rc5+ #65
task: 8f177880 ti: 5ffe6000 task.ti: 8f15c000
[ECR ]: 0x00220200 => Invalid Write @ 0x00000010 by insn @ 0x00010698
[EFA ]: 0x00000010
[BLINK ]: 0x2007c1ee
[ERET ]: 0x10698
[STAT32]: 0x00000000 : <--------
BTA: 0x00010680 SP: 0x5ffe7e48 FP: 0x00000000
LPS: 0x20003c6c LPE: 0x20003c70 LPC: 0x00000000
...
--------->8-----------
Reported-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
The regfile provided to SA_SIGINFO signal handler as ucontext was off by
one due to pt_regs gutter cleanups in 2013.
Before handling signal, user pt_regs are copied onto user_regs_struct and copied
back later. Both structs are binary compatible. This was all fine until
commit 2fa919045b (ARC: pt_regs update #2) which removed the empty stack slot
at top of pt_regs (corresponding to first pad) and made the corresponding
fixup in struct user_regs_struct (the pad in there was moved out of
@scratch - not removed altogether as it is part of ptrace ABI)
struct user_regs_struct {
+ long pad;
struct {
- long pad;
long bta, lp_start, lp_end,....
} scratch;
...
}
This meant that now user_regs_struct was off by 1 reg w.r.t pt_regs and
signal code needs to user_regs_struct.scratch to reflect it as pt_regs,
which is what this commit does.
This problem was hidden for 2 years, because both save/restore, despite
using wrong location, were using the same location. Only an interim
inspection (reproducer below) exposed the issue.
void handle_segv(int signo, siginfo_t *info, void *context)
{
ucontext_t *uc = context;
struct user_regs_struct *regs = &(uc->uc_mcontext.regs);
printf("regs %x %x\n", <=== prints 7 8 (vs. 8 9)
regs->scratch.r8, regs->scratch.r9);
}
int main()
{
struct sigaction sa;
sa.sa_sigaction = handle_segv;
sa.sa_flags = SA_SIGINFO;
sigemptyset(&sa.sa_mask);
sigaction(SIGSEGV, &sa, NULL);
asm volatile(
"mov r7, 7 \n"
"mov r8, 8 \n"
"mov r9, 9 \n"
"mov r10, 10 \n"
:::"r7","r8","r9","r10");
*((unsigned int*)0x10) = 0;
}
Fixes: 2fa919045b "ARC: pt_regs update #2: Remove unused gutter at start of pt_regs"
CC: <stable@vger.kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
struct kiocb now is a generic I/O container, so move it to fs.h.
Also do a #include diet for aio.h while we're at it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This is another single fix, for an include dependency problem when using
ioremap_wc() from asm/io.h without also including asm/pgtable.h.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJVEXtTAAoJEGwLaZPeOHZ62GQP/39zCHTQ4ntXrzUFHueNqB+t
o3F205AL3HwHbygD+lpiYBMMPbh/5IOG9cFKxj798GRd6khfDX/hGha1OWtJT5hh
0vkhqGhVZPySfz+a7jzSPbmgrN4P/uvcavdNCmEJkDY6Gj/RVsfHpuSIrswAI0by
LIGAregj9MeQ2M4snggbkneHYey1xReGH4BQCid/j4tByJi2+AijRh+GSxLZ163U
WIcPPvEh7Rg3rTi39aHyjeIqgNIt4d+MLTGvvrQ7hZsCgrULFlPjCfi6mG398dqh
qg75oJYNoc42pDrX2Q6mLDgwjbdKEJhH5AHHsnY6Zhd7rSCUK46k2Vmq6sfAP0BK
/VXl9nQnKfm6We5zbPCDoC5mUxtQyLRXxli0vJGTDOI18kV92B92McWgaIN5fzRN
uD942EmEz13IdCGb26p5AhfhhmFsBB9i11wyZ4oKOl3Xnh0eAo8tM2ibhK5lWmog
BtaxrLO78rpGOwa99DtyNEorTICLYBQUPWHKrGa6PN5hUNwUoaJdZy/prqTfsn/9
6JI0NCQKLfSLx+K3T3fCKAowN1UsIQyjk57qjkTS0WHmnZqiLAelOD0z+lFWiPBH
zbGGcVwsQA+NiEcbsI3LDADIkhu7KWg3v+rPVpfn8gDJ2/PMsDSsJ1vGGws+jyna
Pg2vnJdUA6wf4xV7DmYg
=gsz4
-----END PGP SIGNATURE-----
Merge tag 'metag-fixes-v4.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag
Pull arch/metag fix from James Hogan:
"Another metag architecture fix for v4.0
This is another single fix, for an include dependency problem when
using ioremap_wc() from asm/io.h without also including asm/pgtable.h"
* tag 'metag-fixes-v4.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag:
metag: Fix ioremap_wc/ioremap_cached build errors
The irqclass_sub_desc array and enum interruption_class are out of sync
thus /proc/interrupts is broken. Remove IRQIO_CLW.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a BUILD_BUG_ON() to enforce that irqclass_sub_desc contains the required
number of defined interrupt descriptions and won't be filled up with zeros.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Rename two more files which I forgot. Also remove the "asm" from the
swsusp_asm64.S file, since the ".S" suffix already makes it obvious
that this file contains assembler code.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
ipl.c uses 3 different macros to create a sysfs show function for ipl
attributes. Define IPL_ATTR_SHOW_FN which is used by all macros.
Reviewed-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Use macros wherever applicable and create a shutdown_action attribute
group to simplify the code.
Reviewed-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Use macros wherever applicable and put bin_attributes inside attribute_groups
to simplify/remove some code.
Reviewed-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Remove __user address space annotation for sim_stor_event() calls since it
generates false positive warnings from sparse.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
As reported by sparse these can and should be static.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Use the sturg instruction instead of the stura instruction. This allows to
modify up to eight bytes in a row instead of only four.
For function tracer enabling and disabling this reduces the time needed to
modify the text sections by 50%, since for each mcount call site six bytes
need to be changed.
Also remove the EXTABLE entries, since calls to this function are not
supposed to fail.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Remove the s390 architecture implementation of probe_kernel_write() and
instead use a new function s390_kernel_write() to modify kernel text and
data everywhere.
The s390 implementation of probe_kernel_write() was potentially broken
since it modified memory in a read-modify-write fashion, which read four
bytes, modified the requested bytes within those four bytes and wrote
the result back.
If two cpus would modify the same four byte area at different locations
within that area, this could lead to corruption.
Right now the only places which called probe_kernel_write() did run within
stop_machine_run. Therefore the scenario can't happen right now, however
that might change at any time.
To fix this rename probe_kernel_write() to s390_kernel_write() which can
have special semantics, like only call it while running within stop_machine().
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
In case of a translation exception the page tables are corrupted. If the
exception handler then calls die() which again calls show_regs()
-> show_code() -> copy_from_user(), the kernel may access the same memory
location again and generates yet another translation exception. Which in turn
will lead to a deadlock on the die_lock spinlock, which the kernel tries to
grab recursively.
Given that the page tables are corrupted anyway, if we see such an exception,
let's simply panic.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Given that the kernel now always runs in 64 bit mode, it is
pointless to check if the z/Architecture mode is active.
Remove the checks.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Since sizeof(long) == 4 is always false now, simplify cmpxchg_double a bit.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Remove the 31 bit syscalls from the syscall table. This is a separate patch
just in case I screwed something up so it can be easily reverted.
However the conversion was done with a script, so everything should be ok.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Rename a couple of files to get rid of the "64" suffix.
"git blame" will still work.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Remove the 31 bit support in order to reduce maintenance cost and
effectively remove dead code. Since a couple of years there is no
distribution left that comes with a 31 bit kernel.
The 31 bit kernel also has been broken since more than a year before
anybody noticed. In addition I added a removal warning to the kernel
shown at ipl for 5 minutes: a960062e58 ("s390: add 31 bit warning
message") which let everybody know about the plan to remove 31 bit
code. We didn't get any response.
Given that the last 31 bit only machine was introduced in 1999 let's
remove the code.
Anybody with 31 bit user space code can still use the compat mode.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
After a suspend/resume cycle we missed to enable smt again, which leads
to all sorts of bugs, since the kernel assumes smt is enabled, while the
hardware thinks it is not.
Reported-and-tested-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reported-by: Stefan Haberland <stefan.haberland@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
- switch_mm() fix where init_mm.pgd ends up in the user TTBR0;
swapper_pg_dir is not suitable for user mappings
- this_cpu accessors fix for preemption safety
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVEag5AAoJEGvWsS0AyF7xA2gP/3x/Qh3iaON3v5zlP6gMFlzQ
i5L0k4GXdowbMixhe+2PifYg54JvRoDQpjobpOXKwUOPrsHSQHKhdo6PELf/EDA6
bmuak45GeayP5s4LvIDrm8zIoToF3DzIsvT2d9rPLcA5E0jLOZr9mFcJyCu81jwG
JJFea4a4oWnfzETFx86NHLHkxMTokL+TqeLkHPNDm6ZY+ajMPOMR6COvlCBNkGkD
0s3s6yKQefL9q/3OzhJOGfIQghQWAR6oQUqKNvABmeeOtyUC3KApAdFk+3v+fOaV
nAf3thImnd7YdBSC34qCbKAPGvUQZAZbqkMBdtOl6Yk0xMilPb5AVh/5M45++GKd
64rQrDdDRlYEwpThoHceMOSMUY2gKL+KZr7HuK1ZDnQzGrD7yro9V/LRzdpPHcXN
wfrO2xfAF4wowj5AhhSkJ6n1TolrpIkUj0dNyAmiQ7qh9J2LS7HgDSvBXCRaHP/y
z7kNrybqpWMCa3Xg7Tez6K/pV1ObHK2MufLk3wsBL5JuPoxpLmqwhtQABWFwoMks
lQ4y3QweQktGhhp1ZN1Rn93vTOsK4tZrONnwvRmZE/mM5N+ELxVZxfrFcuPWUAcG
y1WiXKVXnXZ/MxgYj5XLTxjCB6K0Rpz4ywE1GfdJKReBbW5l99kzXEYLoTVzD0am
3wallsQqIFPErQ6pbgEw
=MQxz
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull two arm64 fixes from Catalin Marinas:
- switch_mm() fix where init_mm.pgd ends up in the user TTBR0;
swapper_pg_dir is not suitable for user mappings
- this_cpu accessors fix for preemption safety
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: percpu: Make this_cpu accessors pre-empt safe
arm64: Use the reserved TTBR0 if context switching to the init_mm
- Fix the MCE code to use CONFIG_KVM_BOOK3S_64_HANDLER
- Little endian fixes for post mobility device tree update
- Add PVR for POWER8NVL processor
- Fixes for hypervisor doorbell handling
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVEMazAAoJEFHr6jzI4aWAdZUP/jr6w7JtFRMUf2VvrU+QsIlZ
QHXIVXONvWxpLLkdDMJvbZ8BEm/sgXyVMpjgEZPneeNYIJp/PeZJqvDjIw5VBEPd
mMljfMwcYLvGAnpJnP3dgnuQUYf6/g1YiZlIHxn4qsmzlCPgzJm5Wn1ZzupmbZZk
vNEHeSZZ5ILLI+2JNwal2uOf8L2Q4a/OINjhPy/2aM+aBXMUX75jK4G4BLpyTs5M
jHTSSkerhvlM2SWy5Pu4UtJ19Sj3m48TgT0ZzkPfRGv4EEs8oLC7gMI1+T+/KE/F
pCgnVB3G5YxP8ln7RrHUH7hod/c3gzu5hsLzyIIH6gquzmffEAEGmYWg8aLOXdGQ
EGkZKubmoCr5j09SLD//Gb5Ru6e93seRF/9cn5wHiAFccVsA+r4VyvMuoHL/b4aP
o47d3cMwsQWqUsoeOfWUOQRZHQWa9kS/ueWOWzTjEt8KILNnf0L8QyL/GzAGrEYz
Rn6lEVaKsKPwMLuPQRjifCIG8Mno9gEVPYd5ZyxzOVirq4lUUqMSprG//u8iuavj
si/N0gwRVkrK3Vy+zp1p1pmii2XRSrVgBKPyx3RIxiZKfVTypQmT1NIZpywQVT3j
wYD7YbyPYk6maY63m5jy7X/9P48NdRSd+rg0+fMuN/Dps+lHQTy8hzf4bU7kT7hO
aIEbuTBunMkaadxxItJI
=ZvQl
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux
Pull powerpc fixes from Michael Ellerman:
- Fix the MCE code to use CONFIG_KVM_BOOK3S_64_HANDLER
- Little endian fixes for post mobility device tree update
- Add PVR for POWER8NVL processor
- Fixes for hypervisor doorbell handling
* tag 'powerpc-4.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
powerpc/book3s: Fix the MCE code to use CONFIG_KVM_BOOK3S_64_HANDLER
powerpc/pseries: Little endian fixes for post mobility device tree update
powerpc: Add PVR for POWER8NVL processor
powerpc/powernv: Fixes for hypervisor doorbell handling
Without proper regulator support for individual boards, it is dangerous
to have overclocked/overvoltaged OPPs in the list. Cpufreq will increase
the frequency without the accompanying voltage increase, resulting in
an unstable system.
Remove them for now. We can revisit them with the new version of OPP
bindings, which support boost settings and frequency ranges, among
other things.
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
The Olimex A10-Lime is known to be unstable when running at 1008MHz.
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
- extend/clarify explanations where necessary
- move comments from macro values to before the macro, to
make them more consistent, and to reduce preprocessor overhead
- sort GDT index and selector values likewise by number
- use consistent, modern kernel coding style across the file
- capitalize consistently
- use consistent vertical spacing
- remove the unused get_limit() method (noticed by Andy Lutomirski)
No change in code (verified with objdump -d):
64-bit defconfig+kvmconfig:
815a129bc1f80de6445c1d8ca5b97cad vmlinux.o.before.asm
815a129bc1f80de6445c1d8ca5b97cad vmlinux.o.after.asm
32-bit defconfig+kvmconfig:
e659ef045159ddf41a0771b33a34aae5 vmlinux.o.before.asm
e659ef045159ddf41a0771b33a34aae5 vmlinux.o.after.asm
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We currently have a race: if we're preempted during syscall
exit, we can fail to process syscall return work that is queued
up while we're preempted in ret_from_sys_call after checking
ti.flags.
Fix it by disabling interrupts before checking ti.flags.
Reported-by: Stefan Seyfried <stefan.seyfried@googlemail.com>
Reported-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Fixes: 96b6352c12 ("x86_64, entry: Remove the syscall exit audit")
Link: http://lkml.kernel.org/r/189320d42b4d671df78c10555976bb10af1ffc75.1427137498.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The THREAD_INFO() macro has a somewhat confusingly generic name,
defined in a generic .h C header file. It also does not make it
clear that it constructs a memory operand for use in assembly
code.
Rename it to ASM_THREAD_INFO() to make it all glaringly
obvious on first glance.
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/20150324184442.GC14760@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On CONFIG_IA32_EMULATION=y kernels we set up
MSR_IA32_SYSENTER_CS/ESP/EIP, but on !CONFIG_IA32_EMULATION
kernels we leave them unchanged.
Clear them to make sure the instruction is disabled properly.
SYSCALL is set up properly in both cases.
Acked-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This file just defines a number of constants, and a few macros
and inline functions. It is particularly badly written.
For example, it is not trivial to see how descriptors are
numbered (you'd expect that should be easy, right?).
This change deobfuscates it via the following changes:
Group all GDT_ENTRY_foo together (move intervening stuff away).
Number them explicitly: use a number, not PREV_DEFINE+1, +2, +3:
I want to immediately see that GDT_ENTRY_PNPBIOS_CS32 is 18.
Seeing (GDT_ENTRY_KERNEL_BASE+6) instead is not useful.
The above change allows to remove GDT_ENTRY_KERNEL_BASE
and GDT_ENTRY_PNPBIOS_BASE, which weren't used anywhere else.
After a group of GDT_ENTRY_foo, define all selector values.
Remove or improve some comments. In particular:
Comment deleted as stating the obvious:
/*
* The GDT has 32 entries
*/
#define GDT_ENTRIES 32
"The segment offset needs to contain a RPL. Grr. -AK"
changed to
"Selectors need to also have a correct RPL (+3 thingy)"
"GDT layout to get 64bit syscall right (sysret hardcodes gdt
offsets)" expanded into a description *how exactly* sysret
hardcodes them.
Patch was tested to compile and not change vmlinux.o
on 32-bit and 64-bit builds (verified with objdump).
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The FIXUP_TOP_OF_STACK macro is only necessary because we don't save %r11
to pt_regs->r11 on SYSCALL64 fast path, but we want ptrace to see it populated.
Bite the bullet, add a single additional PUSH instruction, and remove
the FIXUP_TOP_OF_STACK macro.
The RESTORE_TOP_OF_STACK macro is already a nop. Remove it too.
On SandyBridge CPU, it does not get slower:
measured 54.22 ns per getpid syscall before and after last two
changes on defconfig kernel.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1426785469-15125-4-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
With this change, on SYSCALL64 code path we are now populating
pt_regs->cs, pt_regs->ss and pt_regs->rcx unconditionally and
therefore don't need to do that in FIXUP_TOP_OF_STACK.
We lose a number of large instructions there:
text data bss dec hex filename
13298 0 0 13298 33f2 entry_64_before.o
12978 0 0 12978 32b2 entry_64.o
What's more important, we convert two "MOVQ $imm,off(%rsp)" to
"PUSH $imm" (the ones which fill pt_regs->cs,ss).
Before this patch, placing them on fast path was slowing it down
by two cycles: this form of MOV is very large, 12 bytes, and
this probably reduces decode bandwidth to one instruction per cycle
when CPU sees them.
Therefore they were living in FIXUP_TOP_OF_STACK instead (away
from fast path).
"PUSH $imm" is a small 2-byte instruction. Moving it to fast path does
not slow it down in my measurements.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1426785469-15125-3-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
PER_CPU_VAR(kernel_stack) was set up in a way where it points
five stack slots below the top of stack.
Presumably, it was done to avoid one "sub $5*8,%rsp"
in syscall/sysenter code paths, where iret frame needs to be
created by hand.
Ironically, none of them benefits from this optimization,
since all of them need to allocate additional data on stack
(struct pt_regs), so they still have to perform subtraction.
This patch eliminates KERNEL_STACK_OFFSET.
PER_CPU_VAR(kernel_stack) now points directly to top of stack.
pt_regs allocations are adjusted to allocate iret frame as well.
Hopefully we can merge it later with 32-bit specific
PER_CPU_VAR(cpu_current_top_of_stack) variable...
Net result in generated code is that constants in several insns
are changed.
This change is necessary for changing struct pt_regs creation
in SYSCALL64 code path from MOV to PUSH instructions.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1426785469-15125-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This changes the THREAD_INFO() definition and all its callsites
so that they do not count stack position from
(top of stack - KERNEL_STACK_OFFSET), but from top of stack.
Semi-mysterious expressions THREAD_INFO(%rsp,RIP) - "why RIP??"
are now replaced by more logical THREAD_INFO(%rsp,SIZEOF_PTREGS)
- "calculate thread_info's address using information that
rsp is SIZEOF_PTREGS bytes below top of stack".
While at it, replace "(off)-THREAD_SIZE(reg)" with equivalent
"((off)-THREAD_SIZE)(reg)". The form without parentheses
falsely looks like we invoke THREAD_SIZE() macro.
Improve comment atop THREAD_INFO macro definition.
This patch does not change generated code (verified by objdump).
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1426785469-15125-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
this_cpu operations were implemented for arm64 in:
5284e1b arm64: xchg: Implement cmpxchg_double
f97fc81 arm64: percpu: Implement this_cpu operations
Unfortunately, it is possible for pre-emption to take place between
address generation and data access. This can lead to cases where data
is being manipulated by this_cpu for a different CPU than it was
called on. Which effectively breaks the spec.
This patch disables pre-emption for the this_cpu operations
guaranteeing that address generation and data manipulation take place
without a pre-emption in-between.
Fixes: 5284e1b4bc ("arm64: xchg: Implement cmpxchg_double")
Fixes: f97fc81079 ("arm64: percpu: Implement this_cpu operations")
Reported-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Steve Capper <steve.capper@linaro.org>
[catalin.marinas@arm.com: remove space after type cast]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Historically, the PMU devicetree bindings have expected SPIs to be
listed in order of *logical* CPU number. This is problematic for
bootloaders, especially when the boot CPU (logical ID 0) isn't listed
first in the devicetree.
This patch adds a new optional property, interrupt-affinity, to the
PMU node which allows the interrupt affinity to be described using
a list of phandled to CPU nodes, with each entry in the list
corresponding to the SPI at the same index in the interrupts property.
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This fixes a bug in the new v8 Crypto Extensions GHASH code
that only manifests itself in big-endian mode.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Rename mce_severity() to mce_severity_intel() and assign the
mce_severity function pointer to mce_severity_amd() during init on AMD.
This way, we can avoid a test to call mce_severity_amd every time we get
into mce_severity(). And it's cleaner to do it this way.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Suggested-by: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1427125373-2918-3-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Add a severities function that caters to AMD processors. This allows us
to do some vendor-specific work within the function if necessary.
Also, introduce a vendor flag bitfield for vendor-specific settings. The
severities code uses this to define error scope based on the prescence
of the flags field.
This is based off of work by Boris Petkov.
Testing details:
Fam10h, Model 9h (Greyhound)
Fam15h: Models 0h-0fh (Orochi), 30h-3fh (Kaveri) and 60h-6fh (Carrizo),
Fam16h Model 00h-0fh (Kabini)
Boris:
Intel SNB
AMD K8 (JH-E0)
Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: linux-edac@vger.kernel.org
Link: http://lkml.kernel.org/r/1427125373-2918-2-git-send-email-Aravind.Gopalakrishnan@amd.com
[ Fixup build, clean up comments. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
ARM32 and ARM64 have the same DT definitions and the same approaches.
The generic ARM cpuidle driver can be put in common for those two
architectures.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Kevin Hilman <khilman@linaro.org>
Acked-by: Rob Herring <robherring2@gmail.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
With this change the cpuidle-arm64.c file calls the same function name
for both ARM and ARM64.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Kevin Hilman <khilman@linaro.org>
Acked-by: Rob Herring <robherring2@gmail.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
The current state of the different cpuidle drivers is the different PM
operations are passed via the platform_data using the platform driver
paradigm.
This approach allowed to split the low level PM code from the arch specific
and the generic cpuidle code.
Unfortunately there are complaints about this approach as, in the context of the
single kernel image, we have multiple drivers loaded in memory for nothing and
the platform driver is not adequate for cpuidle.
This patch provides a common interface via cpuidle ops for all new cpuidle
driver and a definition for the device tree.
It will allow with the next patches to a have a common definition with ARM64
and share the same cpuidle driver.
The code is optimized to use the __init section intensively in order to reduce
the memory footprint after the driver is initialized and unify the function
names with ARM64.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Kevin Hilman <khilman@linaro.org>
Acked-by: Rob Herring <robherring2@gmail.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Having syscall32/sysenter32 initialization in a separate tiny
function, called from within a function that is already syscall
init specific, serves no real purpose.
Its existense also caused an unintended effect of having
wrmsrl(MSR_CSTAR) performed twice: once we set it to a dummy
function returning -ENOSYS, and immediately after
(if CONFIG_IA32_EMULATION), we set it to point to the proper
syscall32 entry point, ia32_cstar_target.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Conflicts:
net/netfilter/nf_tables_core.c
The nf_tables_core.c conflict was resolved using a conflict resolution
from Stephen Rothwell as a guide.
Signed-off-by: David S. Miller <davem@davemloft.net>
An overhead from function call is not appropriate for its size and
frequency of execution.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
1. Fixes
2. Implement access register mode in KVM
3. Provide a userspace post handler for the STSI instruction
4. Provide an interface for compliant memory accesses
5. Provide an interface for getting/setting the guest storage key
6. Fixup for the vector facility patches: do not announce the
vector facility in the guest for old QEMUs.
1-5 were initially shown as RFC in
http://www.spinics.net/lists/kvm/msg114720.html
some small review changes
- added some ACKs
- have the AR mode patches first
- get rid of unnecessary AR_INVAL define
- typos and language
6. two new patches
The two new patches fixup the vector support patches that were
introduced in the last pull request for QEMU versions that dont
know about vector support and guests that do. (We announce the
facility bit, but dont enable the facility so vector aware guests
will crash on vector instructions).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
iQIcBAABAgAGBQJVCV7NAAoJEBF7vIC1phx8ymcP/RovYmBGd7e6jBZLx4fooc97
DFuMkEdNT3bkA0x/L+SgYcExFkoAUX5KPK74mHOTmmBZHopX1AMDKQyEDacjNWBb
9CJdPffJWKKjFtC7KwrkgnDKBrOsmNLdWsLtl8aEIAxxKznvLXYsrvMrBYqdRkUh
nJsjaQueKP8AbzSLsG9N3Yilps2988VMo+wArfw0jVCCO+sWNZnYYsMXwRgyQsPb
K5+0Co/cw1wfnzy1hUWqpRWs26JLIcewLnMx9Ycoaap1V0A59t8J/9xPDHHODX3d
2wXlJsiNmoJj/kqakT4xlNTS0q7Tn2iLlJ1NNUADScR3zP7twXs17H2g8hGSVRiM
xBd0s671m4eSZB5Bk3LSf0PPLbOmTnCB1qYpXhd56an3MGbYIzRtcmU9LvbY3SzH
yxsVUGww1uvYN6A5RABDjqrbmnl9eQ4HruNUnA/fHLS6sDtYbmi7ln4bV3eE1sPa
0r7lPYKWdvyN0FO3Rb9Qwnjhd3F/uaLTWjptdLXapRLO0fD8adiYqzWmbXMZrgcD
BU1CNejIIYP/GZDOQanJoVQJWd9akUW/s6QiDMJRc87KaL13/cCWoicPi1j62ygj
gueYjz1KfKh6hoVvviTl2NgyXke2qs+bIKDrRV6VfdIgfeSN50lUxdJm61RdDA9d
IPrxUJDy8YdzH7rcMPXP
=GqOj
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-next-20150318' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into queue
KVM: s390: Features and fixes for 4.1 (kvm/next)
1. Fixes
2. Implement access register mode in KVM
3. Provide a userspace post handler for the STSI instruction
4. Provide an interface for compliant memory accesses
5. Provide an interface for getting/setting the guest storage key
6. Fixup for the vector facility patches: do not announce the
vector facility in the guest for old QEMUs.
1-5 were initially shown as RFC in
http://www.spinics.net/lists/kvm/msg114720.html
some small review changes
- added some ACKs
- have the AR mode patches first
- get rid of unnecessary AR_INVAL define
- typos and language
6. two new patches
The two new patches fixup the vector support patches that were
introduced in the last pull request for QEMU versions that dont
know about vector support and guests that do. (We announce the
facility bit, but dont enable the facility so vector aware guests
will crash on vector instructions).
kvm_ioapic_update_eoi() wasn't called if directed EOI was enabled.
We need to do that for irq notifiers. (Like with edge interrupts.)
Fix it by skipping EOI broadcast only.
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=82211
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Tested-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
The following point:
2. per-CPU pvclock time info is updated if the
underlying CPU changes.
Is not true anymore since "KVM: x86: update pvclock area conditionally,
on cpu migration".
Add task migration notification back.
Problem noticed by Andy Lutomirski.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
CC: stable@kernel.org # 3.11+
The idle_task_exit() function may call switch_mm() with next ==
&init_mm. On arm64, init_mm.pgd cannot be used for user mappings, so
this patch simply sets the reserved TTBR0.
Cc: <stable@vger.kernel.org>
Reported-by: Jon Medhurst (Tixy) <tixy@linaro.org>
Tested-by: Jon Medhurst (Tixy) <tixy@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Pull sparc fixes from David Miller:
"Some perf bug fixes from David Ahern, and the fix for that nasty
memmove() bug"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc64: Fix several bugs in memmove().
sparc: Touch NMI watchdog when walking cpus and calling printk
sparc: perf: Add support M7 processor
sparc: perf: Make counting mode actually work
sparc: perf: Remove redundant perf_pmu_{en|dis}able calls
The cpu_do_idle() function is always used by the cpuidle drivers.
That led to have each driver including cpuidle.h and proc-fns.h, they are
always paired. That makes a lot of duplicate headers inclusion. Instead of
including both in each .c file, move the proc-fns.h header inclusion in the
cpuidle.h header file directly, so we can save some line of code.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Kevin Hilman <khilman@linaro.org>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Firstly, handle zero length calls properly. Believe it or not there
are a few of these happening during early boot.
Next, we can't just drop to a memcpy() call in the forward copy case
where dst <= src. The reason is that the cache initializing stores
used in the Niagara memcpy() implementations can end up clearing out
cache lines before we've sourced their original contents completely.
For example, considering NG4memcpy, the main unrolled loop begins like
this:
load src + 0x00
load src + 0x08
load src + 0x10
load src + 0x18
load src + 0x20
store dst + 0x00
Assume dst is 64 byte aligned and let's say that dst is src - 8 for
this memcpy() call. That store at the end there is the one to the
first line in the cache line, thus clearing the whole line, which thus
clobbers "src + 0x28" before it even gets loaded.
To avoid this, just fall through to a simple copy only mildly
optimized for the case where src and dst are 8 byte aligned and the
length is a multiple of 8 as well. We could get fancy and call
GENmemcpy() but this is good enough for how this thing is actually
used.
Reported-by: David Ahern <david.ahern@oracle.com>
Reported-by: Bob Picco <bpicco@meloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 054954eb05 ("xen: switch to linear
virtual mapped sparse p2m list") introduced a regression regarding to
memory hotplug for a pv-domain: as the virtual space for the p2m list
is allocated for the to be expected memory size of the domain only,
hotplugged memory above that size will not be usable by the domain.
Correct this by using a configurable size for the p2m list in case of
memory hotplug enabled (default supported memory size is 512 GB for
64 bit domains and 4 GB for 32 bit domains).
Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: <stable@vger.kernel.org> # 3.19+
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
The recent old_rsp -> rsp_scratch rename also changed this
comment, but in this case "old_rsp" was not referring to
PER_CPU(old_rsp).
Fix this.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427115839-6397-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When ioremap_wc() or ioremap_cached() are used without first including
asm/pgtable.h, the _PAGE_CACHEABLE or _PAGE_WR_COMBINE definitions
aren't found, resulting in build errors like the following (in
next-20150323 due to "lib: devres: add a helper function for
ioremap_wc"):
lib/devres.c: In function ‘devm_ioremap_wc’:
lib/devres.c:91: error: ‘_PAGE_WR_COMBINE’ undeclared
We can't easily include asm/pgtable.h in asm/io.h due to dependency
problems, so split out the _PAGE_* definitions from asm/pgtable.h into a
separate asm/pgtable-bits.h header (as a couple of other architectures
already do), and include that in io.h instead.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: linux-metag@vger.kernel.org
Cc: Abhilash Kesavan <a.kesavan@samsung.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Make the code which sets up the pmd depend on PT_NLEVELS == 3, not on
CONFIG_64BIT. The reason is, that a 64bit kernel with a page size
greater than 4k doesn't need the pmd and thus has PT_NLEVELS = 2.
Signed-off-by: Helge Deller <deller@gmx.de>
The patch dc6c9a35b6 that counts pmds
allocated for a process introduced a bug on 64-bit PA-RISC kernels.
The PA-RISC architecture preallocates one pmd with each pgd. This
preallocated pmd can never be freed - pmd_free does nothing when it is
called with this pmd. When the kernel attempts to free this preallocated
pmd, it decreases the count of allocated pmds. The result is that the
counter underflows and this error is reported.
This patch fixes the bug by artifically incrementing the counter in
pmd_free when the kernel tries to free the preallocated pmd.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Helge Deller <deller@gmx.de>
This allows us to remove some unnecessary ifdefs. There should
be no change to the generated code.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f7e00f0d668e253abf0bd8bf36491ac47bd761ff.1426728647.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
user_mode_vm() and user_mode() are now the same. Change all callers
of user_mode_vm() to user_mode().
The next patch will remove the definition of user_mode_vm.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/43b1f57f3df70df5a08b0925897c660725015554.1426728647.git.luto@kernel.org
[ Merged to a more recent kernel. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
user_mode() is now identical to user_mode_vm(). Subsequent patches
will change all callers of user_mode_vm() to user_mode() and then
delete user_mode_vm().
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/0dd03eacb5f0a2b5ba0240de25347a31b493c289.1426728647.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A few of the user_mode() checks in traps.c are immediately after
explicit checks for vm86 mode. Change them to user_mode_ignore_vm86().
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/0b324d5b75c3402be07f8d3c6245ed7f4995029e.1426728647.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There's no point in checking the VM bit on 64-bit, and, since
we're explicitly checking it, we can use user_mode_ignore_vm86()
after the check.
While we're at it, rearrange the #ifdef slightly to make the code
flow a bit clearer.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/dc1457a734feccd03a19bb3538a7648582f57cdd.1426728647.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
user_mode() is dangerous and user_mode_vm() has a confusing name.
Add user_mode_ignore_vm86() (equivalent to current user_mode()).
We'll change the small number of legitimate users of user_mode()
to user_mode_ignore_vm86().
Inspired by grsec, although this works rather differently.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/202c56ca63823c338af8e2e54948dbe222da6343.1426728647.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
strcmp() is always expected to return 0 when arguments are equal,
negative when its first argument @str1 is less than its second argument
@str2 and a positive value otherwise. Previously strcmp("a", "b")
returned 1. Now it gives -1, as it is supposed to.
Until now this bug never triggered, because all uses for strcmp() in the
boot code tested for nonzero:
triton:~/tip> git grep strcmp arch/x86/boot/
arch/x86/boot/boot.h:int strcmp(const char *str1, const char *str2);
arch/x86/boot/edd.c: if (!strcmp(eddarg, "skipmbr") || !strcmp(eddarg, "skip")) {
arch/x86/boot/edd.c: else if (!strcmp(eddarg, "off"))
arch/x86/boot/edd.c: else if (!strcmp(eddarg, "on"))
should in the future strcmp() be used in a comparative way in the boot
code, it might have led to (not so subtle) bugs.
Signed-off-by: Arjun Sreedharan <arjun024@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1426520267-1803-1-git-send-email-arjun024@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The private pointer provided by the cacheinfo code is used to implement
the AMD L3 cache-specific attributes using a pointer to the northbridge
descriptor. It is needed for performing L3-specific operations and for
that we need a couple of PCI devices and other service information, all
contained in the northbridge descriptor.
This results in failure of cacheinfo setup as shown below as
cache_get_priv_group() returns the uninitialised private attributes which
are not valid for Intel processors.
------------[ cut here ]------------
WARNING: CPU: 3 PID: 1 at fs/sysfs/group.c:102
internal_create_group+0x151/0x280()
sysfs: (bin_)attrs not set by subsystem for group: index3/
Modules linked in:
CPU: 3 PID: 1 Comm: swapper/0 Not tainted 4.0.0-rc3+ #1
Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A13 05/11/2014
...
Call Trace:
dump_stack
warn_slowpath_common
warn_slowpath_fmt
internal_create_group
sysfs_create_groups
device_add
cpu_device_create
? __kmalloc
cache_add_dev
cacheinfo_sysfs_init
? container_dev_init
do_one_initcall
kernel_init_freeable
? rest_init
kernel_init
ret_from_fork
? rest_init
This patch fixes the issue by checking if the L3 cache indices are
populated correctly (AMD-specific) before initializing the private
attributes.
Reported-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Certain MSRs are only relevant to a kernel in host mode, and kvm had
chosen not to implement these MSRs at all for guests. If a guest kernel
ever tried to access these MSRs, the result was a general protection
fault.
KVM will be separately patched to return 0 when these MSRs are read,
and this patch ensures that MSR accesses are tolerant of exceptions.
Signed-off-by: Jesse Larrew <jesse.larrew@amd.com>
[ Drop {} braces around loop ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Joel Schopp <joel.schopp@amd.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac@vger.kernel.org
Link: http://lkml.kernel.org/r/1426262619-5016-1-git-send-email-jesse.larrew@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that eager_fpu_init_bp() does setup_init_fpu_buf() only and
nothing else, we can remove it and move this code into its "caller",
eager_fpu_init().
This avoids the confusing games with "static __refdata void (*boot_func)":
init_xstate_buf can be NULL only during boot, so it is safe to call the
__init-annotated setup_init_fpu_buf() function in eager_fpu_init(), we
just need to mark it as __init_refok.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Riikonen <priikone@iki.fi>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Link: http://lkml.kernel.org/r/20150314151334.GC13029@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that kthreads do not use FPU until they get executed, swapper/0
doesn't need to allocate fpu->state.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Riikonen <priikone@iki.fi>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Link: http://lkml.kernel.org/r/20150313182716.GB8249@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Call it what it does and in accordance with the context where it is
used: we reset the FPU state either because we were unable to restore it
from the one saved in the task or because we simply want to reset it.
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
flush_thread() -> drop_init_fpu() is suboptimal and confusing. It does
drop_fpu() or restore_init_xstate() depending on !use_eager_fpu(). But
flush_thread() too checks eagerfpu right after that, and if it is true
then restore_init_xstate() just burns CPU for no reason. We are going to
load init_xstate_buf again after we set used_math()/user_has_fpu(), until
then the FPU state can't survive after switch_to().
Remove it, and change the "if (!use_eager_fpu())" to call drop_fpu().
While at it, clean up the tsk/current usage.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Riikonen <priikone@iki.fi>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Link: http://lkml.kernel.org/r/20150313173030.GA31217@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Change flush_thread() to do user_fpu_begin() and restore_init_xstate()
instead of math_state_restore().
Note: "TODO: cleanup this horror" is still valid. We do not need
init_fpu() at all, we only need fpu_alloc() and memset(0). But this
needs other changes, in particular user_fpu_begin() should set
used_math().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Riikonen <priikone@iki.fi>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Link: http://lkml.kernel.org/r/20150311173449.GE5032@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Extract the "use_eager_fpu()" code from drop_init_fpu() into a new,
simple helper restore_init_xstate(). The next patch adds another user.
- It is not clear why we do not check use_fxsr() like fpu_restore_checking()
does. eager_fpu_init_bp() calls setup_init_fpu_buf() too, and we have the
"eagerfpu=on" kernel option.
- Ignoring the fact that init_xstate_buf is "struct xsave_struct *", not
"union thread_xstate *", it is not clear why we can not simply use
fpu_restore_checking() and avoid the code duplication.
- It is not clear why we can't call setup_init_fpu_buf() unconditionally
to always create init_xstate_buf(). Then do_device_not_available() path
(at least) could use restore_init_xstate() too. It doesn't need to init
fpu->state, its content doesn't matter until unlazy_fpu()/__switch_to()/etc
which overwrites this memory anyway.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Riikonen <priikone@iki.fi>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Link: http://lkml.kernel.org/r/20150311173429.GD5032@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently, user_fpu_begin() has a single caller and it is not clear why
do we actually need it and why we should not worry about preemption
right after preempt_enable().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Riikonen <priikone@iki.fi>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Link: http://lkml.kernel.org/r/20150311173409.GC5032@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We want to check whether user code is in 32-bit mode, not
whether the task is nominally 32-bit.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/33e5107085ce347a8303560302b15c2cadd62c4c.1426728647.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This is slightly shorter and slightly faster. It's also more
correct: the split between user and kernel addresses is
TASK_SIZE_MAX, regardless of ti->flags.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/09156b63bad90a327827003c9e53faa82ef4c56e.1426728647.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Both the execve() and sigreturn() family of syscalls have the
ability to change registers in ways that may not be compatabile
with the syscall path they were called from.
In particular, SYSRET and SYSEXIT can't handle non-default %cs and %ss,
and some bits in eflags.
These syscalls have stubs that are hardcoded to jump to the IRET path,
and not return to the original syscall path.
The following commit:
76f5df43ca ("Always allocate a complete "struct pt_regs" on the kernel stack")
recently changed this for some 32-bit compat syscalls, but introduced a bug where
execve from a 32-bit program to a 64-bit program would fail because it still returned
via SYSRETL. This caused Wine to fail when built for both 32-bit and 64-bit.
This patch sets TIF_NOTIFY_RESUME for execve() and sigreturn() so
that the IRET path is always taken on exit to userspace.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1426978461-32089-1-git-send-email-brgerst@gmail.com
[ Improved the changelog and comments. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
commit id 2ba9f0d has changed CONFIG_KVM_BOOK3S_64_HV to tristate to allow
HV/PR bits to be built as modules. But the MCE code still depends on
CONFIG_KVM_BOOK3S_64_HV which is wrong. When user selects
CONFIG_KVM_BOOK3S_64_HV=m to build HV/PR bits as a separate module the
relevant MCE code gets excluded.
This patch fixes the MCE code to use CONFIG_KVM_BOOK3S_64_HANDLER. This
makes sure that the relevant MCE code is included when HV/PR bits
are built as a separate modules.
Fixes: 2ba9f0d887 ("kvm: powerpc: book3s: Support building HV and PR KVM as module")
Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
- Revert a recent PCI commit related to IRQ resources management
that introduced a regression for drivers attempting to bind to
devices whose previous drivers did not balance pci_enable_device()
and pci_disable_device() as expected (Rafael J Wysocki).
- Fix a deadlock in at91_rtc_interrupt() introduced by a typo in a
recent commit related to wakeup interrupt handling (Dan Carpenter).
- Allow the power capping RAPL (Running-Average Power Limit) driver
to use different energy units for domains within one CPU package
which is necessary to handle Intel Haswell EP processors correctly
(Jacob Pan).
- Improve the cpuidle mvebu driver's handling of Armada XP SoCs by
updating the target residency and exit latency numbers for those
chips (Sebastien Rannou).
- Prevent the cpuidle mvebu driver from calling cpu_pm_enter() twice
in a row before cpu_pm_exit() is called on the same CPU which
breaks the core's assumptions regarding the usage of those
functions (Gregory Clement).
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJVDLwIAAoJEILEb/54YlRxSrMQAKn/DwfNMqVWP5vf/YWauSfL
S51+E/emGvy+3fwBsa46KkddRQ0ysxc1wKIHcWXc1UPtA+lKNS3MoCUdD+isUFt7
PUrMsblgjh/e6LiXOBqElAiuugVoH7JCVMKvlv5Tsn3qxY3AJEoxGwV7p4XyP6lJ
PumqAvWFtaIFKThJFdKPGC511tYTQWoZ/3u843aEsHtpvmiytgUrvxpuCXlSSKT1
vbOdHAJXi0QyQYWIZ0VNN+MZ2WvaU9t1QCpBJUnzZMi2kuG3HP9rzY40GOnoMn6/
jXaxegeT7UX5JY5NWU9VrrVwKzppIpyKW6yckIRcKD+ovwKdGbMrfMco2iyK1xgV
Q6B5h5guYTTynjBoi9XO3d7AWN3gM+8OYCPJgcRG2BMQEunlS0D+i3cRDqeHzW0M
W+OaENK9MnxG9KVEq0PIrWomGZL1SlOtHfHm9xu8hpqGx4h1iTSgiAEFQQ+Zmmzh
+g1OLgddHkWjkPoZ/Y8d1NpdnTf+kbkm8Wqm9Uyie1/HnUJMnHYNbzZTyF4ZjlV2
MAl2P0zBqWhLEDb4STHWHdnBZVhvGCpg1J2pFaSRjDEn+EP0YBH+LscWi//xONNr
5acBoVzid92co+JwrYn3/MYHctV8bBLdXqeGUiuKD6tk9u+aLke24RTBwm5frPBE
SjHr1sLhmzubzXtzQIrp
=4Oz9
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI fixes from Rafael Wysocki:
"These are fixes for recent regressions (PCI/ACPI resources and at91
RTC locking), a stable-candidate powercap RAPL driver fix and two ARM
cpuidle fixes (one stable-candidate too).
Specifics:
- Revert a recent PCI commit related to IRQ resources management that
introduced a regression for drivers attempting to bind to devices
whose previous drivers did not balance pci_enable_device() and
pci_disable_device() as expected (Rafael J Wysocki).
- Fix a deadlock in at91_rtc_interrupt() introduced by a typo in a
recent commit related to wakeup interrupt handling (Dan Carpenter).
- Allow the power capping RAPL (Running-Average Power Limit) driver
to use different energy units for domains within one CPU package
which is necessary to handle Intel Haswell EP processors correctly
(Jacob Pan).
- Improve the cpuidle mvebu driver's handling of Armada XP SoCs by
updating the target residency and exit latency numbers for those
chips (Sebastien Rannou).
- Prevent the cpuidle mvebu driver from calling cpu_pm_enter() twice
in a row before cpu_pm_exit() is called on the same CPU which
breaks the core's assumptions regarding the usage of those
functions (Gregory Clement)"
* tag 'pm+acpi-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "x86/PCI: Refine the way to release PCI IRQ resources"
rtc: at91rm9200: double locking bug in at91_rtc_interrupt()
powercap / RAPL: handle domains with different energy units
cpuidle: mvebu: Update cpuidle thresholds for Armada XP SOCs
cpuidle: mvebu: Fix the CPU PM notifier usage