We can disable debug_pagealloc processing even if the code is compiled
with CONFIG_DEBUG_PAGEALLOC. This patch changes the code to query
whether it is enabled or not in runtime.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I ran into this issue while debugging an early boot problem. The system
hit a BUG_ON() but report bug failed to print the line number and file
name. The reason being that the system was running in real mode and
report_bug() searches for addresses in the PAGE_OFFSET+ region.
Suggested-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This function has been unused since commit 14cf11af6c ("powerpc: Merge enough
to start building in arch/powerpc."), so remove it.
Signed-off-by: Rashmica Gupta <rashmicy@gmail.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This means the 'M' flag will work properly when the kernel prints a backtrace.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently DSCR (Data Stream Control Register) can be accessed with
mfspr or mtspr instructions inside a thread via two different SPR
numbers. One being the user accessible problem state SPR number 0x03
and the other being the privilege state SPR number 0x11. All access
through the privilege state SPR number get emulated through illegal
instruction exception. Any access through the problem state SPR number
raises one facility unavailable exception which sets the thread based
dscr_inherit bit and enables DSCR facility through FSCR register thus
allowing direct access to DSCR without going through this exception in
the future. We set the thread.dscr_inherit bit whether the access was
with mfspr or mtspr instruction which is neither correct nor does it
match the behaviour through the instruction emulation code path driven
from privilege state SPR number. User currently observes two different
kind of behaviour when accessing the DSCR through these two SPR numbers.
This problem can be observed through these two test cases by replacing
the privilege state SPR number with the problem state SPR number.
(1) http://ozlabs.org/~anton/junkcode/dscr_default_test.c
(2) http://ozlabs.org/~anton/junkcode/dscr_explicit_test.c
This patch fixes the problem by making sure that the behaviour visible
to the user remains the same irrespective of which SPR number is being
used. Inside facility unavailable exception, we check whether it was
cuased by a mfspr or a mtspr isntrucction. In case of mfspr instruction,
just emulate the instruction. In case of mtspr instruction, set the
thread based dscr_inherit bit and also enable the facility through FSCR.
All user SPR based mfspr instruction will be emulated till one user SPR
based mtspr has been executed.
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Remove slice_set_psize() which is not used.
It was added in 3a8247cc2c "powerpc: Only demote individual slices
rather than whole process" but was never used.
Remove vsx_assist_exception() which is not used.
It was added in ce48b21007 "powerpc: Add VSX context save/restore,
ptrace and signal support" but was never used.
Remove generic_mach_cpu_die() which is not used.
Its last caller was removed in 375f561a41 "powerpc/powernv: Always go
into nap mode when CPU is offline".
Remove mpc7448_hpc2_power_off() and mpc7448_hpc2_halt() which are
unused.
These were introduced in c5d56332fd "[POWERPC] Add general support for
mpc7448hpc2 (Taiga) platform" but were never used.
This was partially found by using a static code analysis program called
cppcheck.
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
[mpe: Update changelog with details on when/why they are unused]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This still has not been merged and now powerpc is the only arch that does
not have this change. Sorry about missing linuxppc-dev before.
V2->V2
- Fix up to work against 3.18-rc1
__get_cpu_var() is used for multiple purposes in the kernel source. One of
them is address calculation via the form &__get_cpu_var(x). This calculates
the address for the instance of the percpu variable of the current processor
based on an offset.
Other use cases are for storing and retrieving data from the current
processors percpu area. __get_cpu_var() can be used as an lvalue when
writing data or on the right side of an assignment.
__get_cpu_var() is defined as :
__get_cpu_var() always only does an address determination. However, store
and retrieve operations could use a segment prefix (or global register on
other platforms) to avoid the address calculation.
this_cpu_write() and this_cpu_read() can directly take an offset into a
percpu area and use optimized assembly code to read and write per cpu
variables.
This patch converts __get_cpu_var into either an explicit address
calculation using this_cpu_ptr() or into a use of this_cpu operations that
use the offset. Thereby address calculations are avoided and less registers
are used when code is generated.
At the end of the patch set all uses of __get_cpu_var have been removed so
the macro is removed too.
The patch set includes passes over all arches as well. Once these operations
are used throughout then specialized macros can be defined in non -x86
arches as well in order to optimize per cpu access by f.e. using a global
register that may be set to the per cpu base.
Transformations done to __get_cpu_var()
1. Determine the address of the percpu instance of the current processor.
DEFINE_PER_CPU(int, y);
int *x = &__get_cpu_var(y);
Converts to
int *x = this_cpu_ptr(&y);
2. Same as #1 but this time an array structure is involved.
DEFINE_PER_CPU(int, y[20]);
int *x = __get_cpu_var(y);
Converts to
int *x = this_cpu_ptr(y);
3. Retrieve the content of the current processors instance of a per cpu
variable.
DEFINE_PER_CPU(int, y);
int x = __get_cpu_var(y)
Converts to
int x = __this_cpu_read(y);
4. Retrieve the content of a percpu struct
DEFINE_PER_CPU(struct mystruct, y);
struct mystruct x = __get_cpu_var(y);
Converts to
memcpy(&x, this_cpu_ptr(&y), sizeof(x));
5. Assignment to a per cpu variable
DEFINE_PER_CPU(int, y)
__get_cpu_var(y) = x;
Converts to
__this_cpu_write(y, x);
6. Increment/Decrement etc of a per cpu variable
DEFINE_PER_CPU(int, y);
__get_cpu_var(y)++
Converts to
__this_cpu_inc(y)
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
Signed-off-by: Christoph Lameter <cl@linux.com>
[mpe: Fix build errors caused by set/or_softirq_pending(), and rework
assignment in __set_breakpoint() to use memcpy().]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Handle Hypervisor Maintenance Interrupt (HMI) in Linux. This patch implements
basic infrastructure to handle HMI in Linux host. The design is to invoke
opal handle hmi in real mode for recovery and set irq_pending when we hit HMI.
During check_irq_replay pull opal hmi event and print hmi info on console.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
In machine_check_e500 exception handler is a wrong indication
in case of MCSR_BUS_WBERR - so print "Write" instead of "Read".
Signed-off-by: Wladislav Wiebe <wladislav.kw@gmail.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
We don't see MCE counter getting increased in /proc/interrupts which gives
false impression of no MCE occurred even when there were MCE events.
The machine check early handling was added for PowerKVM and we missed to
increment the MCE count in the early handler.
We also increment mce counters in the machine_check_exception call, but
in most cases where we handle the error hypervisor never reaches there
unless its fatal and we want to crash. Only during fatal situation we may
see double increment of mce count. We need to fix that. But for
now it always good to have some count increased instead of zero.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Recent CPUs support quad word load and store instructions. Add
support to the alignment handler for them.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The facility unavailable exception can be triggered from userspace by
accessing PMU registers when EBB is not enabled. This causes the
included pr_err() to run, hence spamming the kernel log buffer.
This avoids this by rate limiting these messages.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently, if a process starts a transaction and then takes an
exception because the FPU, VMX or VSX unit is unavailable to it,
we end up corrupting any FP/VMX/VSX state that was valid before
the interrupt. For example, if the process starts a transaction
with the FPU available to it but VMX unavailable, and then does
a VMX instruction inside the transaction, the FP state gets
corrupted.
Loading up the desired state generally involves doing a reclaim
and a recheckpoint. To avoid corrupting already-valid state, we have
to be careful not to reload that state from the thread_struct
between the reclaim and the recheckpoint (since the thread_struct
values are stale by now), and we have to reload that state from
the transact_fp/vr arrays after the recheckpoint to get back the
current transactional values saved there by the reclaim.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently, when we have a process using the transactional memory
facilities on POWER8 (that is, the processor is in transactional
or suspended state), and the process enters the kernel and the
kernel then uses the floating-point or vector (VMX/Altivec) facility,
we end up corrupting the user-visible FP/VMX/VSX state. This
happens, for example, if a page fault causes a copy-on-write
operation, because the copy_page function will use VMX to do the
copy on POWER8. The test program below demonstrates the bug.
The bug happens because when FP/VMX state for a transactional process
is stored in the thread_struct, we store the checkpointed state in
.fp_state/.vr_state and the transactional (current) state in
.transact_fp/.transact_vr. However, when the kernel wants to use
FP/VMX, it calls enable_kernel_fp() or enable_kernel_altivec(),
which saves the current state in .fp_state/.vr_state. Furthermore,
when we return to the user process we return with FP/VMX/VSX
disabled. The next time the process uses FP/VMX/VSX, we don't know
which set of state (the current register values, .fp_state/.vr_state,
or .transact_fp/.transact_vr) we should be using, since we have no
way to tell if we are still in the same transaction, and if not,
whether the previous transaction succeeded or failed.
Thus it is necessary to strictly adhere to the rule that if FP has
been enabled at any point in a transaction, we must keep FP enabled
for the user process with the current transactional state in the
FP registers, until we detect that it is no longer in a transaction.
Similarly for VMX; once enabled it must stay enabled until the
process is no longer transactional.
In order to keep this rule, we add a new thread_info flag which we
test when returning from the kernel to userspace, called TIF_RESTORE_TM.
This flag indicates that there is FP/VMX/VSX state to be restored
before entering userspace, and when it is set the .tm_orig_msr field
in the thread_struct indicates what state needs to be restored.
The restoration is done by restore_tm_state(). The TIF_RESTORE_TM
bit is set by new giveup_fpu/altivec_maybe_transactional helpers,
which are called from enable_kernel_fp/altivec, giveup_vsx, and
flush_fp/altivec_to_thread instead of giveup_fpu/altivec.
The other thing to be done is to get the transactional FP/VMX/VSX
state from .fp_state/.vr_state when doing reclaim, if that state
has been saved there by giveup_fpu/altivec_maybe_transactional.
Having done this, we set the FP/VMX bit in the thread's MSR after
reclaim to indicate that that part of the state is now valid
(having been reclaimed from the processor's checkpointed state).
Finally, in the signal handling code, we move the clearing of the
transactional state bits in the thread's MSR a bit earlier, before
calling flush_fp_to_thread(), so that we don't unnecessarily set
the TIF_RESTORE_TM bit.
This is the test program:
/* Michael Neuling 4/12/2013
*
* See if the altivec state is leaked out of an aborted transaction due to
* kernel vmx copy loops.
*
* gcc -m64 htm_vmxcopy.c -o htm_vmxcopy
*
*/
/* We don't use all of these, but for reference: */
int main(int argc, char *argv[])
{
long double vecin = 1.3;
long double vecout;
unsigned long pgsize = getpagesize();
int i;
int fd;
int size = pgsize*16;
char tmpfile[] = "/tmp/page_faultXXXXXX";
char buf[pgsize];
char *a;
uint64_t aborted = 0;
fd = mkstemp(tmpfile);
assert(fd >= 0);
memset(buf, 0, pgsize);
for (i = 0; i < size; i += pgsize)
assert(write(fd, buf, pgsize) == pgsize);
unlink(tmpfile);
a = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
assert(a != MAP_FAILED);
asm __volatile__(
"lxvd2x 40,0,%[vecinptr] ; " // set 40 to initial value
TBEGIN
"beq 3f ;"
TSUSPEND
"xxlxor 40,40,40 ; " // set 40 to 0
"std 5, 0(%[map]) ;" // cause kernel vmx copy page
TABORT
TRESUME
TEND
"li %[res], 0 ;"
"b 5f ;"
"3: ;" // Abort handler
"li %[res], 1 ;"
"5: ;"
"stxvd2x 40,0,%[vecoutptr] ; "
: [res]"=r"(aborted)
: [vecinptr]"r"(&vecin),
[vecoutptr]"r"(&vecout),
[map]"r"(a)
: "memory", "r0", "r3", "r4", "r5", "r6", "r7");
if (aborted && (vecin != vecout)){
printf("FAILED: vector state leaked on abort %f != %f\n",
(double)vecin, (double)vecout);
exit(1);
}
munmap(a, size);
close(fd);
printf("PASSED!\n");
return 0;
}
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch adds the early machine check function pointer in cputable for
CPU specific early machine check handling. The early machine handle routine
will be called in real mode to handle SLB and TLB errors. We can not reuse
the existing machine_check hook because it is always invoked in kernel
virtual mode and we would already be in trouble if we get SLB or TLB errors.
This patch just sets up a mechanism to invoke CPU specific handler. The
subsequent patches will populate the function pointer.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Move machine check entry point into Linux. So far we were dependent on
firmware to decode MCE error details and handover the high level info to OS.
This patch introduces early machine check routine that saves the MCE
information (srr1, srr0, dar and dsisr) to the emergency stack. We allocate
stack frame on emergency stack and set the r1 accordingly. This allows us to be
prepared to take another exception without loosing context. One thing to note
here that, if we get another machine check while ME bit is off then we risk a
checkstop. Hence we restrict ourselves to save only MCE information and
register saved on PACA_EXMC save are before we turn the ME bit on. We use
paca->in_mce flag to differentiate between first entry and nested machine check
entry which helps proper use of emergency stack. We increment paca->in_mce
every time we enter in early machine check handler and decrement it while
leaving. When we enter machine check early handler first time (paca->in_mce ==
0), we are sure nobody is using MC emergency stack and allocate a stack frame
at the start of the emergency stack. During subsequent entry (paca->in_mce >
0), we know that r1 points inside emergency stack and we allocate separate
stack frame accordingly. This prevents us from clobbering MCE information
during nested machine checks.
The early machine check handler changes are placed under CPU_FTR_HVMODE
section. This makes sure that the early machine check handler will get executed
only in hypervisor kernel.
This is the code flow:
Machine Check Interrupt
|
V
0x200 vector ME=0, IR=0, DR=0
|
V
+-----------------------------------------------+
|machine_check_pSeries_early: | ME=0, IR=0, DR=0
| Alloc frame on emergency stack |
| Save srr1, srr0, dar and dsisr on stack |
+-----------------------------------------------+
|
(ME=1, IR=0, DR=0, RFID)
|
V
machine_check_handle_early ME=1, IR=0, DR=0
|
V
+-----------------------------------------------+
| machine_check_early (r3=pt_regs) | ME=1, IR=0, DR=0
| Things to do: (in next patches) |
| Flush SLB for SLB errors |
| Flush TLB for TLB errors |
| Decode and save MCE info |
+-----------------------------------------------+
|
(Fall through existing exception handler routine.)
|
V
machine_check_pSerie ME=1, IR=0, DR=0
|
(ME=1, IR=1, DR=1, RFID)
|
V
machine_check_common ME=1, IR=1, DR=1
.
.
.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
side: the HV and emulation flavors can now coexist in a single kernel
is probably the most interesting change from a user point of view.
On the x86 side there are nested virtualization improvements and a
few bugfixes. ARM got transparent huge page support, improved
overcommit, and support for big endian guests.
Finally, there is a new interface to connect KVM with VFIO. This
helps with devices that use NoSnoop PCI transactions, letting the
driver in the guest execute WBINVD instructions. This includes
some nVidia cards on Windows, that fail to start without these
patches and the corresponding userspace changes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJShPAhAAoJEBvWZb6bTYbyl48P/297GgmELHAGBgjvb6q7yyGu
L8+eHjKbh4XBAkPwyzbvUjuww5z2hM0N3JQ0BDV9oeXlO+zwwCEns/sg2Q5/NJXq
XxnTeShaKnp9lqVBnE6G9rAOUWKoyLJ2wItlvUL8JlaO9xJ0Vmk0ta4n2Nv5GqDp
db6UD7vju6rHtIAhNpvvAO51kAOwc01xxRixCVb7KUYOnmO9nvpixzoI/S0Rp1gu
w/OWMfCosDzBoT+cOe79Yx1OKcpaVW94X6CH1s+ShCw3wcbCL2f13Ka8/E3FIcuq
vkZaLBxio7vjUAHRjPObw0XBW4InXEbhI1DjzIvm8dmc4VsgmtLQkTCG8fj+jINc
dlHQUq6Do+1F4zy6WMBUj8tNeP1Z9DsABp98rQwR8+BwHoQpGQBpAxW0TE0ZMngC
t1caqyvjZ5pPpFUxSrAV+8Kg4AvobXPYOim0vqV7Qea07KhFcBXLCfF7BWdwq/Jc
0CAOlsLL4mHGIQWZJuVGw0YGP7oATDCyewlBuDObx+szYCoV4fQGZVBEL0KwJx/1
7lrLN7JWzRyw6xTgJ5VVwgYE1tUY4IFQcHu7/5N+dw8/xg9KWA3f4PeMavIKSf+R
qteewbtmQsxUnvuQIBHLs8NRWPnBPy+F3Sc2ckeOLIe4pmfTte6shtTXcLDL+LqH
NTmT/cfmYp2BRkiCfCiS
=rWNf
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM changes from Paolo Bonzini:
"Here are the 3.13 KVM changes. There was a lot of work on the PPC
side: the HV and emulation flavors can now coexist in a single kernel
is probably the most interesting change from a user point of view.
On the x86 side there are nested virtualization improvements and a few
bugfixes.
ARM got transparent huge page support, improved overcommit, and
support for big endian guests.
Finally, there is a new interface to connect KVM with VFIO. This
helps with devices that use NoSnoop PCI transactions, letting the
driver in the guest execute WBINVD instructions. This includes some
nVidia cards on Windows, that fail to start without these patches and
the corresponding userspace changes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (146 commits)
kvm, vmx: Fix lazy FPU on nested guest
arm/arm64: KVM: PSCI: propagate caller endianness to the incoming vcpu
arm/arm64: KVM: MMIO support for BE guest
kvm, cpuid: Fix sparse warning
kvm: Delete prototype for non-existent function kvm_check_iopl
kvm: Delete prototype for non-existent function complete_pio
hung_task: add method to reset detector
pvclock: detect watchdog reset at pvclock read
kvm: optimize out smp_mb after srcu_read_unlock
srcu: API for barrier after srcu read unlock
KVM: remove vm mmap method
KVM: IOMMU: hva align mapping page size
KVM: x86: trace cpuid emulation when called from emulator
KVM: emulator: cleanup decode_register_operand() a bit
KVM: emulator: check rex prefix inside decode_register()
KVM: x86: fix emulation of "movzbl %bpl, %eax"
kvm_host: typo fix
KVM: x86: emulate SAHF instruction
MAINTAINERS: add tree for kvm.git
Documentation/kvm: add a 00-INDEX file
...
Commit 9863c28a2a ("powerpc: Emulate sync
instruction variants") introduced a build breakage with
CONFIG_PPC_EMULATED_STATS enabled.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: Kumar Gala <galak@kernel.org>
Cc: James Yang <James.Yang@freescale.com>
---
This patch modifies the Oops message in case of Software Emulation Exception.
The existing message is quite confusing because it refers to FPU Emulation
while most often the issue is due to either a non supported instruction
(not necessarily FPU related) or a stale instruction due to HW issues.
The new message tries to be more generic in order to make the user understand
that the Oops is due to something wrong with an instruction, not necessarily
due to an FPU instruction.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
This way we can use same data type struct with KVM and
also help in using other debug related function.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Acked-by: Michael Neuling <mikey@neuling.org>
[scottwood@freescale.com: removed obvious debug_reg comment]
Signed-off-by: Scott Wood <scottwood@freescale.com>
This way we can use same data type struct with KVM and
also help in using other debug related function.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Reserved fields of the sync instruction have been used for other
instructions (e.g. lwsync). On processors that do not support variants
of the sync instruction, emulate it by executing a sync to subsume the
effect of the intended instruction.
Signed-off-by: James Yang <James.Yang@freescale.com>
[scottwood@freescale.com: whitespace and subject line fix]
Signed-off-by: Scott Wood <scottwood@freescale.com>
This creates new 'thread_fp_state' and 'thread_vr_state' structures
to store FP/VSX state (including FPSCR) and Altivec/VSX state
(including VSCR), and uses them in the thread_struct. In the
thread_fp_state, the FPRs and VSRs are represented as u64 rather
than double, since we rarely perform floating-point computations
on the values, and this will enable the structures to be used
in KVM code as well. Similarly FPSCR is now a u64 rather than
a structure of two 32-bit values.
This takes the offsets out of the macros such as SAVE_32FPRS,
REST_32FPRS, etc. This enables the same macros to be used for normal
and transactional state, enabling us to delete the transactional
versions of the macros. This also removes the unused do_load_up_fpu
and do_load_up_altivec, which were in fact buggy since they didn't
create large enough stack frames to account for the fact that
load_up_fpu and load_up_altivec are not designed to be called from C
and assume that their caller's stack frame is an interrupt frame.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
As suggested by paulus we can simplify the Data Stream Control Register
(DSCR) Facility Status and Control Register (FSCR) handling.
Firstly, we simplify the asm by using a rldimi.
Secondly, we now use the FSCR only to control the DSCR facility, rather
than both the FSCR and HFSCR. Users will see no functional change from
this but will get a minor speedup as they will trap into the kernel only
once (rather than twice) when they first touch the DSCR. Also, this
changes removes a bunch of ugly FTR_SECTION code.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
In the program check handler we handle some causes with interrupts off
and others with interrupts on.
We need to enable interrupts to handle the emulation cases, because they
access userspace memory and might sleep.
For faults in the kernel we don't want to do any emulation, and
emulate_instruction() enforces that. do_mathemu() doesn't but probably
should.
The other disadvantage of enabling interrupts for kernel faults is that
we may take another interrupt, and recurse. As seen below:
--- Exception: e40 at c000000000004ee0 performance_monitor_relon_pSeries_1
[link register ] c00000000000f858 .arch_local_irq_restore+0x38/0x90
[c000000fb185dc10] 0000000000000000 (unreliable)
[c000000fb185dc80] c0000000007d8558 .program_check_exception+0x298/0x2d0
[c000000fb185dd00] c000000000002f40 emulation_assist_common+0x140/0x180
--- Exception: e40 at c000000000004ee0 performance_monitor_relon_pSeries_1
[link register ] c00000000000f858 .arch_local_irq_restore+0x38/0x90
[c000000fb185dff0] 00000000008b9190 (unreliable)
[c000000fb185e060] c0000000007d8558 .program_check_exception+0x298/0x2d0
So avoid both problems by checking if the fault was in the kernel and
skipping the enable of interrupts and the emulation. Go straight to
delivering the SIGILL, which for kernel faults calls die() and so on,
dropping us in the debugger etc.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Alistair noticed we got a SIGILL on userspace mfpvr instructions.
Remove the little endian check in the emulation code, it is
probably there to protect against the old pseudo little endian
implementations but doesn't make sense for real little endian.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
There are two invocations of do_mathemu() in traps.c. And the codes
in these two places are almost the same. Introduce a locale function
to eliminate the duplication. With this change we can also make sure
that in program_check_exception() the PPC_WARN_EMULATED is invoked for
the correctly emulated math instructions.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
By doing this we can make sure that the FPU state is only flushed to
the thread struct when it is really needed.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
POWER8 allows the DSCR to be accessed directly from userspace via a new SPR
number 0x3 (Rather than 0x11. DSCR SPR number 0x11 is still used on POWER8 but
like POWER7, is only accessible in HV and OS modes). Currently, we allow this
by setting H/FSCR DSCR bit on boot.
Unfortunately this doesn't work, as the kernel needs to see the DSCR change so
that it knows to no longer restore the system wide version of DSCR on context
switch (ie. to set thread.dscr_inherit).
This clears the H/FSCR DSCR bit initially. If a process then accesses the DSCR
(via SPR 0x3), it'll trap into the kernel where we set thread.dscr_inherit in
facility_unavailable_exception().
We also change _switch() so that we set or clear the H/FSCR DSCR bit based on
the thread.dscr_inherit.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> [v3.10]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
A PCIe erratum of mpc85xx may causes a core hang when a link of PCIe
goes down. when the link goes down, Non-posted transactions issued
via the ATMU requiring completion result in an instruction stall.
At the same time a machine-check exception is generated to the core
to allow further processing by the handler. We implements the handler
which skips the instruction caused the stall.
This patch depends on patch:
powerpc/85xx: Add platform_device declaration to fsl_pci.h
Signed-off-by: Zhao Chenhui <b35336@freescale.com>
Signed-off-by: Li Yang <leoli@freescale.com>
Signed-off-by: Liu Shuo <soniccat.liu@gmail.com>
Signed-off-by: Jia Hongtao <hongtao.jia@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQEcBAABAgAGBQJR0K2gAAoJEHm+PkMAQRiGWsEH+gMZSN1qRm34hZ82q1Tx7HvL
Eb/Gsl3Qw/7G2TlTqgjBUs36IdqV9O2cui/aa3/TfXvdvrx+0GlhRkEwQPc+ygcO
Mvoyoke4tT4+4jVFdCg1J8avREsa28/6oaHs0ZZxuVmJBBLTJH7aXaNsGn6eU1q9
9+p798MQis6naIiPC63somlZcCIiBhsuWCPWpEfLMn8G1HWAFTM3xXIbNBqe/brS
bmIOfhomlIZ5dcdaXGvjtP3+KJhkNDwhkPC4tVYu8JqqgSlrE+a+EGyEuuGqKk10
U+swiqyuD31uBI9ga54u/2FzSqDiAu6YOcMXevjo/m3g9XLdYbYLvN+nvN8alCQ=
=Ob6Z
-----END PGP SIGNATURE-----
Merge tag 'v3.10' into next
Merge 3.10 in order to get some of the last minute powerpc
changes, resolve conflicts and add additional fixes on top
of them.
Similar to the facility unavailble exception, except the facilities are
controlled by HFSCR.
Adapt the facility_unavailable_exception() so it can be called for
either the regular or Hypervisor facility unavailable exceptions.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
CC: <stable@vger.kernel.org> [v3.10]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The exception at 0xf60 is not the TM (Transactional Memory) unavailable
exception, it is the "Facility Unavailable Exception", rename it as
such.
Flesh out the handler to acknowledge the fact that it can be called for
many reasons, one of which is TM being unavailable.
Use STD_EXCEPTION_COMMON() for the exception body, for some reason we
had it open-coded, I've checked the generated code is identical.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
CC: <stable@vger.kernel.org> [v3.10]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
String instruction emulation would erroneously result in a segfault if
the upper bits of the EA are set and is so high that it fails access
check. Truncate the EA to 32 bits if the process is 32-bit.
Signed-off-by: James Yang <James.Yang@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This is duplicated code from math-emu and implements such a small
subset of the FPU (load/stores/fmr) that it's essentially pointless
nowdays.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
(Including 64-bit ones)
This allow SW emulation by the kernel of optional instructions
such as fsqrt which aren't implemented on some processors, and
thus fixes some Fedora 19 issues such as Anaconda since the
compiler is set to generate those by default on 64-bit.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Normally, the kernel emulates a few instructions that are unimplemented
on some processors (e.g. the old dcba instruction), or privileged (e.g.
mfpvr). The emulation of unimplemented instructions is currently not
working on the PowerNV platform. The reason is that on these machines,
unimplemented and illegal instructions cause a hypervisor emulation
assist interrupt, rather than a program interrupt as on older CPUs.
Our vector for the emulation assist interrupt just calls
program_check_exception() directly, without setting the bit in SRR1
that indicates an illegal instruction interrupt. This fixes it by
making the emulation assist interrupt set that bit before calling
program_check_interrupt(). With this, old programs that use no-longer
implemented instructions such as dcba now work again.
CC: <stable@vger.kernel.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
If we are emulating an instruction inside an active user transaction that
touches memory, the kernel can't emulate it as it operates in transactional
suspend context. We need to abort these transactions and send them back to
userspace for the hardware to rollback.
We can service these if the user transaction is in suspend mode, since the
kernel will operate in the same suspend context.
This adds a check to all alignment faults and to specific instruction
emulations (only string instructions for now). If the user process is in an
active (non-suspended) transaction, we abort the transaction go back to
userspace allowing the HW to roll back the transaction and tell the user of the
failure. This also adds new tm abort cause codes to report the reason of the
persistent error to the user.
Crappy test case here http://neuling.org/devel/junkcode/aligntm.c
Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> # v3.9
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This is the exception hooks for context tracking subsystem, including
data access, program check, single step, instruction breakpoint, machine check,
alignment, fp unavailable, altivec assist, unknown exception, whose handlers
might use RCU.
This patch corresponds to
[PATCH] x86: Exception hooks for userspace RCU extended QS
commit 6ba3c97a38
But after the exception handling moved to generic code, and some changes in
following two commits:
56dd9470d7
context_tracking: Move exception handling to generic code
6c1e0256fa
context_tracking: Restore correct previous context state on exception exit
it is able for exception hooks to use the generic code above instead of a
redundant arch implementation.
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
POWER8 allows read and write of the DSCR in userspace. We added
kernel emulation so applications could always use the instructions
regardless of the CPU type.
Unfortunately there are two SPRs for the DSCR and we only added
emulation for the privileged one. Add code to match the non
privileged one.
A simple test was created to verify the fix:
http://ozlabs.org/~anton/junkcode/user_dscr_test.c
Without the patch we get a SIGILL and it passes with the patch.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
lockdep, but it's a mechanical change.
Cheers,
Rusty.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJRJAcuAAoJENkgDmzRrbjxsw0P/3eXb+LddYnx0V0uHYdKpCUf
4vdW7X0fX3Z+aUK69IWRL/6ahoO4TpaHYGHBDjEoivyQ0GDq14X7JNWsYYt3LdMf
3wmDgRc2cn/mZOJbFeVpNV8ox5l/xc0CUvV+iQ8tMjfQItXMXgWUFZKMECsXKSO6
eex3lrw9M2jAX2uL8LQPp9W8xtKu24nSZRC6tH5riE/8fCzi1cZPPAqfxP5c8Lee
ZXtbCRSyAFENZLpKyMe1PC7HvtJyi5NDn9xwOQiXULZV/VOlvP94DGBLIKCM/6dn
4QvZxpG0P0uOlpCgRAVLyh/z7g4XY4VF/fHopLCmEcqLsvgD+V2LQpQ9zWUalLPC
Z+pUpz2vu0gIddPU1nR8R6oGpEdJ8O12aJle62p/RSXWZGx12qUQ+Tamu0tgKcv1
AsiJfbUGNDYfxgU6sHsoQjl2f68LTVckCU1C1LqEbW/S104EIORtGx30CHM4LRiO
32kDC5TtgYDBKQAIqJ4bL48ZMh+9W3uX40p7xzOI5khHQjvswUKa3jcxupU0C1uv
lx8KXo7pn8WT33QGysWC782wJCgJuzSc2vRn+KQoqoynuHGM6agaEtR59gil3QWO
rQEcxH63BBRDgHlg4FM9IkJwwsnC3PWKL8gbX0uAWXAPMbgapJkuuGZAwt0WDGVK
+GszxsFkCjlW0mK0egTb
=tiSY
-----END PGP SIGNATURE-----
Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module update from Rusty Russell:
"The sweeping change is to make add_taint() explicitly indicate whether
to disable lockdep, but it's a mechanical change."
* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
MODSIGN: Add option to not sign modules during modules_install
MODSIGN: Add -s <signature> option to sign-file
MODSIGN: Specify the hash algorithm on sign-file command line
MODSIGN: Simplify Makefile with a Kconfig helper
module: clean up load_module a little more.
modpost: Ignore ARC specific non-alloc sections
module: constify within_module_*
taint: add explicit flag to show whether lock dep is still OK.
module: printk message when module signature fail taints kernel.
This hooks the new transactional memory code into context switching, FP/VMX/VMX
unavailable and exception return.
Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We do lazy FP but not lazy TM (ie. userspace starts with MSR TM=1 FP=0). Hence
if userspace does an FP instruction during a transaction, we'll take an
fp unavailable exception.
This adds functions needed to handle this case. We have to inject the current
FP state into the checkpoint so that the hardware can decide what to do with
the transaction. We can't inject only the FP so we have to do a full treclaim
and recheckpoint to inject just the FP state. This will cause the transaction
to be marked as aborted by the hardware.
This just add the routines needed to do this for FP, VMX and VSX. It doesn't
hook them into the rest of the code yet.
Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
These should never happen since we always turn on MSR TM when in userspace. We
don't do lazy TM.
Hence if we hit this, we barf and kill the task as something's gone horribly
wrong.
Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds new macros for saving and restoring checkpointed architected state
from and to the thread_struct.
It also adds some debugging macros for when your brain explodes trying to debug
your transactional memory enabled kernel.
Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Fix up all callers as they were before, with make one change: an
unsigned module taints the kernel, but doesn't turn off lockdep.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a rewrite so that we don't assume we are using the DABR throughout the
code. We now use the arch_hw_breakpoint to store the breakpoint in a generic
manner in the thread_struct, rather than storing the raw DABR value.
The ptrace GET/SET_DEBUGREG interface currently passes the raw DABR in from
userspace. We keep this functionality, so that future changes (like the POWER8
DAWR), will still fake the DABR to userspace.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The watchdog and FIT code has been #if 0'd for ever, if the CPU takes
an exception to either of those vectors it will jump into the middle
of the PIT or Data TLB code and surely crash.
At least some (all?) 405 cores have both the WDT and FIT
vectors defined, so lets have proper entry points for them.
Tested that the WDT vector works on a 405F6 core.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When we update the DSCR either via emulation of mtspr(DSCR) or via
a change to dscr_default in sysfs we don't update thread.dscr.
We will eventually update it at context switch time but there is
a period where thread.dscr is incorrect.
If we fork at this point we will copy the old value of thread.dscr
into the child. To avoid this, always keep thread.dscr in sync with
reality.
This issue was found with the following testcase:
http://ozlabs.org/~anton/junkcode/dscr_inherit_test.c
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org> # 3.0+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Add thread_struct.trap_nr and use it to store the last exception
the thread experienced. In this patch, we populate the field at
various places where we force_sig_info() to the process.
This is also used in uprobes to determine if the probed instruction
caused an exception.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Alignment was the last user of the ENABLE_INTS macro, which we can
now remove. All non-syscall exceptions now disable interrupts on
entry, they get re-enabled conditionally from C code. Don't
unconditionally re-enable in program check either, check the
original context.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Disintegrate asm/system.h for PowerPC.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
cc: linuxppc-dev@lists.ozlabs.org
If we get a floating point, altivec or vsx unavaible interrupt in
kernel, we trigger a kernel error. There is no point preserving
the interrupt state, in fact, that can even make debugging harder
as the processor state might change (we may even preempt) between
taking the exception and landing in a debugger.
So just make those 3 disable interrupts unconditionally.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
v2: On BookE only disable when hitting the kernel unavailable
path, otherwise it will fail to restore softe as
fast_exception_return doesn't do it.
When registered for firmware assisted dump on powerpc, firmware preserves
the registers for the active CPUs during a system crash. This patch reads
the cpu register data stored in Firmware-assisted dump format (except for
crashing cpu) and converts it into elf notes and updates the PT_NOTE program
header accordingly. The exact register state for crashing cpu is saved to
fadump crash info structure in scratch area during crash_fadump() and read
during second kernel boot.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Our die() code was based off a very old x86 version. Update it to
mirror the current x86 code.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We have a lot of complicated logic that handles possible recursion between
kdump and a system reset exception. We can solve this in a much simpler
way using the same setjmp/longjmp tricks xmon does.
As a first step, this patch removes the old system reset code.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
I've been seeing truncated output when people send system reset info
to me. We should see a backtrace for every CPU, but the panic() code
takes the box down before they all make it out to the console. The
panic code runs unlocked so we also see corrupted console output.
If we are going to panic, then delay 1 second before calling into the
panic code. Move oops_exit inside the die lock and put a newline
between oopses for clarity.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
All of DebugException is already protected by CONFIG_PPC_ADV_DEBUG_REGS
there is no need to have another such ifdef inside the function.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
If the L1 D-Cache is in write shadow mode the HW will auto-recover the
error. However we might still log the error and cause a machine check
(if L1CSR0[CPE] - Cache error checking enable). We should only treat
the non-write shadow case as non-recoverable.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
giveup_spe() saves the SPE state which is protected by MSR[SPE].
However, modifying SPEFSCR does not trap when MSR[SPE]=0.
And since SPEFSCR is already saved/restored in _switch(),
not all the callers want to save SPEFSCR again.
Thus, saving SPEFSCR should not belong to giveup_spe().
This patch moves SPEFSCR saving to flush_spe_to_thread(),
and cleans up the caller that needs to save SPEFSCR accordingly.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Since printk_ratelimit() shouldn't be used anymore (see comment in
include/linux/printk.h), replace it with printk_ratelimited.
Signed-off-by: Christian Dietrich <christian.dietrich@informatik.uni-erlangen.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The wrong MCSR bit was being used on e500mc. MCSR_BUS_RBERR only exists
on e500v1/v2. Use MCSR_LD on e500mc, and remove all MCSR checking
in fsl_rio_mcheck_exception as we now no longer call that function
if the appropriate bit in MCSR is not set.
If RIO support was enabled at compile-time, but was never probed, just
return from fsl_rio_mcheck_exception rather than dereference a NULL
pointer.
TODO: There is still a remaining, though comparitively minor, issue in
that this recovery mechanism will falsely engage if there's an unrelated
MCSR_LD event at the same time as a RIO error.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Add support for machine_check support into machine_check_e500 and
machine_check_e500mc.
Signed-off-by: Shaohui Xie <b21989@freescale.com>
Cc: Li Yang <leoli@freescale.com>
Cc: Roy Zang <tie-fei.zang@freescale.com>
Cc: Alexandre Bounine <alexandre.bounine@idt.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Manual merge of arch/powerpc/kernel/smp.c and add missing scheduler_ipi()
call to arch/powerpc/platforms/cell/interrupt.c
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On some arches (x86, sh, arm, unicore, powerpc) the oops message would
print out the last sysfs file accessed.
This was very useful in finding a number of sysfs and driver core bugs
in the 2.5 and early 2.6 development days, but it has been a number of
years since this file has actually helped in debugging anything that
couldn't also be trivially determined from the stack traceback.
So it's time to delete the line. This is good as we need all the space
we can get for oops messages at times on consoles.
Acked-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Adapt new API.
Almost change is trivial. Most important change is the below line
because we plan to change task->cpus_allowed implementation.
- ctx->cpus_allowed = current->cpus_allowed;
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Use the new MSR_64BIT in a few places. Some of these are already ifdef'ed
for BOOKE vs BOOKS, but it's still clearer, MSR_SF does not immediately
parse as "MSR bit for 64bit".
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The DSCR (aka Data Stream Control Register) is supported on some
server PowerPC chips and allow some control over the prefetch
of data streams.
This patch allows the value to be specified per thread by emulating
the corresponding mfspr and mtspr instructions. Children of such
threads inherit the value. Other threads use a default value that
can be specified in sysfs - /sys/devices/system/cpu/dscr_default.
If a thread starts with non default value in the sysfs entry,
all children threads inherit this non default value even if
the sysfs value is changed later.
Signed-off-by: Alexey Kardashevskiy <aik@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
If a machine check comes from userspace we send a SIGBUS to the task and
fail to printk anything.
If we are taking machine checks due to bad hardware we want to know about
it right away. Furthermore if we don't complain loudly then it will look
a lot like a bug in the userspace application, potentially causing a lot
of confusion.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We are calling debugger_fault_handler twice in machine_check_exception.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We should never force MSR_RI on. If we take a machine check with MSR_RI off
then we have no chance of recovering safely.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The new e5500 core is similar to the e500mc core but adds 64-bit
support. We support running it in 32-bit mode as it is identical to the
e500mc.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
If an alignment interrupt occurs on an instruction that is being
single-stepped, the alignment interrupt handler currently handles
the single-step condition by unconditionally sending a SIGTRAP to
the process. Other synchronous interrupts that result in the
instruction being emulated do likewise.
With hw_breakpoint support, the hw_breakpoint code needs to be able
to intercept these single-step events as well as those where the
instruction executes normally and a trace interrupt happens.
Fix this by making emulate_single_step() use the existing
single_step_exception() function instead of calling _exception()
directly. We then make single_step_exception() use the abstracted
clear_single_step() rather than clearing bits in the MSR image
directly so that emulate_single_step() will continue to work
correctly on Book 3E processors.
Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (92 commits)
powerpc: Remove unused 'protect4gb' boot parameter
powerpc: Build-in e1000e for pseries & ppc64_defconfig
powerpc/pseries: Make request_ras_irqs() available to other pseries code
powerpc/numa: Use ibm,architecture-vec-5 to detect form 1 affinity
powerpc/numa: Set a smaller value for RECLAIM_DISTANCE to enable zone reclaim
powerpc: Use smt_snooze_delay=-1 to always busy loop
powerpc: Remove check of ibm,smt-snooze-delay OF property
powerpc/kdump: Fix race in kdump shutdown
powerpc/kexec: Fix race in kexec shutdown
powerpc/kexec: Speedup kexec hash PTE tear down
powerpc/pseries: Add hcall to read 4 ptes at a time in real mode
powerpc: Use more accurate limit for first segment memory allocations
powerpc/kdump: Use chip->shutdown to disable IRQs
powerpc/kdump: CPUs assume the context of the oopsing CPU
powerpc/crashdump: Do not fail on NULL pointer dereferencing
powerpc/eeh: Fix oops when probing in early boot
powerpc/pci: Check devices status property when scanning OF tree
powerpc/vio: Switch VIO Bus PM to use generic helpers
powerpc: Avoid bad relocations in iSeries code
powerpc: Use common cpu_die (fixes SMP+SUSPEND build)
...
Most of the MSCR bit assigments are different in e500mc versus
e500, and they are now write-one-to-clear.
Some e500mc machine check conditions are made recoverable (as long as
they aren't stuck on), most notably L1 instruction cache parity errors.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The only way the debugger can handle a trap in inside rcu_lock,
notify_die, or atomic_notifier_call_chain without a recursive fault is
to allow the kernel debugger to handle the exception first in
program_check_exception().
The other change here is to make sure that kgdb_handle_exception() is
called with correct parameters when catching an oops, because kdb
needs to know if the entry was an oops, single step, or breakpoint
exception.
[benh@kernel.crashing.org: move debugger_bpt instead of #ifdef]
CC: Paul Mackerras <paulus@samba.org>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The 47x core's MCSR varies from 44x, so it needs it's own machine check
handler.
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
die.lock needs to be a real spinlock in RT. Convert it to
raw_spinlock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc/booke: Add support for advanced debug registers
From: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Based on patches originally written by Torez Smith.
This patch defines context switch and trap related functionality
for BookE specific Debug Registers. It adds support to ptrace()
for setting and getting BookE related Debug Registers
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: Torez Smith <lnxtorez@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <dwg@au1.ibm.com>
Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Sergio Durigan Junior <sergiodj@br.ibm.com>
Cc: Thiago Jung Bauermann <bauerman@br.ibm.com>
Cc: linuxppc-dev list <Linuxppc-dev@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc/booke: Introduce new CONFIG options for advanced debug registers
From: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Introduce new config options to simplify the ifdefs pertaining to the
advanced debug registers for booke and 40x processors:
CONFIG_PPC_ADV_DEBUG_REGS - boolean: true for dac-based processors
CONFIG_PPC_ADV_DEBUG_IACS - number of IAC registers
CONFIG_PPC_ADV_DEBUG_DACS - number of DAC registers
CONFIG_PPC_ADV_DEBUG_DVCS - number of DVC registers
CONFIG_PPC_ADV_DEBUG_DAC_RANGE - DAC ranges supported
Beginning conservatively, since I only have the facilities to test 440
hardware. I believe all 40x and booke platforms support at least 2 IAC
and 2 DAC registers. For 440, 4 IAC and 2 DVC registers are enabled, as
well as the DAC ranges.
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Acked-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
With NO_HZ it is useful to know how often the decrementer is going off. The
patch below adds an entry for it and also adds it into the /proc/stat
summaries.
While here, I added performance monitoring and machine check exceptions.
I found it useful to keep an eye on the PMU exception rate
when using the perf tool. Since it's possible to take a completely
handled machine check on a System p box it also sounds like a good idea to
keep a machine check summary.
The event naming matches x86 to keep gratuitous differences to a minimum.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Add printout of last accessed sysfs file, added to x86 in
ae87221d3c (sysfs: crash debugging)
Also add the notify_die hook that allows us to print out the ftrace
buffer on oops. This is useful in conjunction with ftrace function_graph:
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=128 NUMA pSeries
last sysfs file: /sys/class/net/tunl0/type
Dumping ftrace buffer:
...
0) | .sysrq_handle_crash() {
0) 0.476 us | .hash_page();
0) 0.488 us | .xmon_fault_handler();
0) | .bad_page_fault() {
0) | .search_exception_tables() {
0) 0.590 us | .search_module_extables();
0) 2.546 us | }
0) | .printk() {
0) | .vprintk() {
0) 0.488 us | ._raw_spin_lock();
0) 0.572 us | .emit_log_char();
Showing the function graph of a sysrq-c crash.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Using perf to trace L1 dcache misses and dumping data addresses I found a few
variables taking a lot of misses. Since they are almost never written, they
should go into the __read_mostly section.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The code under "if (is_global_init())" is bogus, and is_global_init()
itself is not right in mt case.
Contrary to what the comment says, nowadays force_sig_info() does kill
init even if the handler is SIG_DFL. Note that force_sig_info() clears
SIGNAL_UNKILLABLE exactly for this case.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
perf_event wants a separate event for alignment and emulation faults,
so create another emulation event. This will make it easy to hook in
perf_event at one spot.
We pass in regs which will be required for these events.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Reworked by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds block-step support on powerpc, including a PTRACE_SINGLEBLOCK
request for ptrace.
The BookE implementation is tweaked to fire a single step after a
block step in order to mimmic the server behaviour.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
If CONFIG_PPC_EMULATED_STATS is enabled, make available counters for the
various classes of emulated instructions under
/sys/kernel/debug/powerpc/emulated_instructions/ (assumed debugfs is mounted on
/sys/kernel/debug). Optionally (controlled by
/sys/kernel/debug/powerpc/emulated_instructions/do_warn), rate-limited warnings
can be printed to the console when instructions are emulated.
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The e500mc supports the new msgsnd/doorbell mechanisms that were added in
the Power ISA 2.05 architecture. We use the normal level doorbell for
doing SMP IPIs at this point.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Create a new header that becomes a single location for defining PowerPC
opcodes used by code that is either generationg instructions
at runtime (fixups, debug, etc.), emulating instructions, or just
compiling instructions old assemblers don't know about.
We currently don't handle the floating point emulation or alignment decode
as both are better handled by the specific decode support they already
have.
Added support for the new dcbzl, dcbal, msgsnd, tlbilx, & wait instructions
since older assemblers don't know about them.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch add the handlers of SPE/EFP exceptions.
The code is used to emulate float point arithmetic,
when MSR(SPE) is enabled and receive EFP data interrupt or EFP round interrupt.
This patch has no conflict with or dependence on FP math-emu.
The code has been tested by TestFloat.
Now the code doesn't support SPE/EFP instructions emulation
(it won't be called when receive program interrupt),
but it could be easily added.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
This patch also includes the required removal of (unused) inclusion of
<asm/a.out.h> <linux/a.out.h>'s in the arch/ code for these
architectures.
[dwmw2: updated for 2.6.27-rc]
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>