This changes the way that the exception prologs transfer control to
the handlers in 64-bit kernels with the aim of making it possible to
have the prologs separate from the main body of the kernel. Now,
instead of computing the address of the handler by taking the top
32 bits of the paca address (to get the 0xc0000000........ part) and
ORing in something in the bottom 16 bits, we get the base address of
the kernel by doing a load from the paca and add an offset.
This also replaces an mfmsr and an ori to compute the MSR value for
the handler with a load from the paca. That makes it unnecessary to
have a separate version of EXCEPTION_PROLOG_PSERIES that forces 64-bit
mode.
We can no longer use a direct branches in the exception prolog code,
which means that the SLB miss handlers can't branch directly to
.slb_miss_realmode any more. Instead we have to compute the address
and do an indirect branch. This is conditional on CONFIG_RELOCATABLE;
for non-relocatable kernels we use a direct branch as before. (A later
change will allow CONFIG_RELOCATABLE to be set on 64-bit powerpc.)
Since the secondary CPUs on pSeries start execution in the first 0x100
bytes of real memory and then have to get to wherever the kernel is,
we can't use a direct branch to get there. Instead this changes
__secondary_hold_spinloop from a flag to a function pointer. When it
is set to a non-NULL value, the secondary CPUs jump to the function
pointed to by that value.
Finally this eliminates one code difference between 32-bit and 64-bit
by making __secondary_hold be the text address of the secondary CPU
spinloop rather than a function descriptor for it.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This rearranges head_64.S so that we have all the first-level exception
prologs together starting at 0x100, followed by all the second-level
handlers that are invoked from the first-level prologs, followed by
other code. This doesn't make any functional change but will make
following changes for relocatable kernel support easier.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Kdump kernel needs to use only those memory regions that it is allowed
to use (crashkernel, rtas, tce, etc.). Each of these regions have
their own sizes and are currently added under 'linux,usable-memory'
property under each memory@xxx node of the device tree.
The ibm,dynamic-memory property of ibm,dynamic-reconfiguration-memory
node (on POWER6) now stores in it the representation for most of the
logical memory blocks with the size of each memory block being a
constant (lmb_size). If one or more or part of the above mentioned
regions lie under one of the lmb from ibm,dynamic-memory property,
there is a need to identify those regions within the given lmb.
This makes the kernel recognize a new 'linux,drconf-usable-memory'
property added by kexec-tools. Each entry in this property is of the
form of a count followed by that many (base, size) pairs for the above
mentioned regions. The number of cells in the count value is given by
the #size-cells property of the root node.
Signed-off-by: Chandru Siddalingappa <chandru@in.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
It was introduced by "vsprintf: add support for '%pS' and '%pF' pointer
formats" in commit 0fe1ef24f7. However,
the current way its coded doesn't work on parisc64. For two reasons: 1)
parisc isn't in the #ifdef and 2) parisc has a different format for
function descriptors
Make dereference_function_descriptor() more accommodating by allowing
architecture overrides. I put the three overrides (for parisc64, ppc64
and ia64) in arch/kernel/module.c because that's where the kernel
internal linker which knows how to deal with function descriptors sits.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Tony Luck <tony.luck@intel.com>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The calculation to get TI_CPU based off of SPRG3 was just plain wrong,
meaning that we were getting garbage for the CPU number on 6xx/G3/G4
based SMP boxes in this code.
Just offset off the stack pointer (to get to thread_info) like all the
other references to TI_CPU do.
This was pointed out by Chen Gong <G.Chen@freescale.com>
[paulus@samba.org - use rlwinm r12,r11,... instead of
rlwinm r12,r1,...; tophys()]
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This bug is causing random crashes
(http://bugzilla.kernel.org/show_bug.cgi?id=11414).
-fno-omit-frame-pointer is only needed on powerpc when -pg is also
supplied, and there is a gcc bug that causes incorrect code generation
on 32-bit powerpc when -fno-omit-frame-pointer is used---it uses stack
locations below the stack pointer, which is not allowed by the ABI
because those locations can and sometimes do get corrupted by an
interrupt.
This ensures that CONFIG_FRAME_POINTER is only selected by ftrace.
When CONFIG_FTRACE is enabled we also pass -mno-sched-epilog to work
around the gcc codegen bug.
Patch based on work by:
Andreas Schwab <schwab@suse.de>
Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This makes core_kernel_text() (and therefore kernel_text_address())
return the correct result. Currently all the __devinit routines (at
least) will not be considered to be kernel text.
This is just a quick fix for 2.6.27 - hopefully we will be able to fix
this better in 2.6.28.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This fixes an uninitialised variable in the VSX alignment code. It can
cause warnings from GCC (noticed with gcc-4.1.1). Gcc is actually
correct in this instance, and this bug could cause the alignment
interrupt handler to send a SIGSEGV to the process on a legitimate
access.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The file arch/powerpc/kernel/sysfs.c is currently only compiled for
64-bit kernels. It contain code to register CPU sysdevs in sysfs and
add various properties such as cache topology and raw access by root
to performance monitor counters (PMCs). A lot of that can be re-used
as is on 32-bits.
This makes the file be built for both, with appropriate ifdef'ing
for the few bits that are really 64-bit specific, and adds some
support for the raw PMCs for 75x and 74xx processors.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When removing a directory, the sysfs core takes care of removing files
in the directory (see sysfs_remove_dir()). So when we are about to
delete a kobject (and thus cause its sysfs directory to be removed),
we don't have to explicitly remove the files attached to it, although
it's harmless to do so.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
__FUNCTION__ is gcc-specific, use __func__
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
[akpm@linux-foundation.org: exclude prom_init.c]
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
There is a small passage of code in ret_from_except_lite which is
only required on iSeries. For a multi-platform kernel on non-iSeries
machines this means we end up executing ~15 nops in ret_from_except_lite.
It would be nicer if non-iSeries could skip the code entirely, and on
iSeries we can jump out of line to execute the code.
I have no performance numbers to justify this, other than the assertion
that executing 15 nops takes longer than executing 0.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When CMO is enabled and booted on a non CMO system and the VIO
device's probe function fails, an oops can result since
vio_cmo_bus_remove is called when it should not. This fixes it by
avoiding the vio_cmo_bus_remove call on platforms that don't implement
CMO.
cpu 0x0: Vector: 300 (Data Access) at [c00000000e13b3d0]
pc: c000000000020d34: .vio_cmo_bus_remove+0xc0/0x1f4
lr: c000000000020ca4: .vio_cmo_bus_remove+0x30/0x1f4
sp: c00000000e13b650
msr: 8000000000009032
dar: 0
dsisr: 40000000
current = 0xc00000000e0566c0
paca = 0xc0000000006f9b80
pid = 2428, comm = modprobe
enter ? for help
[c00000000e13b6e0] c000000000021d94 .vio_bus_probe+0x2f8/0x33c
[c00000000e13b7a0] c00000000029fc88 .driver_probe_device+0x13c/0x200
[c00000000e13b830] c00000000029fdac .__driver_attach+0x60/0xa4
[c00000000e13b8c0] c00000000029f050 .bus_for_each_dev+0x80/0xd8
[c00000000e13b980] c00000000029f9ec .driver_attach+0x28/0x40
[c00000000e13ba00] c00000000029f630 .bus_add_driver+0xd4/0x284
[c00000000e13baa0] c0000000002a01bc .driver_register+0xc4/0x198
[c00000000e13bb50] c00000000002168c .vio_register_driver+0x40/0x5c
[c00000000e13bbe0] d0000000003b3f1c .ibmvfc_module_init+0x70/0x109c [ibmvfc]
[c00000000e13bc70] c0000000000acf08 .sys_init_module+0x184c/0x1a10
[c00000000e13be30] c000000000008748 syscall_exit+0x0/0x40
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Recent of_platform changes made of_bus_type_init() overwrite the bus
type's .dev_attrs list, meaning that the "name" attribute that ibmebus
devices previously had is no longer present. This is a user-visible
regression which breaks the userspace eHCA support, since the eHCA
userspace driver relies on the name attribute to check for valid
adapters.
This fixes it by providing the "name" attribute in the generic OF
device code instead. Tested on POWER.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
A change to __ioremap() broke reading /dev/oldmem because we're no
longer able to ioremap pfn 0 (d177c207, "[PATCH] powerpc: IOMMU: don't
ioremap null addresses").
We actually don't need to ioremap for anything that's part of the linear
mapping, so just read it directly.
Also make sure we're only reading one page or less at a time.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Use the generic compat_sys_old_readdir instead of the powerpc one which
is almost the same except for the almost complete lack of error
handling.
Note that we can't just use SYSCALL() in systbl.h because the native
syscall is named old_readdir, not sys_old_readdir.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Commit 163f6876f5 missed one, resulting in
the following compile error:
AS arch/powerpc/kernel/misc_32.o
arch/powerpc/kernel/misc_32.S: Assembler messages:
arch/powerpc/kernel/misc_32.S:902: Error: unsupported relocation against KEXEC_CONTROL_CODE_SIZE
make[2]: *** [arch/powerpc/kernel/misc_32.o] Error 1
make[1]: *** [arch/powerpc/kernel] Error 2
make: *** [vmlinux] Error 2
I grepped arch/ and found no further instances.
Signed-off-by: Paul Collins <paul@ondioline.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Doing some various "make randconfig", I came across an error when
CONFIG_BUG was not set:
arch/powerpc/kernel/module.c: In function 'module_find_bug':
arch/powerpc/kernel/module.c:111: error: increment of pointer to unknown structure
arch/powerpc/kernel/module.c:111: error: arithmetic on pointer to an incomplete type
arch/powerpc/kernel/module.c:112: error: dereferencing pointer to incomplete type
Looking further into this, I found that module_find_bug, defined in
powerpc arch code, is not called anywhere, so this just removes it.
There is a static module_find_bug in lib/bug.c but that is a separate issue.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add a field in lparcfg output to indicate whether the kernel is
running on a dedicated or shared memory lpar. Added fields to show
the paging space pool IDs and the CMO page size.
Submitted-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The intent of "flush_tlbs" is to invalidate all TLB entries by doing a
TLB invalidate instruction for all pages in the address range 0 to
0x00400000. A loop counter is set up at the high value and
decremented by page size. However, the loop is only done once as the
sense of the conditional branch at the loop end does not match the
setup/decrement. This fixes it to do the whole range by correcting
the branch condition.
Signed-off-by: Rocky Craig <rocky.craig@hp.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Rename KEXEC_CONTROL_CODE_SIZE to KEXEC_CONTROL_PAGE_SIZE, because control
page is used for not only code on some platform. For example in kexec
jump, it is used for data and stack too.
[akpm@linux-foundation.org: unbreak powerpc and arm, finish conversion]
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When we have an ISA memory hole (ie, a PCI window that allows us to
generate PCI memory cycles at low PCI address) mixed with other
resources using a different CPU <=> PCI mapping, we must not keep
the ISA hole in the bridge resource list.
If we do, things might start trying to allocate device resources
in there and will get the PCI addresses wrong.
This fixes it by arranging to remove the ISA memory hole resource in
this case. This fixes various cases of PCMCIA breakage on PowerBooks
using the MPC106 "grackle" bridge.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The kernel copy of the rtas args struct contains the return
value(s) for the specified rtas call. These are copied back
to user space with the assumption that every value has been
set by the rtas call, which turns out to be not always true.
Thus userspace can see random values and think the call failed
when in fact it succeeded, but for some reason didn't set one
of the return values.
This fixes the problem by zeroing out the return value fields
of the rtas args struct before processing the rtas call.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Now that arch/ppc is gone and CONFIG_PPC_MERGE is always set, remove
the dead code associated with !CONFIG_PPC_MERGE from arch/powerpc
and include/asm-powerpc.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
In PTRACE_GET/SETVSRREGS, we should be using the thread we are
ptracing rather than current.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Fix cut-and-paste error in the size setting for ptrace buffers for VSX.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Fix bug where PTRACE_GET/SETVSRREGS are not connected for 32 bit processes.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The code to handle writes to /proc/ppc64/lparcfg incorrectly
assumes that the return code from the helper routines to update
processor or memory entitlement return a hcall return value. It
then assumes any non-hcall return value is bad and sets the return
code for the write to be -EIO.
The update_[mp]pp routines can return values other than a hcall
return value. This patch removes the automatic setting of any
return code that is not an hcall return value from these routines
to -EIO.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This removes the non-working code in legacy_serial that tried to handle
the powermac SCC ports, and instead add a (now working) function to the
powermac platform code to find the default serial console if any.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Collect cache information from the OF device tree and display it in
the cpu hierarchy in sysfs. This is intended to be compatible at the
userspace level with x86's implementation[1], hence some of the funny
attribute names. The arrangement of cache info is not immediately
intuitive, but (again) it's for compatibility's sake.
The cache attributes exposed are:
type (Data, Instruction, or Unified)
level (1, 2, 3...)
size
coherency_line_size
number_of_sets
ways_of_associativity
All of these can be derived on platforms that follow the OF PowerPC
Processor binding. The code "publishes" only those attributes for
which it is able to determine values; attributes for values which
cannot be determined are not created at all.
[1] arch/x86/kernel/cpu/intel_cacheinfo.c
BenH: Turned some printk's into pr_debug, added better NULL checking
in a couple of places.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Existing Open Firmware practice is to report each processor core as a
separate node in the device tree. Report the value of the "reg" OF
property corresponding to a logical CPU's device node as the core_id
attribute in /sys/devices/system/cpu/cpu*/topology/core_id.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Implement the notion of "core siblings" for powerpc. This makes
/sys/devices/system/cpu/cpu*/topology/core_siblings present sensible
values, indicating online CPUs which share an L2 cache.
BenH: Made cpu_to_l2cache() use of_find_node_by_phandle() instead
of IBM-specific open coded search
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
arch/powerpc/kernel/vio.c:533: error: too few arguments to function 'dma_mapping_error'
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds TIF_NOTIFY_RESUME support for powerpc. When set,
we call tracehook_notify_resume() on the way to user mode.
This overloads do_signal() to do the work, but changes its
arguments to it has the TIF_* bits handy in a register and
drops the useless first argument that was always zero.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This changes powerpc syscall tracing to use the new tracehook.h entry
points. There is no change, only cleanup.
In addition, the assembly changes allow do_syscall_trace_enter() to
abort the syscall without losing the information about the original
r0 value.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This makes the powerpc signal handling code call tracehook_signal_handler()
after a handler is set up. This means that using PTRACE_SINGLESTEP to
enter a signal handler will report to ptrace on the first instruction of
the handler, instead of the second. This is consistent with what x86 and
other machines do, and what users and debuggers want.
BenH: Fixed up the test for the trap value.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Rather doing one initialization pass over all the per-cpu
cpu_sibling_maps at boot, update the maps at cpu online/offline time.
This is a behavior change -- the thread_siblings attribute now
reflects only online siblings, whereas it would display offline
siblings before. The new behavior matches that of x86, and is
arguably more useful.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
It is called only in cpu online paths.
(caught by CONFIG_DEBUG_SECTION_MISMATCH=y)
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This piece of code is broken for >2 threads, and possibly in some
other subtle ways (such as comparing a value obtained from an
"ibm,ppc-interrupt-server#s" property to a value obtained from a
"reg" property) and doesn't seem to have any useful purpose in the
first place other than a dubious warning in case NR_CPUS is too
small, which probably isn't the right place to do so.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
arch/powerpc/kernel/vio.c:1034: warning: function declaration isnât a prototype
arch/powerpc/kernel/vio.c:1035: warning: function declaration isnât a prototype
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* CONFIG_BOOKE is selected by CONFIG_44x so we dont need both
* Fixed a few comments
* Go back to only using DBCR0_IDM to determine if we are using
debug resources.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Removed duplicated include file <linux/module.h> in
arch/powerpc/kernel/stacktrace.c.
Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Kmem cache passed to constructor is only needed for constructors that are
themselves multiplexeres. Nobody uses this "feature", nor does anybody uses
passed kmem cache in non-trivial way, so pass only pointer to object.
Non-trivial places are:
arch/powerpc/mm/init_64.c
arch/powerpc/mm/hugetlbpage.c
This is flag day, yes.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Jon Tollefson <kniht@linux.vnet.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Matt Mackall <mpm@selenic.com>
[akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
[akpm@linux-foundation.org: fix mm/slab.c]
[akpm@linux-foundation.org: fix ubifs]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch provides an enhancement to kexec/kdump. It implements the
following features:
- Backup/restore memory used by the original kernel before/after
kexec.
- Save/restore CPU state before/after kexec.
The features of this patch can be used as a general method to call program in
physical mode (paging turning off). This can be used to call BIOS code under
Linux.
kexec-tools needs to be patched to support kexec jump. The patches and
the precompiled kexec can be download from the following URL:
source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2
patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2
binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10
Usage example of calling some physical mode code and return:
1. Compile and install patched kernel with following options selected:
CONFIG_X86_32=y
CONFIG_KEXEC=y
CONFIG_PM=y
CONFIG_KEXEC_JUMP=y
2. Build patched kexec-tool or download the pre-built one.
3. Build some physical mode executable named such as "phy_mode"
4. Boot kernel compiled in step 1.
5. Load physical mode executable with /sbin/kexec. The shell command
line can be as follow:
/sbin/kexec --load-preserve-context --args-none phy_mode
6. Call physical mode executable with following shell command line:
/sbin/kexec -e
Implementation point:
To support jumping without reserving memory. One shadow backup page (source
page) is allocated for each page used by kexeced code image (destination
page). When do kexec_load, the image of kexeced code is loaded into source
pages, and before executing, the destination pages and the source pages are
swapped, so the contents of destination pages are backupped. Before jumping
to the kexeced code image and after jumping back to the original kernel, the
destination pages and the source pages are swapped too.
C ABI (calling convention) is used as communication protocol between
kernel and called code.
A flag named KEXEC_PRESERVE_CONTEXT for sys_kexec_load is added to
indicate that the loaded kernel image is used for jumping back.
Now, only the i386 architecture is supported.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 9115d13453 ("powerpc: Enable
AT_BASE_PLATFORM aux vector") broke boot on 32-bit powerpc systems; we
have to use PTRRELOC to initialize powerpc_base_platform this early in
boot.
Bug reported by Jon Smirl.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (34 commits)
powerpc: Wireup new syscalls
Move update_mmu_cache() declaration from tlbflush.h to pgtable.h
powerpc/pseries: Remove kmalloc call in handling writes to lparcfg
powerpc/pseries: Update arch vector to indicate support for CMO
ibmvfc: Add support for collaborative memory overcommit
ibmvscsi: driver enablement for CMO
ibmveth: enable driver for CMO
ibmveth: Automatically enable larger rx buffer pools for larger mtu
powerpc/pseries: Verify CMO memory entitlement updates with virtual I/O
powerpc/pseries: vio bus support for CMO
powerpc/pseries: iommu enablement for CMO
powerpc/pseries: Add CMO paging statistics
powerpc/pseries: Add collaborative memory manager
powerpc/pseries: Utilities to set firmware page state
powerpc/pseries: Enable CMO feature during platform setup
powerpc/pseries: Split retrieval of processor entitlement data into a helper routine
powerpc/pseries: Add memory entitlement capabilities to /proc/ppc64/lparcfg
powerpc/pseries: Split processor entitlement retrieval and gathering to helper routines
powerpc/pseries: Remove extraneous error reporting for hcall failures in lparcfg
powerpc: Fix compile error with binutils 2.15
...
Fixed up conflict in arch/powerpc/platforms/52xx/Kconfig manually.
Currently list of kretprobe instances are stored in kretprobe object (as
used_instances,free_instances) and in kretprobe hash table. We have one
global kretprobe lock to serialise the access to these lists. This causes
only one kretprobe handler to execute at a time. Hence affects system
performance, particularly on SMP systems and when return probe is set on
lot of functions (like on all systemcalls).
Solution proposed here gives fine-grain locks that performs better on SMP
system compared to present kretprobe implementation.
Solution:
1) Instead of having one global lock to protect kretprobe instances
present in kretprobe object and kretprobe hash table. We will have
two locks, one lock for protecting kretprobe hash table and another
lock for kretporbe object.
2) We hold lock present in kretprobe object while we modify kretprobe
instance in kretprobe object and we hold per-hash-list lock while
modifying kretprobe instances present in that hash list. To prevent
deadlock, we never grab a per-hash-list lock while holding a kretprobe
lock.
3) We can remove used_instances from struct kretprobe, as we can
track used instances of kretprobe instances using kretprobe hash
table.
Time duration for kernel compilation ("make -j 8") on a 8-way ppc64 system
with return probes set on all systemcalls looks like this.
cacheline non-cacheline Un-patched kernel
aligned patch aligned patch
===============================================================================
real 9m46.784s 9m54.412s 10m2.450s
user 40m5.715s 40m7.142s 40m4.273s
sys 2m57.754s 2m58.583s 3m17.430s
===========================================================
Time duration for kernel compilation ("make -j 8) on the same system, when
kernel is not probed.
=========================
real 9m26.389s
user 40m8.775s
sys 2m7.283s
=========================
Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com>
Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are only 4 valid name=value pairs for writes to
/proc/ppc64/lparcfg. Current code allocates a buffer to copy
this information in from the user. Since the longest name=value
pair will easily fit into a buffer of 64 characters, simply
put the buffer on the stack instead of allocating the buffer.
Signed-off-by: Nathan Fotenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>