linux/arch/powerpc/lib/Makefile

#
# Makefile for ppc-specific library files..
#

subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror

ccflags-$(CONFIG_PPC64)	:= -mno-minimal-toc

CFLAGS_REMOVE_code-patching.o = -pg
CFLAGS_REMOVE_feature-fixups.o = -pg

obj-y			:= string.o alloc.o \
			   checksum_$(CONFIG_WORD_SIZE).o crtsavres.o
obj-$(CONFIG_PPC32)	+= div64.o copy_32.o
obj-$(CONFIG_HAS_IOMEM)	+= devres.o

obj-$(CONFIG_PPC64)	+= copypage_64.o copyuser_64.o \
			   memcpy_64.o usercopy_64.o mem_64.o string.o \
			   checksum_wrappers_64.o hweight_64.o \
			   copyuser_power7.o string_64.o copypage_power7.o
obj-$(CONFIG_XMON)	+= sstep.o ldstfp.o
obj-$(CONFIG_KPROBES)	+= sstep.o ldstfp.o
obj-$(CONFIG_HAVE_HW_BREAKPOINT)	+= sstep.o ldstfp.o

ifeq ($(CONFIG_PPC64),y)
obj-$(CONFIG_SMP)	+= locks.o
obj-$(CONFIG_ALTIVEC)	+= vmx-helper.o
endif

obj-$(CONFIG_PPC_LIB_RHEAP) += rheap.o

obj-y			+= code-patching.o
obj-y			+= feature-fixups.o
obj-$(CONFIG_FTR_FIXUP_SELFTEST) += feature-fixups-test.o
powerpc: Merge enough to start building in arch/powerpc. This creates the directory structure under arch/powerpc and a bunch of Kconfig files. It does a first-cut merge of arch/powerpc/mm, arch/powerpc/lib and arch/powerpc/platforms/powermac. This is enough to build a 32-bit powermac kernel with ARCH=powerpc. For now we are getting some unmerged files from arch/ppc/kernel and arch/ppc/syslib, or arch/ppc64/kernel. This makes some minor changes to files in those directories and files outside arch/powerpc. The boot directory is still not merged. That's going to be interesting. Signed-off-by: Paul Mackerras <paulus@samba.org> 2005-09-26 06:04:21 +00:00			`#`
			`# Makefile for ppc-specific library files..`
			`#`

powerpc: Add configurable -Werror for arch/powerpc Add the option to build the code under arch/powerpc with -Werror. The intention is to make it harder for people to inadvertantly introduce warnings in the arch/powerpc code. It needs to be configurable so that if a warning is introduced, people can easily work around it while it's being fixed. The option is a negative, ie. don't enable -Werror, so that it will be turned on for allyes and allmodconfig builds. The default is n, in the hope that developers will build with -Werror, that will probably lead to some build breaks, I am prepared to be flamed. It's not enabled for math-emu, which is a steaming pile of warnings. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> 2009-06-09 20:48:51 +00:00			`subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror`

powerpc/Makefiles: Change to new flag variables Replace EXTRA_CFLAGS with ccflags-y and EXTRA_AFLAGS with asflags-y. Signed-off-by: matt mooney <mfm@muteddisk.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> 2010-09-22 20:51:09 +00:00			`ccflags-$(CONFIG_PPC64) := -mno-minimal-toc`
[POWERPC] Optimise some TOC usage Micro-optimisation - add no-minimal-toc to some more arch/powerpc Makefiles. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org> 2006-06-10 10:23:54 +00:00
powerpc/ppc32: static ftrace fixes for PPC32 Impact: fix for PowerPC 32 code There were some early init code that was not safe for static ftrace to boot on my PowerBook. This code must only use relative addressing, and static mcount performs a compare of the ftrace_trace_function pointer, and gets that with an absolute address. In the early init boot up code, this will cause a fault. This patch removes tracing from the files containing the offending functions. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> 2008-11-26 20:54:46 +00:00			`CFLAGS_REMOVE_code-patching.o = -pg`
			`CFLAGS_REMOVE_feature-fixups.o = -pg`

[POWERPC] Create and use CONFIG_WORD_SIZE Linus made this suggestion for the x86 merge and this starts the process for powerpc. We assume that CONFIG_PPC64 implies CONFIG_PPC_MERGE and CONFIG_PPC_STD_MMU_32 implies CONFIG_PPC_STD_MMU. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org> 2007-09-21 00:16:20 +00:00			`obj-y := string.o alloc.o \`
powerpc: Fix module building for gcc 4.5 and 64 bit Gcc 4.5 is now generating out of line register save and restore in the function prefix and postfix when we use -Os. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> 2010-06-29 20:08:42 +00:00			`checksum_$(CONFIG_WORD_SIZE).o crtsavres.o`
			`obj-$(CONFIG_PPC32) += div64.o copy_32.o`
[POWERPC] ppc: More compile fixes This fixes a few more miscellaneous compile problems with ARCH=ppc. 1. Don't compile devres.c on ARCH=ppc, it doesn't have ioremap_flags. 2. Include <asm/irq.h> in setup.c for the __DO_IRQ_CANON definition. 3. Include <linux/proc_fs.h> in residual.c for the definition of create_proc_read_entry. 4. Fix xchg_ptr to be a static inline to eliminate a compiler warning. Signed-off-by: Paul Mackerras <paulus@samba.org> 2008-05-12 12:57:51 +00:00			`obj-$(CONFIG_HAS_IOMEM) += devres.o`
powerpc: Get 64-bit configs to compile with ARCH=powerpc This is a bunch of mostly small fixes that are needed to get ARCH=powerpc to compile for 64-bit. This adds setup_64.c from arch/ppc64/kernel/setup.c and locks.c from arch/ppc64/lib/locks.c. Signed-off-by: Paul Mackerras <paulus@samba.org> 2005-10-10 12:50:37 +00:00
[POWERPC] Create and use CONFIG_WORD_SIZE Linus made this suggestion for the x86 merge and this starts the process for powerpc. We assume that CONFIG_PPC64 implies CONFIG_PPC_MERGE and CONFIG_PPC_STD_MMU_32 implies CONFIG_PPC_STD_MMU. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org> 2007-09-21 00:16:20 +00:00			`obj-$(CONFIG_PPC64) += copypage_64.o copyuser_64.o \`
powerpc: Optimise 64bit csum_partial_copy_generic and add csum_and_copy_from_user We use the same core loop as the new csum_partial, adding in the stores and exception handling code. To keep things simple we do all the exception fixup in csum_and_copy_from_user. This wrapper function is modelled on the generic checksum code and is careful to always calculate a complete checksum even if we only copied part of the data to userspace. To test this I forced checksumming on over loopback and ran socklib (a simple TCP benchmark). On a POWER6 575 throughput improved by 19% with this patch. If I forced both the sender and receiver onto the same cpu (with the hope of shifting the benchmark from being cache bandwidth limited to cpu limited), adding this patch improved performance by 55% Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> 2010-08-02 20:09:52 +00:00			`memcpy_64.o usercopy_64.o mem_64.o string.o \`
powerpc: POWER7 optimised copy_to_user/copy_from_user using VMX Implement a POWER7 optimised copy_to_user/copy_from_user using VMX. For large aligned copies this new loop is over 10% faster, and for large unaligned copies it is over 200% faster. If we take a fault we fall back to the old version, this keeps things relatively simple and easy to verify. On POWER7 unaligned stores rarely slow down - they only flush when a store crosses a 4KB page boundary. Furthermore this flush is handled completely in hardware and should be 20-30 cycles. Unaligned loads on the other hand flush much more often - whenever crossing a 128 byte cache line, or a 32 byte sector if either sector is an L1 miss. Considering this information we really want to get the loads aligned and not worry about the alignment of the stores. Microbenchmarks confirm that this approach is much faster than the current unaligned copy loop that uses shifts and rotates to ensure both loads and stores are aligned. We also want to try and do the stores in cacheline aligned, cacheline sized chunks. If the store queue is unable to merge an entire cacheline of stores then the L2 cache will have to do a read/modify/write. Even worse, we will serialise this with the stores in the next iteration of the copy loop since both iterations hit the same cacheline. Based on this, the new loop does the following things: 1 - 127 bytes Get the source 8 byte aligned and use 8 byte loads and stores. Pretty boring and similar to how the current loop works. 128 - 4095 bytes Get the source 8 byte aligned and use 8 byte loads and stores, 1 cacheline at a time. We aren't doing the stores in cacheline aligned chunks so we will potentially serialise once per cacheline. Even so it is much better than the loop we have today. 4096 - bytes If both source and destination have the same alignment get them both 16 byte aligned, then get the destination cacheline aligned. Do cacheline sized loads and stores using VMX. If source and destination do not have the same alignment, we get the destination cacheline aligned, and use permute to do aligned loads. In both cases the VMX loop should be optimal - we always do aligned loads and stores and are always doing stores in cacheline aligned, cacheline sized chunks. To be able to use VMX we must be careful about interrupts and sleeping. We don't use the VMX loop when in an interrupt (which should be rare anyway) and we wrap the VMX loop in disable/enable_pagefault and fall back to the existing copy_tofrom_user loop if we do need to sleep. The VMX breakpoint of 4096 bytes was chosen using this microbenchmark: http://ozlabs.org/~anton/junkcode/copy_to_user.c Since we are using VMX and there is a cost to saving and restoring the user VMX state there are two broad cases we need to benchmark: - Best case - userspace never uses VMX - Worst case - userspace always uses VMX In reality a userspace process will sit somewhere between these two extremes. Since we need to test both aligned and unaligned copies we end up with 4 combinations. The point at which the VMX loop begins to win is: 0% VMX aligned 2048 bytes unaligned 2048 bytes 100% VMX aligned 16384 bytes unaligned 8192 bytes Considering this is a microbenchmark, the data is hot in cache and the VMX loop has better store queue merging properties we set the breakpoint to 4096 bytes, a little below the unaligned breakpoints. Some future optimisations we can look at: - Looking at the perf data, a significant part of the cost when a task is always using VMX is the extra exception we take to restore the VMX state. As such we should do something similar to the x86 optimisation that restores FPU state for heavy users. ie: /* * If the task has used fpu the last 5 timeslices, just do a full * restore of the math state immediately to avoid the trap; the * chances of needing FPU soon are obviously high now / preload_fpu = tsk_used_math(next_p) && next_p->fpu_counter > 5; and / * fpu_counter contains the number of consecutive context switches * that the FPU is used. If this is over a threshold, the lazy fpu * saving becomes unlazy to save the trap. This is an unsigned char * so that after 256 times the counter wraps and the behavior turns * lazy again; this to deal with bursty apps that only use FPU for * a short time */ - We could create a paca bit to mirror the VMX enabled MSR bit and check that first, avoiding multiple calls to calling enable_kernel_altivec. That should help with iovec based system calls like readv. - We could have two VMX breakpoints, one for when we know the user VMX state is loaded into the registers and one when it isn't. This could be a second bit in the paca so we can calculate the break points quickly. - One suggestion from Ben was to save and restore the VSX registers we use inline instead of using enable_kernel_altivec. [BenH: Fixed a problem with preempt and fixed build without CONFIG_ALTIVEC] Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> 2011-12-07 20:11:45 +00:00			`checksum_wrappers_64.o hweight_64.o \`
powerpc: POWER7 optimised copy_page using VMX and enhanced prefetch Implement a POWER7 optimised copy_page using VMX and enhanced prefetch instructions. We use enhanced prefetch hints to prefetch both the load and store side. We copy a cacheline at a time and fall back to regular loads and stores if we are unable to use VMX (eg we are in an interrupt). The following microbenchmark was used to assess the impact of the patch: http://ozlabs.org/~anton/junkcode/page_fault_file.c We test MAP_PRIVATE page faults across a 1GB file, 100 times: # time ./page_fault_file -p -l 1G -i 100 Before: 22.25s After: 18.89s 17% faster Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> 2012-05-29 19:33:12 +00:00			`copyuser_power7.o string_64.o copypage_power7.o`
powerpc: Emulate most Book I instructions in emulate_step() This extends the emulate_step() function to handle a large proportion of the Book I instructions implemented on current 64-bit server processors. The aim is to handle all the load and store instructions used in the kernel, plus all of the instructions that appear between l[wd]arx and st[wd]cx., so this handles the Altivec/VMX lvx and stvx and the VSX lxv2dx and stxv2dx instructions (implemented in POWER7). The new code can emulate user mode instructions, and checks the effective address for a load or store if the saved state is for user mode. It doesn't handle little-endian mode at present. For floating-point, Altivec/VMX and VSX instructions, it checks that the saved MSR has the enable bit for the relevant facility set, and if so, assumes that the FP/VMX/VSX registers contain valid state, and does loads or stores directly to/from the FP/VMX/VSX registers, using assembly helpers in ldstfp.S. Instructions supported now include: * Loads and stores, including some but not all VMX and VSX instructions, and lmw/stmw * Atomic loads and stores (l[dw]arx, st[dw]cx.) * Arithmetic instructions (add, subtract, multiply, divide, etc.) * Compare instructions * Rotate and mask instructions * Shift instructions * Logical instructions (and, or, xor, etc.) * Condition register logical instructions * mtcrf, cntlz[wd], exts[bhw] * isync, sync, lwsync, ptesync, eieio * Cache operations (dcbf, dcbst, dcbt, dcbtst) The overflow-checking arithmetic instructions are not included, but they appear not to be ever used in C code. This uses decimal values for the minor opcodes in the switch statements because that is what appears in the Power ISA specification, thus it is easier to check that they are correct if they are in decimal. If this is used to single-step an instruction where a data breakpoint interrupt occurred, then there is the possibility that the instruction is a lwarx or ldarx. In that case we have to be careful not to lose the reservation until we get to the matching st[wd]cx., or we'll never make forward progress. One alternative is to try to arrange that we can return from interrupts and handle data breakpoint interrupts without losing the reservation, which means not using any spinlocks, mutexes, or atomic ops (including bitops). That seems rather fragile. The other alternative is to emulate the larx/stcx and all the instructions in between. This is why this commit adds support for a wide range of integer instructions. Signed-off-by: Paul Mackerras <paulus@samba.org> 2010-06-15 04:48:58 +00:00			`obj-$(CONFIG_XMON) += sstep.o ldstfp.o`
			`obj-$(CONFIG_KPROBES) += sstep.o ldstfp.o`
powerpc, hw_breakpoints: Implement hw_breakpoints for 64-bit server processors Implement perf-events based hw-breakpoint interfaces for PowerPC 64-bit server (Book III S) processors. This allows access to a given location to be used as an event that can be counted or profiled by the perf_events subsystem. This is done using the DABR (data breakpoint register), which can also be used for process debugging via ptrace. When perf_event hw_breakpoint support is configured in, the perf_event subsystem manages the DABR and arbitrates access to it, and ptrace then creates a perf_event when it is requested to set a data breakpoint. [Adopted suggestions from Paul Mackerras <paulus@samba.org> to - emulate_step() all system-wide breakpoints and single-step only the per-task breakpoints - perform arch-specific cleanup before unregistration through arch_unregister_hw_breakpoint() ] Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> 2010-06-15 06:05:19 +00:00			`obj-$(CONFIG_HAVE_HW_BREAKPOINT) += sstep.o ldstfp.o`
powerpc: Merge xmon The merged version follows the ppc64 version pretty closely mostly, and in fact ARCH=ppc64 now uses the arch/powerpc/xmon version. The main difference for ppc64 is that the 'p' command to call show_state (which was always pretty dodgy) has been replaced by the ppc32 'p' command, which calls a given procedure (so in fact the old 'p' command behaviour can be achieved with 'p $show_state'). Signed-off-by: Paul Mackerras <paulus@samba.org> 2005-10-28 12:53:37 +00:00
ppc64: use lockc.c from powerpc/lib since it is effectively the same as locks.c from ppc64/lib. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> 2005-10-27 06:45:38 +00:00			`ifeq ($(CONFIG_PPC64),y)`
			`obj-$(CONFIG_SMP) += locks.o`
powerpc: Rename copyuser_power7_vmx.c to vmx-helper.c Subsequent patches will add more VMX library functions and it makes sense to keep all the c-code helper functions in the one file. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> 2012-05-29 19:31:24 +00:00			`obj-$(CONFIG_ALTIVEC) += vmx-helper.o`
ppc64: use lockc.c from powerpc/lib since it is effectively the same as locks.c from ppc64/lib. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> 2005-10-27 06:45:38 +00:00			`endif`
POWERPC: Move generic cpm2 stuff to powerpc This moves the cpm2 common code and PIC stuff to the powerpc. Most of the files were just copied from ppc/, with minor tuning to make it compile, and, subsequently, work. Signed-off-by: Vitaly Bordug <vbordug@ru.mvista.com> 2006-09-21 18:18:53 +00:00
[POWERPC] rheap: Changes config mechanism Instead of having in the makefile all the option that requires rheap, we define a configuration symbol and when needed we make sure it's selected. Signed-off-by: Sylvain Munaut <tnt@246tNt.com> Signed-off-by: Grant Likely <grant.likely@secretlab.ca> 2007-09-16 10:53:25 +00:00			`obj-$(CONFIG_PPC_LIB_RHEAP) += rheap.o`
powerpc: Move code patching code into arch/powerpc/lib/code-patching.c We currently have a few routines for patching code in asm/system.h, because they didn't fit anywhere else. I'd like to clean them up a little and add some more, so first move them into a dedicated C file - they don't need to be inlined. While we're moving the code, drop create_function_call(), it's intended caller never got merged and will be replaced in future with something different. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org> 2008-06-24 01:32:21 +00:00
			`obj-y += code-patching.o`
powerpc: Split out do_feature_fixups() from cputable.c The logic to patch CPU feature sections lives in cputable.c, but these days it's used for CPU features as well as firmware features. Move it into it's own file for neatness and as preparation for some additions. While we're moving the code, we pull the loop body logic into a separate routine, and remove a comment which doesn't apply anymore. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org> 2008-06-24 01:32:36 +00:00			`obj-y += feature-fixups.o`
powerpc: Add self-tests of the feature fixup code This commit adds tests of the feature fixup code, they are run during boot if CONFIG_FTR_FIXUP_SELFTEST=y. Some of the tests manually invoke the patching routines to check their behaviour, and others use the macros and so are patched during the normal patching done during boot. Because we have two sets of macros with different names, we use a macro to generate the test of the macros, very niiiice. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org> 2008-06-24 01:33:03 +00:00			`obj-$(CONFIG_FTR_FIXUP_SELFTEST) += feature-fixups-test.o`