sh port of the sLeAZY-fpu feature currently implemented for some architectures
such us i386.
Right now the SH kernel has a 100% lazy fpu behaviour.
This is of course great for applications that have very sporadic or no FPU use.
However for very frequent FPU users... you take an extra trap every context
switch.
The patch below adds a simple heuristic to this code: after 5 consecutive
context switches of FPU use, the lazy behavior is disabled and the context
gets restored every context switch.
After 256 switches, this is reset and the 100% lazy behavior is returned.
Tests with LMbench showed no regression.
I saw a little improvement due to the prefetching (~2%).
The tests below also show that, with this sLeazy patch, indeed,
the number of FPU exceptions is reduced.
To test this. I hacked the lat_ctx LMBench to use the FPU a little more.
sLeasy implementation
===========================================
switch_to calls | 79326
sleasy calls | 42577
do_fpu_state_restore calls| 59232
restore_fpu calls | 59032
Exceptions: 0x800 (FPU disabled ): 16604
100% Leazy (default implementation)
===========================================
switch_to calls | 79690
do_fpu_state_restore calls | 53299
restore_fpu calls | 53101
Exceptions: 0x800 (FPU disabled ): 53273
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Stuart Menefy <stuart.menefy@st.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
The typename member of struct irq_chip was kept for migration purposes
and is obsolete since more than 2 years. Fix up the leftovers.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-sh@vger.kernel.org
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
SPIN_LOCK_UNLOCKED is deprecated. Use __SPIN_LOCK_UNLOCKED instead.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-sh@vger.kernel.org
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
The icache may also contain aliases so we must account for them just
like we do when manipulating the dcache. We usually get away with
aliases in the icache because the instructions that are read from memory
are read-only, i.e. they never change. However, the place where this
bites us is when the code has been modified.
Signed-off-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
The dwarf unwinder presently attempts to provide a sane PC value if none
is provided, however the logic is broken and cases where a previous valid
dwarf frame exists along with a bogus PC value can still proceed. This
fixes up the test and prevents the unwinder from blowing up.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
The hugetlb dependencies presently depend on SUPERH && MMU while the
hugetlb page size definitions depend on CPU_SH4 or CPU_SH5. This
unfortunately allows SH-3 + MMU configurations to enable hugetlbfs
without a corresponding HPAGE_SHIFT definition, resulting in the build
blowing up.
As SH-3 doesn't support variable page sizes, we tighten up the
dependenies a bit to prevent hugetlbfs from being enabled. These days
we also have a shiny new SYS_SUPPORTS_HUGETLBFS, so switch to using
that rather than adding to the list of corner cases in fs/Kconfig.
Reported-by: Kristoffer Ericson <kristoffer.ericson@gmail.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Add an uImage.bin target to allow uncompressed uImages.
Useful for boards with busted u-boot decompression like
the rsk7203 on my desk.
Signed-off-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
When CONFIG_FUNCTION_GRAPH_TRACER is enabled the function graph tracer
may patch return addresses on the stack with the address of
return_to_handler(). This really confuses the DWARF unwinder because it
will try find the caller of return_to_handler(), not the caller of the
real return address.
So teach the DWARF unwinder how to find the real return address whenever
it encounters return_to_handler().
This patch does not cope very well when multiple return addresses on the
stack have been patched. To make it work properly it would require state
to track how many return_to_handler()'s have been seen so that we'd know
where to look in current->curr_ret_stack[]. So for now, instead of
trying to handle this, just moan if more than one return address on the
stack has been patched.
Signed-off-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This adds an __irq_entry annotation for do_IRQ() so that the IRQ
annotation in the function graph tracer works as advertized. We already
have the IRQENTRY section wired up, so this is just a trivial addition
to actually make use of it.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
The open function got the BKL via the big push down. Replace it by
preempt_enable/disable as this is sufficient for an UP machine.
The ioctl can be unlocked because there is no functionality which
requires serialization. The usage by multiple callers is broken with
and without the BKL due to the local static variable addr.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Add code to handle the cache disabled case. Fixes breakage introduced by
37443ef3f0 ("sh: Migrate SH-4 cacheflush
ops to function pointers."). Without this patch configuring caches off
with CONFIG_CACHE_OFF=y makes kfr2r09 and migo-r lock up in fbdev
deferred io or early user space.
Signed-off-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Presently The SH-4 cache flushing code uses flush_cache_4096() for most
of the real flushing work, which breaks down to a fixed 4096 unroll and
increment. Not only is this sub-optimal for larger page sizes, it's also
uncovered a bug in sh4_flush_dcache_page() when large page sizes are used
and we have no cache aliases -- resulting in only a part of the page's
D-cache lines being written back.
Signed-off-by: Valentin Sitdikov <valentin.sitdikov@siemens.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
The resume_userspace path had TRACE_IRQS_OFF written incorrectly and so
never handled the transition properly. This was fixed once before but
seems to have made it back in the tree. Fix it for good.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This only needs to flush the return code via the legacy path, and just
invalidates uselessly otherwise. This makes the behaviour consistent for
all of the trampoline setup paths.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
The secondary CPU info was seeing corrupted results due to not entering
all of the setup paths taken by the boot CPU. So we just memcpy() the
boot cpu data over directly, and then fix up the per-CPU bits.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
We do not want to use smp_processor_id() from these paths, as they trip
preempt BUGs. Switch the test over to the boot cpu directly.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Sync up with latest core changes in the syscalls tracing area:
- tracing: Map syscall name to number (syscall_name_to_nr())
- tracing: Call arch_init_ftrace_syscalls at boot
- tracing: add support tracepoint ids (set_syscall_{enter,exit}_id())
Taken from the s390 change.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This too follows the ARM change, given that the issue at hand applies to
all platforms that implement lazy D-cache writeback.
This fixes up the case when a page mapping disappears between the
flush_dcache_page() call (when PG_dcache_dirty is set for the page) and
the update_mmu_cache() call -- such as in the case of swap cache being
freed early. This kills off the mapping test in update_mmu_cache() and
switches to simply testing for PG_dcache_dirty.
Reported-by: Nitin Gupta <ngupta@vflare.org>
Reported-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This follows the ARM change, as SH had all of the same issues:
Make die() better match x86:
- add printing of the last accessed sysfs file
- ensure console_verbose() is called under the lock
- ensure we panic outside of oops_exit()
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Currently, we've got the less than ideal situation where if we need to
allocate a 256MB mapping we'll allocate four entries like so,
entry 1: 128MB
entry 2: 64MB
entry 3: 16MB
entry 4: 16MB
This is because as we execute the loop in pmb_remap() we will
progressively try mapping the remaining address space with smaller and
smaller sizes. This isn't good because the size we use on one iteration
may be the perfect size to use on the next iteration, for instance when
the initial size is divisible by one of the PMB mapping sizes.
With this patch, we now only need two entries in the PMB to map 256MB of
address space,
entry 1: 128MB
entry 2: 128MB
Signed-off-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
We should favour PMB mappings when the physical address cannot be
reached with 29-bits.
Signed-off-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
If we fail to allocate a PMB entry in pmb_remap() we must remember to
clear and free any PMB entries that we may have previously allocated,
e.g. if we were allocating a multiple entry mapping.
Signed-off-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Fix some callers of jump_to_uncached() and back_to_cached() that were
not annotated with __uses_jump_to_uncached.
Signed-off-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Extend the ecovec24 board code to enable Power
Management LEDs showing the current sh7724 sleep state.
Signed-off-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Certain networking and USB workloads generate floods of these accesses,
so just disable it by default (thereby restoring the old behaviour). The
option remains configurable from userspace, and can still be used as a
debugging aid.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
There is now no need for the magicpanelr2 and dreamcast platforms to set
their own I/O port bas values, given that the generic machvec code now
sets this to P2SEG for everyone.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
There's already code in do_page_fault() to conditionally enable
interrupts, so we don't need to unconditonally enable them before
calling it. This fixes a lockdep warning where we called
trace_hardirqs_off() but with irqs still enabled.
Signed-off-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This bumps up the default I/O base to P2SEG, which allows legacy probing
to bail out gracefully rather than oopsing. Platforms that have a real
PIO offset still need to fix this up on their own, although most
platforms are content with P2SEG already.
The previous change to teach ioport_map() about >= P1SEG offsets in
combination with this patch allows both the already remapped and the
legacy address probing to pass through and succeed.
Fixes up an oops with i8042 on the sh7785lcr board.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This fixes up the case where certain drivers already do their own
remapping and subsequently attempt to use the PIO calls for I/O. In this
case there is no additional remapping that needs to be done, and the
address can be casted in to the cookie directly.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Kuninori Morimoto <morimoto.kuninori@renesas.com>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
* 'for-linus' of git://neil.brown.name/md: (97 commits)
md: raid-1/10: fix RW bits manipulation
md: remove unnecessary memset from multipath.
md: report device as congested when suspended
md: Improve name of threads created by md_register_thread
md: remove sparse warnings about lock context.
md: remove sparse waring "symbol xxx shadows an earlier one"
async_tx/raid6: add missing dma_unmap calls to the async fail case
ioat3: fix uninitialized var warnings
drivers/dma/ioat/dma_v2.c: fix warnings
raid6test: fix stack overflow
ioat2: clarify ring size limits
md/raid6: cleanup ops_run_compute6_2
md/raid6: eliminate BUG_ON with side effect
dca: module load should not be an error message
ioat: driver version 4.0
dca: registering requesters in multiple dca domains
async_tx: remove HIGHMEM64G restriction
dmaengine: sh: Add Support SuperH DMA Engine driver
dmaengine: Move all map_sg/unmap_sg for slave channel to its client
fsldma: Add DMA_SLAVE support
...
In the unaligned kernel exception fixup case the printk() was ordered
before the copy_from_user(), resulting in a nonsensical instruction
value. This fixes up the ordering properly.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This adds some sanity checking in the unaligned instruction handler to
verify the instruction size, which enables basic support for 16-bit
fixups on SH-2A parts.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
fix the following 'make includecheck' warning:
arch/sh/kernel/dwarf.c: asm/dwarf.h is included more than once.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (39 commits)
cpumask: Move deprecated functions to end of header.
cpumask: remove unused deprecated functions, avoid accusations of insanity
cpumask: use new-style cpumask ops in mm/quicklist.
cpumask: use mm_cpumask() wrapper: x86
cpumask: use mm_cpumask() wrapper: um
cpumask: use mm_cpumask() wrapper: mips
cpumask: use mm_cpumask() wrapper: mn10300
cpumask: use mm_cpumask() wrapper: m32r
cpumask: use mm_cpumask() wrapper: arm
cpumask: Use accessors for cpu_*_mask: um
cpumask: Use accessors for cpu_*_mask: powerpc
cpumask: Use accessors for cpu_*_mask: mips
cpumask: Use accessors for cpu_*_mask: m32r
cpumask: remove arch_send_call_function_ipi
cpumask: arch_send_call_function_ipi_mask: s390
cpumask: arch_send_call_function_ipi_mask: powerpc
cpumask: arch_send_call_function_ipi_mask: mips
cpumask: arch_send_call_function_ipi_mask: m32r
cpumask: arch_send_call_function_ipi_mask: alpha
cpumask: remove obsolete topology_core_siblings and topology_thread_siblings: ia64
...