This adds optimized memset/bzero/page-clear routines for Niagara-4.
We basically can do what powerpc has been able to do for a decade (via
the "dcbz" instruction), which is use cache line clearing stores for
bzero and memsets with a 'c' argument of zero.
As long as we make the cache initializing store to each 32-byte
subblock of the L2 cache line, it works.
As with other Niagara-4 optimized routines, the key is to make sure to
avoid any usage of the %asi register, as reads and writes to it cost
at least 50 cycles.
For the user clear cases, we don't use these new routines, we use the
Niagara-1 variants instead. Those have to use %asi in an unavoidable
way.
A Niagara-4 8K page clear costs just under 600 cycles.
Add definitions of the MRU variants of the cache initializing store
ASIs. By default, cache initializing stores install the line as Least
Recently Used. If we know we're going to use the data immediately
(which is true for page copies and clears) we can use the Most
Recently Used variant, to decrease the likelyhood of the lines being
evicted before they get used.
Signed-off-by: David S. Miller <davem@davemloft.net>
There's a Niagara 2 memcpy fix in this tree and I have
a Kconfig fix from Dave Jones which requires the sparc-next
changes which went upstream yesterday.
Signed-off-by: David S. Miller <davem@davemloft.net>
It gets clobbered by the kernel's VISEntryHalf, so we have to save it
in a different register than the set clobbered by that macro.
The instance in glibc is OK and doesn't have this problem.
Signed-off-by: David S. Miller <davem@davemloft.net>
To use this, an architecture simply needs to:
1) Provide a user_addr_max() implementation via asm/uaccess.h
2) Add "select GENERIC_STRNCPY_FROM_USER" to their arch Kcnfig
3) Remove the existing strncpy_from_user() implementation and symbol
exports their architecture had.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: David Howells <dhowells@redhat.com>
Hide details of maximum user address calculation in a new
asm/uaccess.h interface named user_addr_max().
Provide little-endian implementation in find_zero(), which should work
but can probably be improved.
Abstrace alignment check behind IS_UNALIGNED() macro.
Kill double-semicolon, noticed by David Howells.
Signed-off-by: David S. Miller <davem@davemloft.net>
Compute a mask that will only have 0x80 in the bytes which
had a zero in them. The formula is:
~(((x & 0x7f7f7f7f) + 0x7f7f7f7f) | x | 0x7f7f7f7f)
In the inner word iteration, we have to compute the "x | 0x7f7f7f7f"
part, so we can reuse that in the above calculation.
Once we have this mask, we perform divide and conquer to find the
highest 0x80 location.
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus removed the end-of-address-space hackery from
fs/namei.c:do_getname() so we really have to validate these edge
conditions and cannot cheat any more (as x86 used to as well).
Move to a common C implementation like x86 did. And if both
src and dst are sufficiently aligned we'll do word at a time
copies and checks as well.
Signed-off-by: David S. Miller <davem@davemloft.net>
Otherwise if no references exist in the static kernel image,
we won't export the symbol properly to modules.
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on copy from microblaze add ucmpdi2 implementation.
This fixes build of niu driver which failed with:
drivers/built-in.o: In function `niu_get_nfc':
niu.c:(.text+0x91494): undefined reference to `__ucmpdi2'
This driver will never be used on a sparc32 system,
but patch added to fix build breakage with all*config builds.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the explicit calls to .udiv/.umul in assembler, I made a
mechanical (read as: safe) transformation. I didn't attempt
to make any simplifications.
In particular, __ndelay and __udelay can be simplified significantly.
Some of the %y reads are unnecessary and these routines have no need
any longer for allocating a register window, they can be leaf
functions.
Signed-off-by: David S. Miller <davem@davemloft.net>
We always have this instruction available, so no need to use
btfixup for it any more.
This also eradicates the whole of atomic_32.S and thus the
__atomic_begin and __atomic_end symbols completely.
Signed-off-by: David S. Miller <davem@davemloft.net>
Both sparc 32-bit's software divide assembler and MPILIB provide
clz_tab[] with identical contents.
Break it out into a seperate object file and select it when
SPARC32 or MPILIB is set.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: James Morris <jmorris@namei.org>
Many architectures don't want to pull in iomap.c,
so they ended up duplicating pci_iomap from that file.
That function isn't trivial, and we are going to modify it
https://lkml.org/lkml/2011/11/14/183
so the duplication hurts.
This reduces the scope of the problem significantly,
by moving pci_iomap to a separate file and
referencing that from all architectures.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQEcBAABAgAGBQJPBZXBAAoJECgfDbjSjVRpuuYIAIMD0wE96MuTOSBJX4VG8VAP
UyjL9dsfMRy8CKioQo5/fxpTY07YBCWmNauSSX7pzgcoUKBfYIGn4Z1qwGYsWK9M
CzLs6PXLTugw0FtKobHZl/klRTWEBS6YOUjp9x568rplwF+Ppk7b993uj7eS/g+e
T0mUKzqg4/UavbHd9+W5KgC4drQ5hgtu2WZHoUxBK4umnd3C2G+U82Sthg50o/XU
SC8IGm39K8I36HoIWgXj3Y7nkOP3mQELohOT4ZPiVSmLvGS4i47+ix75anO+8ZvZ
jxHr8RC85IK1Nd89NZhbKOyvx0QQiwoKUZaTwcWXJNSOADzZnM6icdIsodc+Elo=
=ccQZ
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
lib: use generic pci_iomap on all architectures
Many architectures don't want to pull in iomap.c,
so they ended up duplicating pci_iomap from that file.
That function isn't trivial, and we are going to modify it
https://lkml.org/lkml/2011/11/14/183
so the duplication hurts.
This reduces the scope of the problem significantly,
by moving pci_iomap to a separate file and
referencing that from all architectures.
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
alpha: drop pci_iomap/pci_iounmap from pci-noop.c
mn10300: switch to GENERIC_PCI_IOMAP
mn10300: add missing __iomap markers
frv: switch to GENERIC_PCI_IOMAP
tile: switch to GENERIC_PCI_IOMAP
tile: don't panic on iomap
sparc: switch to GENERIC_PCI_IOMAP
sh: switch to GENERIC_PCI_IOMAP
powerpc: switch to GENERIC_PCI_IOMAP
parisc: switch to GENERIC_PCI_IOMAP
mips: switch to GENERIC_PCI_IOMAP
microblaze: switch to GENERIC_PCI_IOMAP
arm: switch to GENERIC_PCI_IOMAP
alpha: switch to GENERIC_PCI_IOMAP
lib: add GENERIC_PCI_IOMAP
lib: move GENERIC_IOMAP to lib/Kconfig
Fix up trivial conflicts due to changes nearby in arch/{m68k,score}/Kconfig
atomic24 support was used to semaphores in the past - but is no longer used.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
sparc copied pci_iomap from generic code, probably to avoid
pulling the rest of iomap.c in. Since that's in
a separate file now, we can reuse the common implementation.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Properly return the original destination buffer pointer.
Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Kjetil Oftedal <oftedal@gmail.com>
This is setting things up so that we can correct the return
value, so that it properly returns the original destination
buffer pointer.
Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Kjetil Oftedal <oftedal@gmail.com>
Don't use floating point on Niagara2, use the traditional
plain Niagara code instead.
Unroll Niagara loops to 128 bytes for copy, and 256 bytes
for clear.
Signed-off-by: David S. Miller <davem@davemloft.net>
Should have been done in commit 1af08a1407f4 ("This is in preparation
for more generic atomic").
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Arun Sharma <asharma@fb.com>
Cc: David Miller <davem@davemloft.net>
Cc: "Hans-Christian Egtvedt" <hans-christian.egtvedt@atmel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>
Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When we are in the label cc_dword_align, registers %o0 and %o1 have the same last 2 bits,
but it's not guaranteed one of them is zero. So we can get unaligned memory access
in label ccte. Example of parameters which lead to this:
%o0=0x7ff183e9, %o1=0x8e709e7d, %g1=3
With the parameters I had a memory corruption, when the additional 5 bytes were rewritten.
This patch corrects the error.
One comment to the patch. We don't care about the third bit in %o1, because cc_end_cruft
stores word or less.
Signed-off-by: Tkhai Kirill <tkhai@yandex.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use bitmap_set() instead of calling __set_bit() each bit.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
SPIN_LOCK_UNLOCK is deprecated. Use the lockdep capable variant
instead.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David S. Miller <davem@davemloft.net>
As noticed by Mikulas Patocka, the backoff macros don't
completely nop out for UP builds, we still get a
branch always and a delay slot nop.
Fix this by making the branch to the backoff spin loop
selective, then we can nop out the spin loop completely.
Signed-off-by: David S. Miller <davem@davemloft.net>
Simple microoptimizations for sparc64 atomic functions:
Save one instruction by using a delay slot.
Use %g1 instead of %g7, because %g1 is written earlier.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Basically tip-off the powerpc code, use a 64-bit type and atomic64_t
interfaces for the implementation.
This gets us off of the by-hand asm code I wrote, which frankly I
think probably ruins I-cache hit rates.
The idea was the keep the call chains less deep, but anything taking
the rw-semaphores probably is also calling other stuff and therefore
already has allocated a stack-frame. So no real stack frame savings
ever.
Ben H. has posted patches to make powerpc use 64-bit too and with some
abstractions we can probably use a shared header file somewhere.
With suggestions from Sam Ravnborg.
Signed-off-by: David S. Miller <davem@davemloft.net>
128 bytes is sufficient for the register window save area, but the
calling conventions allow the callee to save up to 6 incoming argument
registers into the stack frame after the register window save area.
This means a minimal stack frame is 176 bytes (128 + (6 * 8)).
This fixes random crashes when using the function tracer.
Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's the only way we'll be able to implement the function
graph tracer properly.
A positive is that we no longer have to worry about the
linker over-optimizing the tail call, since we don't
use a tail call any more.
Signed-off-by: David S. Miller <davem@davemloft.net>
Check function_trace_stop at ftrace_caller
Toss mcount_call and dummy call of ftrace_stub, unnecessary.
Document problems we'll have if the final kernel image link
ever turns on relaxation.
Properly size 'ftrace_call' so it looks right when inspecting
instructions under gdb et al.
Signed-off-by: David S. Miller <davem@davemloft.net>
This mirrors x86 commit 9f0cf4adb6
(x86: Use __builtin_object_size() to validate the buffer size for copy_from_user())
Signed-off-by: David S. Miller <davem@davemloft.net>