If an SPE attempts a DMA put to a local store after already doing
a get, the kernel must update the HW PTE to allow the write access.
This case was not being handled correctly.
From: Mike Kistler <mkistler@us.ibm.com>
Signed-off-by: Mike Kistler <mkistler@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
I'm not sure where the information came from, but I assumed
that doing cache-inhibited mappings for mmio regions was
sufficient.
It seems we also need the guarded bit set, like everyone
else, which is the default for ioremap.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
As noticed by Milton Miller, setting the initial affinity in
spider-pic can go wrong if the target node field was not orinally
empty.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Change the dynamic PCI probe function for pSeries to use
ppc_md.pci_probe_mode() when appropriate.
Signed-off-by: John Rose <johnrose@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Do not call prom exit prom_panic. It clears the screen and the exit
message is lost.
On some (or all?) pmacs it causes another crash when OF tries to print
the date and time in its banner.
Set of_platform earlier to catch more prom_panic() calls.
Signed-off-by: Olaf Hering <olh@suse.de>
Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The return statement is to prevent `warning: 'nid' might be used uninitialized
in this function'.
Cc: Nathan Lynch <nathanl@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The address of variable val in prom_init_stdout is passed to prom_getprop.
prom_getprop casts the pointer to u32 and passes it to call_prom in the hope
that OpenFirmware stores something there.
But the pointer is truncated in the lower bits and the expected value is
stored somewhere else.
In my testing I had a stackpointer of 0x0023e6b4. val was at offset 120,
wich has address 0x0023e72c. But the value passed to OF was 0x0023e728.
c00000000040b710: 3b 01 00 78 addi r24,r1,120
...
c00000000040b754: 57 08 00 38 rlwinm r8,r24,0,0,28
...
c00000000040b784: 80 01 00 78 lwz r0,120(r1)
...
c00000000040b798: 90 1b 00 0c stw r0,12(r27)
...
The stackpointer came from 32bit code.
The chain was yaboot -> zImage -> vmlinux
PowerMac OpenFirmware does appearently not handle the ELF sections
correctly. If yaboot was compiled in
/usr/src/packages/BUILD/lilo-10.1.1/yaboot, then the stackpointer is
unaligned. But the stackpointer is correct if yaboot is compiled in
/tmp/yaboot.
This bug triggered since 2.6.15, now prom_getprop is an inline
function. gcc clears the lower bits, instead of just clearing the
upper 32 bits.
Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
the mfc member of a new context was not initialized to zero,
which potentially leads to wild memory accesses.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch is layered on top of CONFIG_SPARSEMEM
and is patterned after direct mapping of LS.
This patch allows mmap() of the following regions:
"mfc", which represents the area from [0x3000 - 0x3fff];
"cntl", which represents the area from [0x4000 - 0x4fff];
"signal1" which begins at offset 0x14000; "signal2" which
begins at offset 0x1c000.
The signal1 & signal2 files may be mmap()'d by regular user
processes. The cntl and mfc file, on the other hand, may
only be accessed if the owning process has CAP_SYS_RAWIO,
because they have the potential to confuse the kernel
with regard to parallel access to the same files with
regular file operations: the kernel always holds a spinlock
when accessing registers in these areas to serialize them,
which can not be guaranteed with user mmaps,
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch adds a new file called 'mfc' to each spufs directory.
The file accepts DMA commands that are a subset of what would
be legal DMA commands for problem state register access. Upon
reading the file, a bitmask is returned with the completed
tag groups set.
The file is meant to be used from an abstraction in libspe
that is added by a different patch.
From the kernel perspective, this means a process can now
offload a memory copy from or into an SPE local store
without having to run code on the SPE itself.
The transfer will only be performed while the SPE is owned
by one thread that is waiting in the spu_run system call
and the data will be transferred into that thread's
address space, independent of which thread started the
transfer.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
An SPU does not have a way to implement system calls
itself, but it can create intercepts to the kernel.
This patch uses the method defined by the JSRE interface
for C99 host library calls from an SPU to implement
Linux system calls. It uses the reserved SPU stop code
0x2104 for this, using the structure layout and syscall
numbers for ppc64-linux.
I'm still undecided wether it is better to have a list
of allowed syscalls or a list of forbidden syscalls,
since we can't allow an SPU to call all syscalls that
are defined for ppc64-linux.
This patch implements the easier choice of them, with a
blacklist that only prevents an SPU from calling anything
that interacts with its own execution, e.g fork, execve,
clone, vfork, exit, spu_run and spu_create and everything
that deals with signals.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
powerpc currently declares some of its own system calls
in <asm/unistd.h>, but not all of them. That place also
contains remainders of the now almost unused kernel syscall
hack.
- Add a new <asm/syscalls.h> with clean declarations
- Include that file from every source that implements one
of these
- Get rid of old declarations in <asm/unistd.h>
This patch is required as a base for implementing system
calls from an SPU, but also makes sense as a general
cleanup.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Apparently we have found a bug in the CPU that causes
external interrupts to sometimes get disabled indefinitely.
This adds a workaround for the problem.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The current interrupt controller setup on Cell is done
in a rather ad-hoc way with device tree properties
that are not standardized at all.
In an attempt to do something that follows the OF standard
(or at least the IBM extensions to it) more closely,
we have now come up with this patch. It still provides
a fallback to the old behaviour when we find older firmware,
that hack can not be removed until the existing customer
installations have upgraded.
Cc: hpenner@de.ibm.com
Cc: stk@de.ibm.com
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Milton Miller <miltonm@bga.com>
Cc: benh@kernel.crashing.org
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The default configuration in mainline got a little out of
sync with what we use internally.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
A small bug crept in the iommu driver when we made it more
generic. This patch is needed for boards that have a dma
window that does not start at bus address zero.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
So that we can use firmware_has_feature() in a BUG_ON() and have the compiler
elide the code entirely if the feature can never be set, change
firmware_has_feature to a macro. Unfortunate, but necessary at least until
GCC bug #26724 is fixed.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Change BUG_ON and WARN_ON to give the compiler a chance to perform
compile-time optimsations. Depending on the complexity of the condition,
the compiler may not do this very well, so if it's important check the
object code.
Current GCC's (4.x) produce good code as long as the condition does not
include a function call, including a static inline.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Christoph noticed that sparse warned about all the enum tags in cuptable.h
that had values that required them to be type log. (enum tags are ints
according to the standard.)
This patch attempts to fix them in the least intrusive way possible by
turning them all into #defines except for the 32 bit CPU_FTRS_POSSIBLE and
CPU_FTRS_ALWAYS which are hard to construct that way. This works because
these last two contain no bits above 2^31.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Ingo's sem2mutex patch incorrectly replaced one reference to ipc/sem.c
with ipc/mutex.c in a comment.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
drivers/char/ftape/lowlevel/fdc-io.c: Correct a comment
Kconfig help: MTD_JEDECPROBE already supports Intel
Remove ugly debugging stuff
do_mounts.c: Minor ROOT_DEV comment cleanup
BUG_ON() Conversion in drivers/s390/block/dasd_devmap.c
BUG_ON() Conversion in mm/mempool.c
BUG_ON() Conversion in mm/memory.c
BUG_ON() Conversion in kernel/fork.c
BUG_ON() Conversion in ipc/sem.c
BUG_ON() Conversion in fs/ext2/
BUG_ON() Conversion in fs/hfs/
BUG_ON() Conversion in fs/dcache.c
BUG_ON() Conversion in fs/buffer.c
BUG_ON() Conversion in input/serio/hp_sdc_mlc.c
BUG_ON() Conversion in md/dm-table.c
BUG_ON() Conversion in md/dm-path-selector.c
BUG_ON() Conversion in drivers/isdn
BUG_ON() Conversion in drivers/char
BUG_ON() Conversion in drivers/mtd/
<linux@horizon.com> wrote:
This is an extremely well-known technique. You can see a similar version that
uses a multiply for the last few steps at
http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel whch
refers to "Software Optimization Guide for AMD Athlon 64 and Opteron
Processors"
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25112.PDF
It's section 8.6, "Efficient Implementation of Population-Count Function in
32-bit Mode", pages 179-180.
It uses the name that I am more familiar with, "popcount" (population count),
although "Hamming weight" also makes sense.
Anyway, the proof of correctness proceeds as follows:
b = a - ((a >> 1) & 0x55555555);
c = (b & 0x33333333) + ((b >> 2) & 0x33333333);
d = (c + (c >> 4)) & 0x0f0f0f0f;
#if SLOW_MULTIPLY
e = d + (d >> 8)
f = e + (e >> 16);
return f & 63;
#else
/* Useful if multiply takes at most 4 cycles */
return (d * 0x01010101) >> 24;
#endif
The input value a can be thought of as 32 1-bit fields each holding their own
hamming weight. Now look at it as 16 2-bit fields. Each 2-bit field a1..a0
has the value 2*a1 + a0. This can be converted into the hamming weight of the
2-bit field a1+a0 by subtracting a1.
That's what the (a >> 1) & mask subtraction does. Since there can be no
borrows, you can just do it all at once.
Enumerating the 4 possible cases:
0b00 = 0 -> 0 - 0 = 0
0b01 = 1 -> 1 - 0 = 1
0b10 = 2 -> 2 - 1 = 1
0b11 = 3 -> 3 - 1 = 2
The next step consists of breaking up b (made of 16 2-bir fields) into
even and odd halves and adding them into 4-bit fields. Since the largest
possible sum is 2+2 = 4, which will not fit into a 4-bit field, the 2-bit
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
"which will not fit into a 2-bit field"
fields have to be masked before they are added.
After this point, the masking can be delayed. Each 4-bit field holds a
population count from 0..4, taking at most 3 bits. These numbers can be added
without overflowing a 4-bit field, so we can compute c + (c >> 4), and only
then mask off the unwanted bits.
This produces d, a number of 4 8-bit fields, each in the range 0..8. From
this point, we can shift and add d multiple times without overflowing an 8-bit
field, and only do a final mask at the end.
The number to mask with has to be at least 63 (so that 32 on't be truncated),
but can also be 128 or 255. The x86 has a special encoding for signed
immediate byte values -128..127, so the value of 255 is slower. On other
processors, a special "sign extend byte" instruction might be faster.
On a processor with fast integer multiplies (Athlon but not P4), you can
reduce the final few serially dependent instructions to a single integer
multiply. Consider d to be 3 8-bit values d3, d2, d1 and d0, each in the
range 0..8. The multiply forms the partial products:
d3 d2 d1 d0
d3 d2 d1 d0
d3 d2 d1 d0
+ d3 d2 d1 d0
----------------------
e3 e2 e1 e0
Where e3 = d3 + d2 + d1 + d0. e2, e1 and e0 obviously cannot generate
any carries.
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
By defining generic hweight*() routines
- hweight64() will be defined on all architectures
- hweight_long() will use architecture optimized hweight32() or hweight64()
I found two possible cleanups by these reasons.
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
generic_{ffs,fls,fls64,hweight{64,32,16,8}}() were moved into
include/asm-generic/bitops.h. So all architectures don't use them.
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Now the only user who are using generic_ffs() is ntfs filesystem. This patch
isolates generic_ffs() as ntfs_ffs() for ntfs.
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Cc: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The find_*_bit() routines are defined to work on a pointer to unsigned long.
But partial_page.bitmap is unsigned int and it is passed to find_*_bit() in
arch/ia64/ia32/sys_ia32.c. So the compiler will print warnings.
This patch changes to unsigned long instead.
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The test_bit() routines are defined to work on a pointer to unsigned long.
But thread_info.flags is __u32 (unsigned int) on sh and it is passed to flag
set/clear/test wrappers in include/linux/thread_info.h. So the compiler will
print warnings.
This patch changes to unsigned long instead.
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently include/asm-generic/bitops.h is not referenced from anywhere. But
it will be the benefit of those who are trying to port Linux to another
architecture.
So update it by same manner
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>