RCU free the struct inode. This will allow:
- Subsequent store-free path walking patch. The inode must be consulted for
permissions when walking, so an RCU inode reference is a must.
- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
to take i_lock no longer need to take sb_inode_list_lock to walk the list in
the first place. This will simplify and optimize locking.
- Could remove some nested trylock loops in dcache code
- Could potentially simplify things a bit in VM land. Do not need to take the
page lock to follow page->mapping.
The downside of this is the performance cost of using RCU. In a simple
creat/unlink microbenchmark, performance drops by about 10% due to inability to
reuse cache-hot slab objects. As iterations increase and RCU freeing starts
kicking over, this increases to about 20%.
In cases where inode lifetimes are longer (i.e. many inodes may be allocated
during the average life span of a single inode), a lot of this cache reuse is
not applicable, so the regression caused by this patch is smaller.
The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
however this adds some complexity to list walking and store-free path walking,
so I prefer to implement this at a later date, if it is shown to be a win in
real situations. I haven't found a regression in any non-micro benchmark so I
doubt it will be a problem.
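The loss of cache-hot reuse described above can be illustrated with a toy
userspace model. This is only a sketch under stated assumptions: the
fixed-size pool, the explicit "grace period end" call and all toy_* names
are hypothetical stand-ins, not the slab allocator or the RCU API.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_POOL_SIZE 4

struct toy_inode { int in_use; };

struct toy_pool {
	struct toy_inode slots[TOY_POOL_SIZE];
	struct toy_inode *free_stack[TOY_POOL_SIZE]; /* LIFO: last freed is reused first */
	size_t nfree;
	struct toy_inode *deferred[TOY_POOL_SIZE];   /* freed, grace period not yet over */
	size_t ndeferred;
};

void toy_pool_init(struct toy_pool *p)
{
	size_t i;

	p->nfree = 0;
	p->ndeferred = 0;
	for (i = 0; i < TOY_POOL_SIZE; i++) {
		p->slots[i].in_use = 0;
		p->free_stack[p->nfree++] = &p->slots[TOY_POOL_SIZE - 1 - i];
	}
}

struct toy_inode *toy_alloc(struct toy_pool *p)
{
	struct toy_inode *inode;

	if (p->nfree == 0)
		return NULL;
	inode = p->free_stack[--p->nfree];
	inode->in_use = 1;
	return inode;
}

/* Immediate free: the object goes straight back on top of the stack. */
void toy_free_now(struct toy_pool *p, struct toy_inode *inode)
{
	assert(p->nfree < TOY_POOL_SIZE);
	inode->in_use = 0;
	p->free_stack[p->nfree++] = inode;
}

/* Deferred free: the object is parked until the grace period ends. */
void toy_free_deferred(struct toy_pool *p, struct toy_inode *inode)
{
	assert(p->ndeferred < TOY_POOL_SIZE);
	p->deferred[p->ndeferred++] = inode;
}

/* Stand-in for the end of a grace period: parked objects become reusable. */
void toy_grace_period_end(struct toy_pool *p)
{
	while (p->ndeferred > 0)
		toy_free_now(p, p->deferred[--p->ndeferred]);
}
```

With toy_free_now() the next toy_alloc() hands back the object that was
just freed (the cache-hot one). With toy_free_deferred() it has to pick a
different object until toy_grace_period_end() runs.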
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
dget_locked was a shortcut to avoid the lazy lru manipulation when we already
held dcache_lock (lru manipulation was relatively cheap at that point).
However, now that the lru lock is an innermost one, we never hold it at any
caller, so the lock cost can now be avoided. We already have well working lazy
dcache LRU, so it should be fine to defer LRU manipulations to scan time.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Protect d_unhashed(dentry) condition with d_lock. This means keeping
DCACHE_UNHASHED bit in sync with hash manipulations.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Make d_count non-atomic and protect it with d_lock. This allows us to ensure a
0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when
we start protecting many other dentry members with d_lock.
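A minimal userspace sketch of the idea, assuming a C11 atomic_flag as a
stand-in for d_lock. The toy_* names and the prune helper are hypothetical
and only model the locking rule; they are not the dcache code.

```c
#include <assert.h>
#include <stdatomic.h>

struct toy_dentry {
	atomic_flag d_lock; /* toy spinlock standing in for d_lock */
	int d_count;        /* plain int, only touched under d_lock */
};

void toy_dentry_init(struct toy_dentry *d)
{
	atomic_flag_clear(&d->d_lock);
	d->d_count = 0;
}

void toy_lock(struct toy_dentry *d)
{
	while (atomic_flag_test_and_set_explicit(&d->d_lock, memory_order_acquire))
		; /* spin until the lock is free */
}

void toy_unlock(struct toy_dentry *d)
{
	atomic_flag_clear_explicit(&d->d_lock, memory_order_release);
}

void toy_dget(struct toy_dentry *d)
{
	toy_lock(d);
	d->d_count++;
	toy_unlock(d);
}

/* Drops one reference and returns the remaining count. */
int toy_dput(struct toy_dentry *d)
{
	int remaining;

	toy_lock(d);
	assert(d->d_count > 0);
	remaining = --d->d_count;
	toy_unlock(d);
	return remaining;
}

/*
 * Returns 1 if the dentry is unreferenced. Because every count update
 * takes d_lock, a count of 0 seen here stays 0 until we unlock.
 */
int toy_try_prune(struct toy_dentry *d)
{
	int pruned;

	toy_lock(d);
	pruned = (d->d_count == 0);
	/* a real pruner would do its teardown here, still under the lock */
	toy_unlock(d);
	return pruned;
}
```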
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Hi,
We can get rid of a memset in
arch/powerpc/platforms/cell/spufs/lscsa_alloc.c::spu_alloc_lscsa_std() by
using vzalloc() rather than vmalloc()+memset().
Completely untested patch below since I have no hardware nor tools to
compile this.
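For readers unfamiliar with the pattern, a userspace analogue is sketched
below, with malloc()+memset() standing in for vmalloc()+memset() and
calloc() standing in for vzalloc(). The toy_* function names are
hypothetical; this is not the spufs code.

```c
#include <stdlib.h>
#include <string.h>

/* Old style: allocate, then clear in a separate step. */
void *toy_alloc_then_clear(size_t size)
{
	void *p = malloc(size);

	if (!p)
		return NULL;
	memset(p, 0, size);
	return p;
}

/* New style: one call that returns zeroed memory. */
void *toy_alloc_zeroed(size_t size)
{
	return calloc(1, size);
}
```

Both helpers return a buffer with the same contents; the second simply has
one less step to forget.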
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
dma_direct_ops is the default pci dma ops.
No need to call a function to get the pci dma ops, we know they are the
dma_direct_ops.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (71 commits)
powerpc/44x: Update ppc44x_defconfig
powerpc/watchdog: Make default timeout for Book-E watchdog a Kconfig option
fsl_rio: Add comments for sRIO registers.
powerpc/fsl-booke: Add e55xx (64-bit) smp defconfig
powerpc/fsl-booke: Add p5020 DS board support
powerpc/fsl-booke64: Use TLB CAMs to cover linear mapping on FSL 64-bit chips
powerpc/fsl-booke: Add support for FSL Arch v1.0 MMU in setup_page_sizes
powerpc/fsl-booke: Add support for FSL 64-bit e5500 core
powerpc/85xx: add cache-sram support
powerpc/85xx: add ngPIXIS FPGA device tree node to the P1022DS board
powerpc: Fix compile error with paca code on ppc64e
powerpc/fsl-booke: Add p3041 DS board support
oprofile/fsl emb: Don't set MSR[PMM] until after clearing the interrupt.
powerpc/fsl-booke: Add PCI device ids for P2040/P3041/P5010/P5020 QoirQ chips
powerpc/mpc8xxx_gpio: Add support for 'qoriq-gpio' controllers
powerpc/fsl_booke: Add support to boot from core other than 0
powerpc/p1022: Add probing for individual DMA channels
powerpc/fsl_soc: Search all global-utilities nodes for rstccr
powerpc: Fix invalid page flags in create TLB CAM path for PTE_64BIT
powerpc/mpc83xx: Support for MPC8308 P1M board
...
Fix up conflict with the generic irq_work changes in arch/powerpc/kernel/time.c
The default for llseek is changing, so we need
explicit operations everywhere.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Jeremy Kerr <jk@ozlabs.org>
Cc: linuxppc-dev@ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Russell King <linux@arm.linux.org.uk>
Add calls to of_node_put in the error handling code following calls to
of_find_node_by_path and of_find_node_by_phandle.
The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@r exists@
local idexpression x;
expression E,E1;
statement S;
@@
*x =
(of_find_node_by_path
|of_find_node_by_name
|of_find_node_by_phandle
|of_get_parent
|of_get_next_parent
|of_get_next_child
|of_find_compatible_node
|of_match_node
)(...);
...
if (x == NULL) S
<... when != x = E
*if (...) {
... when != of_node_put(x)
when != if (...) { ... of_node_put(x); ... }
(
return <+...x...+>;
|
* return ...;
)
}
...>
of_node_put(x);
// </smpl>
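The shape of the fix can be sketched in userspace with a toy refcounted
node. All toy_* names, the has_reg field and the error values are
hypothetical; the point is only that every exit taken after a successful
lookup drops the reference.

```c
#include <stddef.h>

struct toy_node {
	int refcount;
	int has_reg; /* hypothetical property checked by the caller */
};

/* Like the of_find_* helpers: returns the node with its refcount raised. */
struct toy_node *toy_find_node(struct toy_node *candidate)
{
	if (candidate == NULL)
		return NULL;
	candidate->refcount++;
	return candidate;
}

void toy_node_put(struct toy_node *np)
{
	if (np != NULL)
		np->refcount--;
}

int toy_probe(struct toy_node *candidate)
{
	struct toy_node *np = toy_find_node(candidate);
	int ret = 0;

	if (np == NULL)
		return -1;      /* nothing was taken, nothing to put */

	if (!np->has_reg) {
		ret = -2;       /* error after the lookup succeeded... */
		goto out_put;   /* ...so the reference must still be dropped */
	}

	/* normal use of np would go here */

out_put:
	toy_node_put(np);
	return ret;
}
```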
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When looking at some issues with the virtual ethernet driver I noticed
that TCE allocation was following a very strange pattern:
address 00e9000 length 2048
address 0409000 length 2048 <-----
address 0429000 length 2048
address 0449000 length 2048
address 0469000 length 2048
address 0489000 length 2048
address 04a9000 length 2048
address 04c9000 length 2048
address 04e9000 length 2048
address 4009000 length 2048 <-----
address 4029000 length 2048
Huge unexplained gaps in what should be an empty TCE table. It turns out
it_blocksize, the amount we want to align the next allocation to, was
c0000000fe903b20. Completely bogus.
Initialise it to something reasonable in the VIO IOMMU code, and use kzalloc
everywhere to protect against this when we next add a non-compulsory
field to iommu code and forget to initialise it.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (96 commits)
no need for list_for_each_entry_safe()/resetting with superblock list
Fix sget() race with failing mount
vfs: don't hold s_umount over close_bdev_exclusive() call
sysv: do not mark superblock dirty on remount
sysv: do not mark superblock dirty on mount
btrfs: remove junk sb_dirt change
BFS: clean up the superblock usage
AFFS: wait for sb synchronization when needed
AFFS: clean up dirty flag usage
cifs: truncate fallout
mbcache: fix shrinker function return value
mbcache: Remove unused features
add f_flags to struct statfs(64)
pass a struct path to vfs_statfs
update VFS documentation for method changes.
All filesystems that need invalidate_inode_buffers() are doing that explicitly
convert remaining ->clear_inode() to ->evict_inode()
Make ->drop_inode() just return whether inode needs to be dropped
fs/inode.c:clear_inode() is gone
fs/inode.c:evict() doesn't care about delete vs. non-delete paths now
...
Fix up trivial conflicts in fs/nilfs2/super.c
Replace inode_setattr with opencoded variants of it in all callers. This
moves the remaining call to vmtruncate into the filesystem methods where it
can be replaced with the proper truncate sequence.
In a few cases it was obvious that we would never end up calling vmtruncate
so it was left out in the opencoded variant:
spufs: explicitly checks for ATTR_SIZE earlier
btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier
ufs: contains an opencoded simple_setattr + truncate that sets the filesize just above
In addition to that ncpfs called inode_setattr with handcrafted iattrs,
which allowed trimming down the opencoded variant.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This list was used by only two platforms, with all other platforms defining
their own list of valid bus ids to pass to of_platform_bus_probe. This patch:
i) copies the default list to the two platforms that depended on it (powerpc)
ii) removes the usage of of_default_bus_ids in of_platform_bus_probe
iii) removes the definition of the list from all architectures that defined it
Passing a NULL 'matches' parameter to of_platform_bus_probe is still valid; the
function returns no error in that case as the NULL value is equivalent to an
empty list.
Signed-off-by: Jonas Bonn <jonas@southpole.se>
[grant.likely@secretlab.ca: added __initdata annotations, warn on and return error on missing match table, and fix whitespace errors]
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
of_device is just a #define alias to platform_device. This patch
replaces all references to it with platform_device.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
of_platform_bus was being used in the same manner as the platform_bus.
The only difference being that of_platform_bus devices are generated
from data in the device tree, and platform_bus devices are usually
statically allocated in platform code. Having them separate causes
the problem of device drivers having to be registered twice if it
was possible for the same device to appear on either bus.
This patch removes of_platform_bus_type and registers all of_platform
bus devices and drivers on the platform bus instead. A previous patch
made the of_device structure an alias for the platform_device structure,
and a shim is used to adapt of_platform_drivers to the platform bus.
After all of the of_platform_bus drivers are converted to be normal platform
drivers, the shim code can be removed.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: David S. Miller <davem@davemloft.net>
via the following scripts
FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
sed -i \
-e 's/lmb/memblock/g' \
-e 's/LMB/MEMBLOCK/g' \
$FILES
for N in $(find . -name lmb.[ch]); do
M=$(echo $N | sed 's/lmb/memblock/g')
mv $N $M
done
and remove some wrong changes, such as those to lmbench and dlmb etc.
Also move memblock.c from lib/ to mm/
Suggested-by: Ingo Molnar <mingo@elte.hu>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Fix smatch warning: warning: constant 0x800000000 is so big it is long
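As background: an unsuffixed hex constant gets the first integer type that
can hold it, so a value needing 36 bits silently becomes a long on LP64.
Spelling out the width with a suffix avoids the warning. A minimal sketch,
assuming a hypothetical TOY_LIMIT macro and a ULL suffix (the suffix used in
the actual change may differ):

```c
#include <stdint.h>

/* The suffix states the intended 64-bit type instead of relying on promotion. */
#define TOY_LIMIT 0x800000000ULL

uint64_t toy_limit(void)
{
	return TOY_LIMIT;
}
```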
Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We don't name our generic fsync implementations very well currently.
The no-op implementation for in-memory filesystems currently is called
simple_sync_file which doesn't make too much sense to start with,
and the generic one for simple filesystems is called simple_fsync
which can lead to some confusion.
This patch renames the generic file fsync method to generic_file_fsync
to match the other generic_file_* routines it is supposed to be used
with, and the no-op implementation to noop_fsync to make it obvious
what to expect. In addition add some documentation for both methods.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Merging in current state of Linus' tree to deal with merge conflicts and
build failures in vio.c after merge.
Conflicts:
drivers/i2c/busses/i2c-cpm.c
drivers/i2c/busses/i2c-mpc.c
drivers/net/gianfar.c
Also fixed up one line in arch/powerpc/kernel/vio.c to use the
correct node pointer.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
.name, .match_table and .owner are duplicated in both of_platform_driver
and device_driver. This patch removes the extra copies from struct
of_platform_driver and converts all users to the device_driver members.
This patch is a pretty mechanical change. The usage model doesn't change
and if any drivers have been missed, or if anything has been fixed up
incorrectly, then it will fail with a compile time error, and the fixup
will be trivial. This patch looks big and scary because it touches so
many files, but it should be pretty safe.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Sean MacLennan <smaclennan@pikatech.com>
This patch eliminates the node pointer from struct of_device and the
of_node (or prom_node) pointer from struct dev_archdata since the node
pointer is now part of struct device proper when CONFIG_OF is set, and
all users of the old pointer locations have already been converted over
to use device->of_node.
Also remove dev_archdata_{get,set}_node() as it is no longer used by
anything.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
The following structure elements duplicate the information in
'struct device.of_node' and so are being eliminated. This patch
makes all readers of these elements use device.of_node instead.
(struct of_device *)->node
(struct dev_archdata *)->prom_node (sparc)
(struct dev_archdata *)->of_node (powerpc & microblaze)
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Dynamically allocate cpu_sibling_map and cpu_core_map cpumasks.
We don't need to set_cpu_online() the boot cpu in smp_prepare_boot_cpu,
init/main.c does it for us.
We also postpone setting of the boot cpu in cpu_sibling_map and cpu_core_map
until when the memory allocator is available (smp_prepare_cpus), similar
to x86.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the following.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
beat_htab_lock needs to be a real spinlock in RT. Convert it to
raw_spinlock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
beatic_irq_mask_lock needs to be a real spinlock in RT. Convert it to
raw_spinlock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Now we use printf style alignment there is no need to manually space
these fields.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
machine_is_compatible() is an OF-specific call. It should have
the of_ prefix to protect the global namespace.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Michal Simek <monstr@monstr.eu>
In struct device_node, the phandle is named 'linux_phandle' for PowerPC
and MicroBlaze, and 'node' for SPARC. There is no good reason for the
difference, it is just an artifact of the code diverging over a couple
of years. This patch renames both to simply .phandle.
Note: the .node also existed in PowerPC/MicroBlaze, but the only user
seems to be arch/powerpc/platforms/powermac/pfunc_core.c. It doesn't
look like the assignment between .linux_phandle and .node is
significantly different enough to warrant the separate code paths
unless ibm,phandle properties actually appear in Apple device trees.
I think it is safe to eliminate the old .node property and use
phandle everywhere.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: David S. Miller <davem@davemloft.net>
Tested-by: Wolfram Sang <w.sang@pengutronix.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Make sure compiler won't do weird things with limits. E.g. fetching
them twice may return 2 different values after writable limits are
implemented.
I.e. either use rlimit helpers added in
3e10e716ab
or ACCESS_ONCE if not applicable.
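The single-fetch pattern can be sketched in userspace as below. The macro is
a simplified stand-in for ACCESS_ONCE that relies on the GNU __typeof__
extension (an assumption about the compiler), and the toy_* names are
hypothetical.

```c
/* Forces exactly one read of x by going through a volatile lvalue. */
#define TOY_ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

struct toy_rlimit {
	unsigned long rlim_cur; /* may be changed concurrently by a writer */
};

/*
 * Clamp len to the current limit. The limit is read once into a local,
 * so the comparison and the assignment are guaranteed to see one value.
 */
unsigned long toy_clamp_len(struct toy_rlimit *lim, unsigned long len)
{
	unsigned long limit = TOY_ACCESS_ONCE(lim->rlim_cur);

	if (len > limit)
		len = limit;
	return len;
}
```

Without the local copy, the compiler would be free to load lim->rlim_cur
once for the test and again for the assignment.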
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* 'for-33' of git://repo.or.cz/linux-kbuild: (29 commits)
net: fix for utsrelease.h moving to generated
gen_init_cpio: fixed fwrite warning
kbuild: fix make clean after mismerge
kbuild: generate modules.builtin
genksyms: properly consider EXPORT_UNUSED_SYMBOL{,_GPL}()
score: add asm/asm-offsets.h wrapper
unifdef: update to upstream revision 1.190
kbuild: specify absolute paths for cscope
kbuild: create include/generated in silentoldconfig
scripts/package: deb-pkg: use fakeroot if available
scripts/package: add KBUILD_PKG_ROOTCMD variable
scripts/package: tar-pkg: use tar --owner=root
Kbuild: clean up marker
net: add net_tstamp.h to headers_install
kbuild: move utsrelease.h to include/generated
kbuild: move autoconf.h to include/generated
drop explicit include of autoconf.h
kbuild: move compile.h to include/generated
kbuild: drop include/asm
kbuild: do not check for include/asm-$ARCH
...
Fixed non-conflicting clean merge of modpost.c as per comments from
Stephen Rothwell (modpost.c had grown an include of linux/autoconf.h
that needed to be changed to generated/autoconf.h)
Convert locks which cannot be sleeping locks in preempt-rt to
raw_spinlocks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits)
m68k: rename global variable vmalloc_end to m68k_vmalloc_end
percpu: add missing per_cpu_ptr_to_phys() definition for UP
percpu: Fix kdump failure if booted with percpu_alloc=page
percpu: make misc percpu symbols unique
percpu: make percpu symbols in ia64 unique
percpu: make percpu symbols in powerpc unique
percpu: make percpu symbols in x86 unique
percpu: make percpu symbols in xen unique
percpu: make percpu symbols in cpufreq unique
percpu: make percpu symbols in oprofile unique
percpu: make percpu symbols in tracer unique
percpu: make percpu symbols under kernel/ and mm/ unique
percpu: remove some sparse warnings
percpu: make alloc_percpu() handle array types
vmalloc: fix use of non-existent percpu variable in put_cpu_var()
this_cpu: Use this_cpu_xx in trace_functions_graph.c
this_cpu: Use this_cpu_xx for ftrace
this_cpu: Use this_cpu_xx in nmi handling
this_cpu: Use this_cpu operations in RCU
this_cpu: Use this_cpu ops for VM statistics
...
Fix up trivial (famous last words) global per-cpu naming conflicts in
arch/x86/kvm/svm.c
mm/slab.c
There is no longer any use of the include2/ directory.
The generated files have moved to include/generated.
Drop all references to said directory.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Michal Marek <mmarek@suse.cz>
Writing a driver using SCLPC on the MPC5200B, I noticed that the
intspec arrays to map irqs to Linux virq cannot be const, because the
mapping and xlate functions only take non const pointers. All those
functions do not modify the intspec, so a const pointer could be used.
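A small sketch of what the const-qualified signature buys a driver. The
two-cell specifier layout and the toy_* names are hypothetical and do not
reflect any particular interrupt controller binding.

```c
#include <stddef.h>

/* Because it is const, this table can be placed in read-only data. */
const unsigned int toy_intspec[2] = { 5, 1 };

/*
 * The translate function only reads the specifier, so it takes a
 * pointer to const. Callers may pass const and non-const arrays alike.
 */
int toy_xlate(const unsigned int *intspec, size_t intsize,
	      unsigned int *out_hwirq, unsigned int *out_type)
{
	if (intsize < 2)
		return -1;
	*out_hwirq = intspec[0];
	*out_type = intspec[1];
	return 0;
}
```

With a non-const parameter, passing toy_intspec would need a cast or
trigger a discarded-qualifier diagnostic.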
Signed-off-by: Roman Fietze <roman.fietze@telemotive.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The typename member of struct irq_chip was kept for migration purposes
and has been obsolete for more than 2 years. Fix up the leftovers.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linuxppc-dev@ozlabs.org
Acked-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
size_t len cannot be less than 0.
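To illustrate why such a test is dead code: a negative value converted to
size_t wraps to a very large positive one, so only an upper-bound check can
catch it. A minimal sketch, with a hypothetical TOY_MAX_LEN bound and helper
name:

```c
#include <stddef.h>

#define TOY_MAX_LEN ((size_t)4096) /* hypothetical upper bound */

/* For an unsigned length the only meaningful range check is the upper one. */
int toy_len_ok(size_t len)
{
	return len <= TOY_MAX_LEN;
}
```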
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
get_irq_desc() is a powerpc-specific version of irq_to_desc(). That
is reason enough to remove it, but it also doesn't know about sparse
irq_desc support which irq_to_desc() does (when we enable it).
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch updates percpu related symbols in powerpc such that percpu
symbols are unique and don't clash with local symbols. This serves
two purposes of decreasing the possibility of global percpu symbol
collision and allowing dropping per_cpu__ prefix from percpu symbols.
* arch/powerpc/kernel/perf_callchain.c: s/callchain/cpu_perf_callchain/
* arch/powerpc/kernel/setup-common.c: s/pvr/cpu_pvr/
* arch/powerpc/platforms/pseries/dtl.c: s/dtl/cpu_dtl/
* arch/powerpc/platforms/cell/interrupt.c: s/iic/cpu_iic/
Partly based on Rusty Russell's "alloc_percpu: rename percpu vars
which cause name clashes" patch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@ozlabs.org
cppcheck found a memory leak in axon_msi, if dcr_base or dcr_len are zero,
we have already allocated msic, so we should free it in the error path.
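The error-path shape can be sketched in userspace as follows. The live
allocation counter, the toy_* names and the return values are hypothetical
and exist only to make the leak observable; this is not the axon_msi code.

```c
#include <stdlib.h>

int toy_live_allocs; /* number of allocations not yet freed */

struct toy_msic {
	unsigned int dcr_base;
	unsigned int dcr_len;
};

void *toy_zalloc(size_t size)
{
	void *p = calloc(1, size);

	if (p)
		toy_live_allocs++;
	return p;
}

void toy_free(void *p)
{
	if (p) {
		toy_live_allocs--;
		free(p);
	}
}

int toy_setup(unsigned int dcr_base, unsigned int dcr_len,
	      struct toy_msic **out)
{
	struct toy_msic *msic = toy_zalloc(sizeof(*msic));

	if (!msic)
		return -1;

	/* msic is already allocated here, so a bare return would leak it. */
	if (dcr_base == 0 || dcr_len == 0)
		goto out_free;

	msic->dcr_base = dcr_base;
	msic->dcr_len = dcr_len;
	*out = msic;
	return 0;

out_free:
	toy_free(msic);
	return -1;
}
```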
Signed-off-by: Eric Sesterhenn <eric.sesterhenn@lsexperts.de>
Acked-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* mark struct vm_area_struct::vm_ops as const
* mark vm_ops in AGP code
But leave TTM code alone, something is fishy there with global vm_ops
being used.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sometimes this is used to hold a simple offset, and sometimes
it is used to hold a pointer. This patch changes it to a union containing
void * and dma_addr_t. get/set accessors are also provided, because it was
getting a bit ugly to get to the actual data.
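A sketch of the union-plus-accessors layout, assuming hypothetical toy_*
names and a 64-bit toy_dma_addr_t; the field and accessor names in the
actual patch may differ.

```c
#include <stdint.h>

typedef uint64_t toy_dma_addr_t; /* assumed width, for illustration only */

struct toy_archdata {
	/* One slot, two interpretations; the user of the device knows which. */
	union {
		toy_dma_addr_t dma_offset;
		void *table_base;
	} dma_data;
};

toy_dma_addr_t toy_get_dma_offset(const struct toy_archdata *ad)
{
	return ad->dma_data.dma_offset;
}

void toy_set_dma_offset(struct toy_archdata *ad, toy_dma_addr_t off)
{
	ad->dma_data.dma_offset = off;
}

void *toy_get_table_base(const struct toy_archdata *ad)
{
	return ad->dma_data.table_base;
}

void toy_set_table_base(struct toy_archdata *ad, void *base)
{
	ad->dma_data.table_base = base;
}
```

Callers go through the accessor that matches how they use the slot, which
keeps casts out of the call sites.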
Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Now that the last users of markers have migrated to the event
tracer we can kill off the (now orphan) support code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20090917173527.GA1699@lst.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This converts powerpc to use the dma_map_ops struct (in
include/linux/dma-mapping.h) instead of the homegrown POWERPC dma_mapping_ops.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
I wrote sputrace before generic tracing infrastructure was available.
Now that we have the generic event tracer we can convert it over and
remove a lot of code:
8 files changed, 45 insertions(+), 285 deletions(-)
To use it make sure CONFIG_EVENT_TRACING is enabled and then enable
the spufs trace channel by
echo 1 > /sys/kernel/debug/tracing/events/spufs/spufs_context/enable
and then read the trace records using e.g.
cat /sys/kernel/debug/tracing/trace
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Those definitions are currently declared extern in the .c file where
they are used, move them to a header file instead.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Replace strncpy() and explicit null-termination by strlcpy()
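For comparison, the two idioms are sketched below in userspace. Since
strlcpy() is not available in every libc, the sketch carries its own
toy_strlcpy() with the usual strlcpy semantics (always NUL-terminates when
size is non-zero, returns strlen(src)); both toy_* names are hypothetical.

```c
#include <string.h>

/* Old idiom: strncpy() may leave dst unterminated, so terminate by hand. */
void toy_copy_old(char *dst, const char *src, size_t size)
{
	strncpy(dst, src, size - 1);
	dst[size - 1] = '\0';
}

/* New idiom: one call that truncates and terminates. */
size_t toy_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	if (size != 0) {
		size_t n = (len >= size) ? size - 1 : len;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len; /* a result >= size tells the caller truncation happened */
}
```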
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
pr_debug() can now result in code being generated even when DEBUG
is not defined. That's not really desirable in some places.
With CONFIG_DYNAMIC_DEBUG=y:
size before:
text data bss dec hex filename
7083 1616 0 8699 21fb arch/powerpc/../axon_msi.o
size after:
text data bss dec hex filename
5772 1208 0 6980 1b44 arch/powerpc/../axon_msi.o
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Several platforms use their own copy of what is essentially the same code,
using RTAS to synchronize the timebases when bringing up new CPUs. This
moves it all into a single common implementation and additionally
turns the spinlock into a raw spinlock since the former can rely on
the timebase not being frozen when spinlock debugging is enabled, and finally
masks interrupts while the timebase is disabled.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This allows the callers to now pass down the full set of FAULT_FLAG_xyz
flags to handle_mm_fault(). All callers have been (mechanically)
converted to the new calling convention, there's almost certainly room
for architectures to clean up their code and then add FAULT_FLAG_RETRY
when that support is added.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* akpm: (182 commits)
fbdev: bf54x-lq043fb: use kzalloc over kmalloc/memset
fbdev: *bfin*: fix __dev{init,exit} markings
fbdev: *bfin*: drop unnecessary calls to memset
fbdev: bfin-t350mcqb-fb: drop unused local variables
fbdev: blackfin has __raw I/O accessors, so use them in fb.h
fbdev: s1d13xxxfb: add accelerated bitblt functions
tcx: use standard fields for framebuffer physical address and length
fbdev: add support for handoff from firmware to hw framebuffers
intelfb: fix a bug when changing video timing
fbdev: use framebuffer_release() for freeing fb_info structures
radeon: P2G2CLK_ALWAYS_ONb tested twice, should 2nd be P2G2CLK_DAC_ALWAYS_ONb?
s3c-fb: CPUFREQ frequency scaling support
s3c-fb: fix resource releasing on error during probing
carminefb: fix possible access beyond end of carmine_modedb[]
acornfb: remove fb_mmap function
mb862xxfb: use CONFIG_OF instead of CONFIG_PPC_OF
mb862xxfb: restrict compliation of platform driver to PPC
Samsung SoC Framebuffer driver: add Alpha Channel support
atmel-lcdc: fix pixclock upper bound detection
offb: use framebuffer_alloc() to allocate fb_info struct
...
Manually fix up conflicts due to kmemcheck in mm/slab.c
Now we have __initconst, we can finally move the external declarations for
the various Linux logo structures to <linux/linux_logo.h>.
James' ack dates back to the previous submission (way too long ago), when the
logos were still __initdata, which caused failures on some platforms with some
toolchain versions.
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Acked-by: James Simmons <jsimmons@infradead.org>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Callers of alloc_pages_node() can optionally specify -1 as a node to mean
"allocate from the current node". However, a number of the callers in
fast paths know for a fact their node is valid. To avoid a comparison and
branch, this patch adds alloc_pages_exact_node() that only checks the nid
with VM_BUG_ON(). Callers that know their node is valid are then
converted.
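The difference between the two entry points can be sketched in userspace as
below, with assert() standing in for VM_BUG_ON(). The toy_* names and the
toy_current_node variable are hypothetical, and the functions only resolve
a node id rather than allocate pages.

```c
#include <assert.h>

int toy_current_node; /* stand-in for "the node we are running on" */

/* General path: -1 means "current node", which costs a compare and branch. */
int toy_resolve_node(int nid)
{
	if (nid < 0)
		nid = toy_current_node;
	return nid;
}

/*
 * Exact path: the caller promises nid is valid. The check is a debug
 * assertion only and disappears when assertions are compiled out.
 */
int toy_resolve_node_exact(int nid)
{
	assert(nid >= 0);
	return nid;
}
```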
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Paul Mundt <lethal@linux-sh.org> [for the SLOB NUMA bits]
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The Axon MSI driver incorrectly uses platform_data, rather than
the proper accessors for driver_data.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Both arch/powerpc/platforms/cell/iommu.c and arch/powerpc/platforms/ps3/mm.c
contain the same Cell IOMMU page table entry definitions. Extract them and move
them to <asm/iommu.h>, while adding a CBE_ prefix.
This also allows them to be used by drivers.
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit 45db924089 ("powerpc/spufs: Remove
double check for non-negative dentry") removed the only user of the
out_dput label, so remove it and the code following it.
Gets rid of this warning:
arch/powerpc/platforms/cell/spufs/inode.c: In function 'spufs_create':
arch/powerpc/platforms/cell/spufs/inode.c:647: warning: label 'out_dput' defined but not used
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch removes an unnecessary double check if the dentry returned by
lookup_create() is actually non-negative. Since lookup_create() itself returns
an error in this case just remove the check.
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We shouldn't directly access sysdata to get the device node to just
go get the pci_controller. We can call pci_bus_to_host() for this
purpose.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
There have been a series of checkstops on QS21 related to
ptcal being set up incorrectly. On systems that only
have memory on a single node, ptcal fails when it gets
a pointer to memory on the remote node.
Moreover, aggressive prefetching in memcpy and other
functions may accidentally touch the first cache line
of the page that we reserve for ptcal, which causes
an ECC checkstop.
We now allocate pages only from the specified node, move the
ptcal area into the middle of the allocated page to avoid
potential prefetch problems and print the address of the
ptcal area to facilitate diagnostics.
Signed-off-by: Gerhard Stenzel <gerhard.stenzel@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently PPC_CELL_NATIVE selects PPC_OF_PLATFORM_PCI, but does not
select PCI. This can lead to a config with the former and the latter
disabled, which does not build.
To fix this PPC_CELL_NATIVE should select PCI. However, that would
force PCI on for QPACE, which also selects PPC_CELL_NATIVE. So
instead move the select of PPC_OF_PLATFORM_PCI and PCI under both
IBM_CELL_BLADE and CELLEB.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Replace all DMA_64BIT_MASK macros with DMA_BIT_MASK(64)
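As a rough illustration of why the replacement is value-preserving, the
sketch below compares the generic macro with an all-ones 64-bit constant.
The DMA_64BIT_MASK definition shown here is an assumption reproduced for
comparison only, and check_masks() is a hypothetical helper.

```c
#include <assert.h>

/* Assumed value of the old fixed-width constant, for comparison only. */
#define DMA_64BIT_MASK	0xffffffffffffffffULL

/* Generic form; n == 64 is special-cased because shifting by 64 is undefined. */
#define DMA_BIT_MASK(n)	(((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical self-check: the generic macro reproduces the old constants. */
static void check_masks(void)
{
	assert(DMA_BIT_MASK(64) == DMA_64BIT_MASK);
	assert(DMA_BIT_MASK(32) == 0xffffffffULL);
}
```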
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (32 commits)
x86: disable __do_IRQ support
sparseirq, powerpc/cell: fix unused variable warning in interrupt.c
genirq: deprecate obsolete typedefs and defines
genirq: deprecate __do_IRQ
genirq: add doc to struct irqaction
genirq: use kzalloc instead of explicit zero initialization
genirq: make irqreturn_t an enum
genirq: remove redundant if condition
genirq: remove unused hw_irq_controller typedef
irq: export remove_irq() and setup_irq() symbols
irq: match remove_irq() args with setup_irq()
irq: add remove_irq() for freeing of setup_irq() irqs
genirq: assert that irq handlers are indeed running in hardirq context
irq: name 'p' variables a bit better
irq: further clean up the free_irq() code flow
irq: refactor and clean up the free_irq() code flow
irq: clean up manage.c
irq: use GFP_KERNEL for action allocation in request_irq()
kernel/irq: fix sparse warning: make symbol static
irq: optimize init_kstat_irqs/init_copy_kstat_irqs
...
Currently, we will report a page fault as a segment fault, and report
a segment fault as both a page and segment fault.
Fix the SPF_P definition to be correct according to the iommu docs, and
mask before comparing.
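The "mask before comparing" idea can be sketched as follows. The bit values
and all names ending in _SKETCH are hypothetical and are not taken from the
iommu docs; the point is only that the fault-type field is isolated before
it is matched against a single encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: a two-bit fault-type field inside a status word. */
#define SPF_MASK_SKETCH	0x3000u
#define SPF_S_SKETCH	0x1000u	/* segment fault */
#define SPF_P_SKETCH	0x2000u	/* page fault */

enum fault_kind { FAULT_NONE, FAULT_SEGMENT, FAULT_PAGE };

/* Isolate the field first, then compare against exactly one encoding. */
static enum fault_kind classify_fault(uint32_t stat)
{
	switch (stat & SPF_MASK_SKETCH) {
	case SPF_S_SKETCH:
		return FAULT_SEGMENT;
	case SPF_P_SKETCH:
		return FAULT_PAGE;
	default:
		return FAULT_NONE;
	}
}
```

Testing individual bits with a plain AND, without masking the whole field,
is what lets one fault type be reported as the other or as both.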
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Since a number of powerpc chips are SoCs we end up having dma-able
devices that are registered as platform or of_platform devices. We need
to hook the archdata to setup proper dma_ops for these devices.
Rather than having to add a bus_notify to each platform we add a default
one at the highest priority (called first) to set the default dma_ops for
of_platform and platform devices to dma_direct_ops. This allows platform
code to override the ops by providing their own notifier call back.
In the future to enable >4G DMA support on ppc32 we can hook swiotlb ops.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Makes code futureproof against the impending change to mm->cpu_vm_mask.
It's also a chance to use the new cpumask_ ops which take a pointer
(the older ones are deprecated, but there's no hurry for arch code).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This new compiler warning:
arch/powerpc/platforms/cell/interrupt.c: In function 'handle_iic_irq':
arch/powerpc/platforms/cell/interrupt.c:240: warning: unused variable 'cpu'
Triggers because the local variable 'cpu' became unused due to commit:
dee4102: sparseirq: use kstat_irqs_cpu instead
Remove the variable.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: ppc-dev <linuxppc-dev@ozlabs.org>
LKML-Reference: <20090316185256.4a160374.sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
CONFIG_PPC_MULTIPLATFORM is a remnant of the pre-powerpc days and isn't
really meaningful anymore. It was basically equivalent to PPC64 || 6xx.
This removes it along with the following changes:
- 32-bit platforms that relied on PPC32 && PPC_MULTIPLATFORM now rely
on 6xx which is what they want anyway.
- A new symbol, PPC_BOOK3S, is defined that represents compliance with
the "Server" variant of the architecture. This is set when either 6xx
or PPC64 is set and opens the door for future BOOK3E 64-bit.
- 64-bit platforms that relied on PPC64 && PPC_MULTIPLATFORM now use
PPC64 && PPC_BOOK3S
- A separate and selectable CONFIG_PPC_OF_BOOT_TRAMPOLINE option is now
used to control the use of prom_init.c
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The Axon MSI driver depends on more than just PCI_MSI, so add a
Kconfig fragment for it. Fixes randconfig build failures.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We need to offset by *pos bytes, not *pos words.
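The class of bug described here comes from C pointer arithmetic, which scales
by the pointed-to type. The following is a generic sketch, not the code that
was fixed; both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Wrong for a byte offset: a uint32_t pointer advances by pos * 4 bytes. */
static const void *offset_by_words(const uint32_t *base, size_t pos)
{
	return base + pos;
}

/* Right: cast to a byte pointer first so pos is counted in bytes. */
static const void *offset_by_bytes(const uint32_t *base, size_t pos)
{
	return (const unsigned char *)base + pos;
}
```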
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Based on an original patch from Roel Kluin <roel.kluin@gmail.com>.
The write size calculated during regs and fpcr writes may currently
go negative. Because size is unsigned, this will wrap, and our
check for EFBIG will fail.
Instead, do the check for EFBIG before subtracting from size.
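A minimal sketch of the check-before-subtract ordering, assuming a region of
region_size bytes and a write of len bytes at offset pos; the function name
and parameters are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Returns the number of bytes that may be written at offset pos into a
 * region of region_size bytes, or -EFBIG if pos lies at or beyond the end.
 */
static long clamp_write_size(size_t region_size, size_t pos, size_t len)
{
	/* Check first: region_size - pos would wrap if pos > region_size. */
	if (pos >= region_size)
		return -EFBIG;
	if (len > region_size - pos)
		len = region_size - pos;
	return (long)len;
}
```

Doing the subtraction first and testing the result for "negative" cannot
work with an unsigned type, because the wrapped value is a very large
positive number.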
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
spuctx_switch_state() warns if ktime goes backwards, but it
sometimes compares an uninitialized value, which showed that
the data was unreliable when we actually saw the warning.
Initialize it to the current time in order to get correct data.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds the necessary bits and pieces to powerpc implementation of
ioremap to benefit from caller tracking in /proc/vmallocinfo, at least
for ioremap's done after mem init as the older ones aren't tracked.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The new firmware release exports further RTC calls. This
patch adds these calls to the QPACE platform setup file.
Signed-off-by: Benjamin Krill <ben@codiert.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
cbe_cpufreq has a partial dependency on cbe_cpufreq_pmi, which cannot
be easily expressed in Kconfig. This fixes it by introducing an
extra Kconfig symbol CBE_CPUFREQ_PMI_ENABLE. To make the dependency
clearer, turn PPC_PMI into an automatic symbol.
Reported-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The spufs context directory contents definitions are not changed after
initialisation, so we can declare them as const. We can do the same
with the spu coredump reader callbacks too.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently, we may set up the MFC for isolated mode initialisation with
the purge still active. This means that DMAs required to perform the
init do not happen.
This change clears the purge status after doing the purge, so that
the isolated init can proceed.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently, spu_handle_mm_fault disregards the 'ret' variable and always
returns -EFAULT on error.
This change refactors spu_handle_mm_fault a little, to return the
ret variable as appropriate. This allows us to combine the error and
success paths.
Also, remove the #if-0-ed IS_VALID_EA() check, it has never been
used.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: new timer API
Based on an idea from Martin Josefsson with the help of
Patrick McHardy and Stephen Hemminger:
introduce the mod_timer_pending() API which is a mod_timer()
offspring that is an invariant on already removed timers.
(regular mod_timer() re-activates non-pending timers.)
This is useful for the networking code in that it can
allow unserialized mod_timer_pending() timer-forwarding
calls, but a single del_timer*() will stop the timer
from being reactivated again.
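The intended semantics can be modelled with a toy timer. This is a user-space
sketch under stated assumptions, not the timer.c implementation: struct
timer_sketch and the three _sketch functions are hypothetical, and locking is
left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy timer: only the fields needed to show the semantics. */
struct timer_sketch {
	bool pending;
	unsigned long expires;
};

/* Like mod_timer(): always (re)arms; returns 1 if the timer was pending. */
static int mod_timer_sketch(struct timer_sketch *t, unsigned long expires)
{
	int was_pending = t->pending;

	t->expires = expires;
	t->pending = true;
	return was_pending;
}

/* Like mod_timer_pending(): leaves timers that are not pending untouched. */
static int mod_timer_pending_sketch(struct timer_sketch *t, unsigned long expires)
{
	if (!t->pending)
		return 0;
	t->expires = expires;
	return 1;
}

/* Like del_timer(): deactivates; returns 1 if the timer was pending. */
static int del_timer_sketch(struct timer_sketch *t)
{
	int was_pending = t->pending;

	t->pending = false;
	return was_pending;
}
```

Once del_timer_sketch() has run, any number of mod_timer_pending_sketch()
calls leave the timer inactive, which is the property the networking code
relies on.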
Also while at it:
- optimize the regular mod_timer() path some more, the
timer-stat and a debug check were needlessly duplicated
in __mod_timer().
- make the exports come straight after the function, as
most other exports in timer.c already did.
- eliminate __mod_timer() as an external API, change the
users to mod_timer().
The regular mod_timer() code path is not impacted
significantly, due to inlining optimizations and due to
the simplifications.
Based-on-patch-from: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Cc: netdev@vger.kernel.org
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Convert arch/powerpc/ over to long long based u64:
-#ifdef __powerpc64__
-# include <asm-generic/int-l64.h>
-#else
-# include <asm-generic/int-ll64.h>
-#endif
+#include <asm-generic/int-ll64.h>
This will avoid recurring spurious warnings in core kernel code that
come when people test on their own hardware (i.e. x86 in ~98% of the
cases). This is what x86 uses and it generally helps keep 64-bit code
32-bit clean too.
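The kind of warning this avoids can be sketched in user space: when the
64-bit type is always unsigned long long, one format string is correct on
both 32-bit and 64-bit builds. The typedef and helper below are hypothetical
names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* With an ll64 definition, the 64-bit type is always unsigned long long. */
typedef unsigned long long u64_sketch;

/* "%llx" matches the type everywhere, so no per-arch cast is needed. */
static int format_u64(char *buf, size_t len, u64_sketch v)
{
	return snprintf(buf, len, "%llx", v);
}
```

If the type were unsigned long on 64-bit builds instead, the same "%llx"
would draw a format warning there, which is the spurious noise the commit
message refers to.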
[Adjusted to not impact user mode (from paulus) - sfr]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This updates the cpufreq drivers in arch/powerpc so they build again
after the core cpufreq changes that broke them in commit
835481d9bcd65720b473db6b38746a74a3964218.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Impact: build fix
Ingo Molnar wrote:
> tip/arch/blackfin/kernel/irqchip.c: In function 'show_interrupts':
> tip/arch/blackfin/kernel/irqchip.c:85: error: 'struct kernel_stat' has no member named 'irqs'
> make[2]: *** [arch/blackfin/kernel/irqchip.o] Error 1
> make[2]: *** Waiting for unfinished jobs....
>
So move the kstat_irqs array into the irq_desc struct.
(s390, m68k, sparc) are not touched yet, because they don't support genirq.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (53 commits)
serial: Add driver for the Cell Network Processor serial port NWP device
powerpc: enable dynamic ftrace
powerpc/cell: Fix the prototype of create_vma_map()
powerpc/mm: Make clear_fixmap() actually work
powerpc/kdump: Use ppc_save_regs() in crash_setup_regs()
powerpc: Export cacheable_memzero as its now used in a driver
powerpc: Fix missing semicolons in mmu_decl.h
powerpc/pasemi: local_irq_save uses an unsigned long
powerpc/cell: Fix some u64 vs. long types
powerpc/cell: Use correct types in beat files
powerpc: Use correct type in prom_init.c
powerpc: Remove unnecessary casts
mtd/ps3vram: Use _PAGE_NO_CACHE in memory ioremap
mtd/ps3vram: Use msleep in waits
mtd/ps3vram: Use proper kernel types
mtd/ps3vram: Cleanup ps3vram driver messages
mtd/ps3vram: Remove ps3vram debug routines
mtd/ps3vram: Add modalias support to the ps3vram driver
mtd/ps3vram: Add ps3vram driver for accessing video RAM as MTD
powerpc: Fix iseries drivers build failure without CONFIG_VIOPATH
...
While reviewing ocfs2 code, I found 2 instances of the typo "successfull".
A grep for "successfull " in the kernel tree found 22 such typos in total --
great minds always think alike :)
This patch fixes all the similar typos. Thanks to Randy for his ack and comments.
Signed-off-by: Coly Li <coyli@suse.de>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Roland Dreier <rolandd@cisco.com>
Cc: Jeremy Kerr <jk@ozlabs.org>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
in/out_be64() work on u64s.
The first parameter to ppc_md.ioremap is a phys_addr_t.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Only pass the address of a u64 if that is what the function requires.
[Split out of a larger patch - sfr]
[update comment - sfr]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
... and don't bother in callers. Don't bother with zeroing i_blocks,
while we are at it - it's already been zeroed.
i_mode is not worth the effort; it has no common default value.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (144 commits)
powerpc/44x: Support 16K/64K base page sizes on 44x
powerpc: Force memory size to be a multiple of PAGE_SIZE
powerpc/32: Wire up the trampoline code for kdump
powerpc/32: Add the ability for a classic ppc kernel to be loaded at 32M
powerpc/32: Allow __ioremap on RAM addresses for kdump kernel
powerpc/32: Setup OF properties for kdump
powerpc/32/kdump: Implement crash_setup_regs() using ppc_save_regs()
powerpc: Prepare xmon_save_regs for use with kdump
powerpc: Remove default kexec/crash_kernel ops assignments
powerpc: Make default kexec/crash_kernel ops implicit
powerpc: Setup OF properties for ppc32 kexec
powerpc/pseries: Fix cpu hotplug
powerpc: Fix KVM build on ppc440
powerpc/cell: add QPACE as a separate Cell platform
powerpc/cell: fix build breakage with CONFIG_SPUFS disabled
powerpc/mpc5200: fix error paths in PSC UART probe function
powerpc/mpc5200: add rts/cts handling in PSC UART driver
powerpc/mpc5200: Make PSC UART driver update serial errors counters
powerpc/mpc5200: Remove obsolete code from mpc5200 MDIO driver
powerpc/mpc5200: Add MDMA/UDMA support to MPC5200 ATA driver
...
Fix trivial conflict in drivers/char/Makefile as per Paul's directions
Impact: New APIs
The old node_to_cpumask/node_to_pcibus returned a cpumask_t: these
return a pointer to a struct cpumask. Part of removing cpumasks from
the stack.
(Also replaces powerpc internal uses of node_to_cpumask).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Since the QPACE (Chromodynamics Parallel Computing on the
Cell Broadband Engine) platform doesn't use an iommu and has
neither PCI devices nor an MPIC, much less setup and
configuration is needed. So far all devices are detected
as OF devices. A notifier function is used to set the dma_ops
for the of_platform bus. Further, this patch splits
PPC_CELL_NATIVE into PPC_CELL_COMMON, which contains the parts
that are shared with the QPACE platform, and the rest.
Signed-off-by: Benjamin Krill <ben@codiert.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Neither CBE_THERM nor OPROFILE_CELL can be built with
SPU_FS disabled, so make the dependency explicit.
Reported-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Currently, we never set _PAGE_COHERENT in the PTEs, we just OR it in
in the hash code based on some CPU feature bit. We also manipulate
_PAGE_NO_CACHE and _PAGE_GUARDED by hand in all sorts of places.
This changes the logic so that instead, the PTE now contains
_PAGE_COHERENT for all normal RAM pages that have I = 0 on platforms
that need it. The hash code clears it if the feature bit is not set.
It also adds some clean accessors to setup various valid combinations
of access flags and change various bits of code to use them instead.
This should help having the PTE actually containing the bit
combinations that we really want.
I also removed _PAGE_GUARDED from _PAGE_BASE on 44x and instead
set it explicitly from the TLB miss. I will ultimately remove it
completely as it appears that it might not be needed after all
but in the meantime, having it in the TLB miss makes things a
lot easier.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Commit d015fe995 'powerpc/cell/axon-msi: Retry on missing interrupt'
has turned a rare failure to kexec on QS22 into a reproducible
error, which we have now analysed.
The problem is that after a kexec, the MSIC hardware still points
into the middle of the old ring buffer. We set up the ring buffer
during reboot, but not the offset into it. On older kernels, this
would cause a storm of thousands of spurious interrupts after a
kexec, which would most of the time get dropped silently.
With the new code, we time out on each interrupt, waiting for
it to become valid. If more interrupts come in that we time
out on, this goes on indefinitely, which eventually leads to
a hard crash.
The solution in this commit is to read the current offset from
the MSIC when reinitializing it. This now works correctly, as
expected.
Reported-by: Dirk Herrendoerfer <d.herrendoerfer@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Conflicts:
fs/nfsd/nfs4recover.c
Manually fixed above to use new creds API functions, e.g.
nfs4_save_creds().
Signed-off-by: James Morris <jmorris@namei.org>
An earlier patch from Jens Osterkamp attempted to fix GDB
watchpoints by enabling the DABRX register at boot time.
Unfortunately, this did not work on SMP setups, where
secondary CPUs were still using the power-on DABRX value.
This introduces the same change for secondary CPUs on cell
as well.
Reported-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
Tested-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The MSI capture logic for the axon bridge can sometimes
lose interrupts in case of high DMA and interrupt load,
when it signals an MSI interrupt to the MPIC interrupt
controller while we are already handling another MSI.
Each MSI vector gets written into a FIFO buffer in main
memory using DMA, and that DMA access is normally flushed
by the actual interrupt packet on the IOIF. An MMIO
register in the MSIC holds the position of the last
entry in the FIFO buffer that was written. However,
reading that position does not flush the DMA, so that
we can observe stale data in the buffer.
In a stress test, we have observed the DMA to arrive
up to 14 microseconds after reading the register.
This patch works around this problem by retrying the
access to the FIFO buffer.
We can reliably detect the condition by writing
an invalid MSI vector into the FIFO buffer after
reading from it, assuming that all MSIs we get
are valid. After detecting an invalid MSI vector,
we udelay(1) in the interrupt cascade for up to
100 times before giving up.
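The retry scheme described above can be sketched as follows. This is a
simplified user-space model, not the driver code: the marker value, the
delay_one_us() hook (standing in for udelay(1)) and read_msi_slot() are all
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MSI_INVALID_SKETCH	0xffffffffu	/* hypothetical "no entry yet" marker */
#define MSI_MAX_RETRIES		100

/* Hypothetical delay hook; the kernel would call udelay(1) here. */
static void delay_one_us(void)
{
}

/*
 * Read one FIFO slot, retrying while it still holds the invalid marker.
 * On success the slot is overwritten with the marker so that a later stale
 * read can be detected. Returns 0 on success, -1 after giving up.
 */
static int read_msi_slot(volatile uint32_t *slot, uint32_t *vector)
{
	int retries;

	for (retries = 0; retries < MSI_MAX_RETRIES; retries++) {
		uint32_t v = *slot;

		if (v != MSI_INVALID_SKETCH) {
			*vector = v;
			*slot = MSI_INVALID_SKETCH;
			return 0;
		}
		delay_one_us();
	}
	return -1;
}
```

The scheme only works because every genuine MSI vector is assumed to differ
from the marker, as the commit message states.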
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently, we can end up in an infinite loop if we get a signal
while the kernel has faulted in spufs_ps_fault. Eg:
alarm(1);
write(fd, some_spu_psmap_register_address, 4);
- the write's copy_from_user will fault on the ps mapping, and
signal_pending will be non-zero. Because returning from the fault
handler will never clear TIF_SIGPENDING, we'll just keep faulting,
resulting in an unkillable process using 100% of CPU.
This change returns VM_FAULT_SIGBUS if there's a fatal signal pending,
letting us escape the loop.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Conflicts:
security/keys/internal.h
security/keys/process_keys.c
security/keys/request_key.c
Fixed conflicts above by using the non 'tsk' versions.
Signed-off-by: James Morris <jmorris@namei.org>
Pass credentials through dentry_open() so that the COW creds patch can have
SELinux's flush_unauthorized_files() pass the appropriate creds back to itself
when it opens its null chardev.
The security_dentry_open() call also now takes a creds pointer, as does the
dentry_open hook in struct security_operations.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: James Morris <jmorris@namei.org>
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.
Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().
Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@ozlabs.org
Signed-off-by: James Morris <jmorris@namei.org>
This fixes this error on Cell when CONFIG_KEXEC = n:
arch/powerpc/platforms/cell/ras.c:299: error: implicit declaration of function 'crash_shutdown_register'
We have to include <asm/kexec.h> because it contains the dummy
definition of crash_shutdown_register that is used when
CONFIG_KEXEC=n, but <linux/kexec.h> doesn't include <asm/kexec.h> in
that case.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
After the merge of the 32 and 64bit DMA code, dma_direct_ops lost
their map/unmap_single() functions but gained map/unmap_page(). This
caused a problem for Cell because Cell's dma_iommu_fixed_ops called
the dma_direct_ops if the fixed linear mapping was to be used or the
iommu ops if the dynamic window was to be used. So in order to fix
this problem we need to update the 64bit DMA code to use
map/unmap_page.
First, we update the generic IOMMU code so that iommu_map_single()
becomes iommu_map_page() and iommu_unmap_single() becomes
iommu_unmap_page(). Then we propagate these changes up through all
the callers of these two functions and in the process update all the
dma_mapping_ops so that they have map/unmap_page rather than
map/unmap_single. We can do this because on 64bit there is no HIGHMEM
memory so map/unmap_page ends up performing exactly the same function
as map/unmap_single, just taking different arguments.
This has no effect on drivers because the dma_map_single_attrs() just
ends up calling the map_page() function of the appropriate
dma_mapping_ops and similarly the dma_unmap_single_attrs() calls
unmap_page().
This fixes an oops on Cell blades, which oops on boot without this
because they call dma_direct_ops.map_single, which is NULL.
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
linux/crash_dump.h defines is_kdump_kernel() to be used by code that
needs to know if the previous kernel crashed instead of a (clean) boot
or reboot.
This updates the just added powerpc code to use it. This is needed
for the next commit, which will remove __kdump_flag.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds relocatable kernel support for kdump. With this one can
use the same regular kernel to capture the kdump. A signature (0xfeed1234)
is passed in r6 from panic code to the next kernel through kexec_sequence
and purgatory code. The signature is used to differentiate between
kdump kernel and non-kdump kernels.
The purgatory code compares the signature and sets the __kdump_flag in
head_64.S. During the boot up, kernel code checks __kdump_flag and if it
is set, the kernel will behave as relocatable kdump kernel. This kernel
will boot at the address where it was loaded by kexec-tools ie. at the
address reserved through crashkernel boot parameter.
CONFIG_CRASH_DUMP depends on CONFIG_RELOCATABLE option to build kdump
kernel as relocatable. So the same kernel can be used as production and
kdump kernel.
This patch incorporates the changes suggested by Paul Mackerras to avoid
GOT use and to avoid two copies of the code.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Mohan Kumar M <mohan@in.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We used to assume that even numbered threads were the primary
threads, ie those that would be listed and started as a cpu from
open firmware. Replace a leftover is-even (% 2) check with a check
for it being a primary thread and update the comments.
Tested with a debug print on pseries, identical code found for cell.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch adds a comment to clarify why atomic_dec_if_positive is being used
to decrement gang's aff_sched_count on SPU context unbind.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
This patch improves readability of the code responsible for trying to find
a node with enough SPUs not committed to other affinity gangs.
An additional check is also added, to avoid taking into account gangs that
have no SPU affinity.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
With most file readers (eg cat, dd), reading a context's regs file will
result in two reads: the first to read the data, and the second to
return EOF. Because each read performs a spu_acquire_saved, we end up
descheduling and re-scheduling the context twice.
This change does a simple check to see if we'd return EOF before
calling spu_acquire_saved(), saving the extra schedule operation.
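A minimal sketch of the early EOF check, assuming a fixed-size register image.
struct ctx_sketch, acquire_saved() (standing in for spu_acquire_saved()) and
regs_read() are hypothetical, and the matching release step is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct ctx_sketch {
	unsigned char regs[64];	/* hypothetical register file image */
	int acquire_count;	/* how often the expensive acquire ran */
};

/* Hypothetical stand-in for spu_acquire_saved(): just counts calls. */
static void acquire_saved(struct ctx_sketch *ctx)
{
	ctx->acquire_count++;
}

/* Returns bytes copied; 0 means EOF and is decided before acquiring. */
static size_t regs_read(struct ctx_sketch *ctx, void *buf, size_t len, size_t *pos)
{
	if (*pos >= sizeof(ctx->regs))
		return 0;

	acquire_saved(ctx);
	if (len > sizeof(ctx->regs) - *pos)
		len = sizeof(ctx->regs) - *pos;
	memcpy(buf, ctx->regs + *pos, len);
	*pos += len;
	return len;
}
```

A reader such as cat then triggers one acquire for the data read and none
for the trailing EOF read.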
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, read() on the sputrace log will block until the read buffer
is full. This makes it difficult to retrieve the end of the buffer, as
the user will need to read with the right-sized buffer.
In a similar method as 91553a1b5e0df006a3573a88d98ee7cd48a3818a, this
change makes the switch_log return if there has already been data
read.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, we use ctx->mapping_lock and ctx->switch_log->lock for the
context switch log. The mapping lock only prevents concurrent open()s,
so we require the switch_log->lock for reads.
Since writes to the switch log buffer occur on context switches, we're
better off synchronising with the state_mutex, which is held during a
switch. Since we're serialised throughout the buffer reads and writes,
we can use the state mutex to protect open and release too, and
can now kfree() the log buffer on release. This allows us to perform
the switch log notify without taking any extra locks.
Because the buffer is only present while the file is open, we can use
it to prevent multiple simultaneous openers.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, read() on the sputrace buffer will only return data when
the user buffer is exhausted. This may mean that we never see the
end of the event log, unless we read() with exactly the right-sized
buffer.
This change makes sputrace_read not block if we have data ready to
return.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, sputrace will start logging to the event buffer before the
log buffer has been open()ed. This results in a heap of "lost samples"
warnings if the sputrace file hasn't yet been opened.
Since the buffer is reset on open() anyway, there's no need to enable
logging when no-one has opened the log.
Because open clears the log, make it return EBUSY for multiple open
calls.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
We need a marker_synchronize_unregister() before the end of exit() to make sure
all probe callers have exited the non-preemptible section and thus are not
executing the probe code anymore.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is a much better version of a previous patch to make the parser
tables constant. Rather than changing the typedef, we put the "const" in
all the various places where it's required, allowing the __initconst
exception for nfsroot which was the cause of the previous trouble.
This was posted for review some time ago and I believe it's been in -mm
since then.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Alexander Viro <aviro@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A mutex_unlock(&gang->aff_mutex) in spufs_create_context() is missing
in case spufs_context_open() fails. As a result, spu_create syscall
and spu_get_idle() may block.
This patch adds the mutex_unlock.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Acked-by: Andre Detsch <adetsch@br.ibm.com>
Currently, an empty spufs root inode has nlink count of 1. However,
the directory has two links; / -> spu and /spu/ -> .
This change increments the link count of the root inode in spufs.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Use the struct device's numa_node instead; use accessor functions
to get/set numa_node.
Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
We currently have a race when scheduling a context to a SPE -
after we have found a runnable context in spusched_tick, the same
context may have been scheduled by spu_activate().
This may result in a panic if we try to unschedule a context that has
been freed in the meantime.
This change exits spu_schedule() if the context has already been
scheduled, so we don't end up scheduling it twice.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
We currently have a race for a free SPE. With one thread doing a
spu_yield(), and another doing a spu_activate():
thread 1                           thread 2
spu_yield(oldctx)                  spu_activate(ctx)
  __spu_deactivate(oldctx)
  spu_unschedule(oldctx, spu)
  spu->alloc_state = SPU_FREE
                                   spu = spu_get_idle(ctx)
                                     - searches for a SPE in
                                       state SPU_FREE, gets
                                       the context just
                                       freed by thread 1
                                   spu_schedule(ctx, spu)
                                     spu->alloc_state = SPU_USED
spu_schedule(newctx, spu)
  - assumes spu is still free
  - tries to schedule context on
    already-used spu
This change introduces a 'free_spu' flag to spu_unschedule, to indicate
whether or not the function should free the spu after descheduling the
context. We only set this flag if we're not going to re-schedule
another context on this SPU.
Add a comment to document this behaviour.
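The idea behind the flag can be sketched like this. The types and
function names (spu_slot, unschedule_sketch, yield_to_sketch) are
hypothetical and contexts are reduced to integer ids; only the handling
of the flag is meant to mirror the description above.

```c
#include <assert.h>

enum alloc_state_sketch { SKETCH_SPU_FREE, SKETCH_SPU_USED };

/* Hypothetical reduced SPU: allocation state plus the id of the bound context. */
struct spu_slot {
	enum alloc_state_sketch alloc_state;
	int ctx_id;	/* 0 means no context bound */
};

/* Unbind the current context; mark the SPU free only when the caller
 * does not intend to put another context on it straight away. */
void unschedule_sketch(struct spu_slot *spu, int free_spu)
{
	spu->ctx_id = 0;
	if (free_spu)
		spu->alloc_state = SKETCH_SPU_FREE;
}

/* Yield path: swap in new_ctx_id without ever exposing the SPU as free.
 * A new_ctx_id of 0 means there is nothing to run next. */
void yield_to_sketch(struct spu_slot *spu, int new_ctx_id)
{
	unschedule_sketch(spu, new_ctx_id == 0);
	if (new_ctx_id != 0)
		spu->ctx_id = new_ctx_id;
}
```

Because the SPU never passes through the free state during a swap, a
concurrent search for free SPUs cannot pick it up in between.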
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Commit 8d5636fbca introduced a reference
count on SPU contexts during find_victim, but this may cause a leak in
the reference count if we later find a better contender for a context to
unschedule.
Change the reference to after we've found our victim context, so we
don't do the extra get_spu_context().
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Based on an original patch from Christoph Hellwig <hch@lst.de>.
Currently, there is a possible reference-after-free in the spusched
code - contexts may be freed after we have released their state_mutex
in spusched_tick and find_victim.
This change takes a reference to the context before releasing the
mutex, so that the context doesn't get destroyed.
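As a rough single-threaded illustration of the pattern (not the spusched
code): ctx_ref_sketch and its helpers are hypothetical, and the owner's
concurrent release is simulated by a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical refcounted context; 'freed' stands in for the real destructor. */
struct ctx_ref_sketch {
	int refcount;
	bool freed;
};

void get_ctx_sketch(struct ctx_ref_sketch *ctx) { ctx->refcount++; }

void put_ctx_sketch(struct ctx_ref_sketch *ctx)
{
	assert(ctx->refcount > 0);
	if (--ctx->refcount == 0)
		ctx->freed = true;
}

/* Pin the context before dropping its lock, so that a final put by the
 * owner cannot destroy it while we still use it. owner_drops simulates
 * the owner releasing its reference in the unlocked window. */
bool use_after_unlock_sketch(struct ctx_ref_sketch *ctx, bool owner_drops)
{
	bool alive;

	get_ctx_sketch(ctx);		/* taken while the lock is still held */
	/* ... lock released here ... */
	if (owner_drops)
		put_ctx_sketch(ctx);	/* the owner's reference goes away */
	alive = !ctx->freed;		/* still safe to touch ctx */
	put_ctx_sketch(ctx);		/* our reference; may be the last one */
	return alive;
}
```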
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, spu_run ignores the npc argument for contexts created with
SPU_CREATE_NOSCHED. While this is correct for isolated contexts,
there's no need to enforce the npc restriction on non-isolated NOSCHED
contexts.
This means that NOSCHED contexts can only ever run with an entry point
of 0x0.
This change to spu_run_init allows setting of the npc (and, while we're
at it, the privcntl) for non-isolated NOSCHED contexts. This allows
us to run NOSCHED contexts from any entry point.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Ingo Molnar provided a fix to not call _PPC at processor driver
initialization time in "[PATCH] ACPI: fix cpufreq regression" (git
commit e4233dec74)
But it can still happen that _PPC is called at processor driver
initialization time.
This patch should make sure that this is not possible anymore.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kmem cache passed to constructor is only needed for constructors that are
themselves multiplexers. Nobody uses this "feature", nor does anybody use the
passed kmem cache in a non-trivial way, so pass only a pointer to the object.
Non-trivial places are:
arch/powerpc/mm/init_64.c
arch/powerpc/mm/hugetlbpage.c
This is flag day, yes.
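A user-space sketch of a constructor that takes only the object pointer.
cache_sketch and cache_alloc_sketch() are hypothetical; for brevity the
constructor runs on every allocation here, which is not how the slab
allocators behave.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical miniature "cache": an object size plus an optional
 * constructor that receives only the object. */
struct cache_sketch {
	size_t size;
	void (*ctor)(void *obj);
};

struct point_sketch { int x, y; };

/* Constructor: no cache argument, just the object to initialise. */
void point_ctor(void *obj)
{
	struct point_sketch *p = obj;

	p->x = -1;
	p->y = -1;
}

/* Allocate one object and run the constructor on it, if there is one. */
void *cache_alloc_sketch(const struct cache_sketch *c)
{
	void *obj = malloc(c->size);

	if (obj != NULL && c->ctor != NULL)
		c->ctor(obj);
	return obj;
}
```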
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Jon Tollefson <kniht@linux.vnet.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Matt Mackall <mpm@selenic.com>
[akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
[akpm@linux-foundation.org: fix mm/slab.c]
[akpm@linux-foundation.org: fix ubifs]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add per-device dma_mapping_ops support for CONFIG_X86_64 as POWER
architecture does:
This enables us to cleanly fix the Calgary IOMMU issue that some devices
are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423).
I think that per-device dma_mapping_ops support would be also helpful for
KVM people to support PCI passthrough but Andi thinks that this makes it
difficult to support the PCI passthrough (see the above thread). So I
CC'ed this to KVM camp. Comments are appreciated.
A pointer to dma_mapping_ops to struct dev_archdata is added. If the
pointer is non NULL, DMA operations in asm/dma-mapping.h use it. If it's
NULL, the system-wide dma_ops pointer is used as before.
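The lookup described here amounts to roughly the following. All names
(dma_ops_sketch, dev_sketch, get_dma_ops_sketch) are hypothetical
simplifications, and treating a NULL device like a device without
per-device ops is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down ops table and device. */
struct dma_ops_sketch { const char *name; };

struct dev_sketch {
	const struct dma_ops_sketch *dma_ops;	/* NULL: use the system-wide ops */
};

static const struct dma_ops_sketch global_ops_sketch = { "global" };

/* Per-device ops if set, otherwise the system-wide default. */
const struct dma_ops_sketch *get_dma_ops_sketch(const struct dev_sketch *dev)
{
	if (dev == NULL || dev->dma_ops == NULL)
		return &global_ops_sketch;
	return dev->dma_ops;
}
```

Every DMA operation that takes a device can then start with this lookup,
which is why an operation without a device argument stands in the way.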
If it's useful for KVM people, I plan to implement a mechanism to register
a hook called when a new pci (or dma capable) device is created (it works
with hot plugging). It enables IOMMUs to set up an appropriate
dma_mapping_ops per device.
The major obstacle is that dma_mapping_error doesn't take a pointer to the
device unlike other DMA operations. So x86 can't have dma_mapping_ops per
device. Note all the POWER IOMMUs use the same dma_mapping_error function
so this is not a problem for POWER but x86 IOMMUs use different
dma_mapping_error functions.
The first patch adds the device argument to dma_mapping_error. The patch
is trivial but large since it touches lots of drivers and dma-mapping.h in
all the architectures.
This patch:
dma_mapping_error() doesn't take a pointer to the device unlike other DMA
operations. So we can't have dma_mapping_ops per device.
Note that POWER already has dma_mapping_ops per device but all the POWER
IOMMUs use the same dma_mapping_error function. x86 IOMMUs use different
dma_mapping_error functions, so dma_mapping_error needs the device argument.
[akpm@linux-foundation.org: fix sge]
[akpm@linux-foundation.org: fix svc_rdma]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix bnx2x]
[akpm@linux-foundation.org: fix s2io]
[akpm@linux-foundation.org: fix pasemi_mac]
[akpm@linux-foundation.org: fix sdhci]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix sparc]
[akpm@linux-foundation.org: fix ibmvscsi]
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To support Cooperative Memory Overcommitment (CMO), we need to check
for failure from some of the tce hcalls.
These changes for the pseries platform affect the powerpc architecture;
patches for the other affected platforms are included in this patch.
pSeries platform IOMMU code changes:
* platform TCE functions must handle H_NOT_ENOUGH_RESOURCES errors and
return an error.
Architecture IOMMU code changes:
* Calls to ppc_md.tce_build need to check return values and return
DMA_MAPPING_ERROR for transient errors.
Architecture changes:
* struct machdep_calls for tce_build*_pSeriesLP functions need to change
to indicate failure.
* all other platforms will need updates to iommu functions to match the new
calling semantics; they will return 0 on success. The other platforms'
default configs have been built, but no further testing was performed.
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
At the moment the fixed mapping is by default strongly ordered (the
iommu_fixed=weak boot option must be used to make the fixed mapping weakly
ordered). If we're on a setup where the southbridge is being used in
endpoint mode (triblade and CAB boards) the default should be a weakly
ordered fixed mapping.
This adds a check so that if a node of type pcie-endpoint can be found in
the device tree the fixed mapping is set to be weak by default (but can be
overridden using iommu_fixed=strong).
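The resulting decision can be sketched as a small pure function.
fixed_mapping_is_weak() is a hypothetical helper, and falling back to the
default for an unrecognised option value is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* boot_arg is the value given to iommu_fixed= ("weak" or "strong"),
 * or NULL when the option was not passed at all. */
bool fixed_mapping_is_weak(bool has_pcie_endpoint, const char *boot_arg)
{
	if (boot_arg != NULL) {
		if (strcmp(boot_arg, "weak") == 0)
			return true;
		if (strcmp(boot_arg, "strong") == 0)
			return false;
	}
	/* No recognised override: weak only on endpoint-mode setups. */
	return has_pcie_endpoint;
}
```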
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This uses the new vm_ops->access to allow gdb to access the SPU local
store. We currently prevent access to problem state registers; this can
be done later if really needed but it's safer not to.
[akpm@linux-foundation.org: fix typo]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Dave Airlie <airlied@linux.ie>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adjusts the placement of a reference context from
a spu affinity chain. The reference context can now be placed
only on nodes that have enough spus not intended to be used by
another gang (already running on the node).
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, it is possible to lock aff_mutex and
cbe_spu_info[n].list_mutex in different orders, allowing a deadlock to
occur. With this change, aff_mutex is not taken within a list_mutex
critical section anymore.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
kcalloc is supposed to be called with the count as its first argument and
the element size as the second.
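For illustration, a user-space stand-in with the same argument order
(count first, element size second). calloc_sketch() is hypothetical and
has no flags argument; the overflow check is part of the sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Element count first, element size second; returns zeroed memory or
 * NULL if the total size would overflow or the allocation fails. */
void *calloc_sketch(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;	/* n * size would overflow */
	return calloc(n, size);
}
```

A typical call then reads calloc_sketch(count, sizeof(*array)), which
keeps the count and the element size in the documented positions.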
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (49 commits)
powerpc: Fix build bug with binutils < 2.18 and GCC < 4.2
powerpc/eeh: Don't panic when EEH_MAX_FAILS is exceeded
fbdev: Teaches offb about palette on radeon r5xx/r6xx
powerpc/cell/edac: Log a syndrome code in case of correctable error
powerpc/cell: Add DMA_ATTR_WEAK_ORDERING dma attribute and use in Cell IOMMU code
powerpc: Indicate which oprofile counters to use while in compat mode
powerpc/boot: Change spaces to tabs
powerpc: Remove duplicate 6xx option in Kconfig
powerpc: Use PPC_LONG and PPC_LONG_ALIGN in lib/string.S
powerpc: Use PPC_LONG_ALIGN in uaccess.h
powerpc: Add a #define for aligning to a long-sized boundary
powerpc: Fix OF parsing of 64 bits PCI addresses
powerpc: Use WARN_ON(1) instead of __WARN()
powerpc: Fix support for latencytop
powerpc/ps3: Update ps3_defconfig
powerpc/ps3: Add a sub-match id to ps3_system_bus
powerpc: Add a 6xx defconfig
powerpc/dma: Use the struct dma_attrs in iommu code
powerpc/cell: Add support for power button of future IBM cell blades
powerpc/cell: Cleanup sysreset_hack for IBM cell blades
...
This allows attributes to be generated dynamically and show/store
functions to be shared between attributes. Right now most attributes are
generated by special macros and lots of duplicated code. With the attribute
passed it's instead possible to attach some data to the attribute
and then use that in shared low level functions to do different things.
I need this for the dynamically generated bank attributes in the x86
machine check code, but it'll allow some further cleanups.
I converted all users in tree to the new show/store prototype. It's a single
huge patch to avoid unbisectable sections.
Runtime tested: x86-32, x86-64
Compiled only: ia64, powerpc
Not compile tested/only grep converted: sh, arm, avr32
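To illustrate what passing the attribute makes possible, here is a
user-space sketch. attr_sketch, bank_attr_sketch and bank_show() are
hypothetical and do not match the real sysfs prototypes; the point is
only that one show function can serve many attributes by recovering the
enclosing structure from the attribute pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical attribute type whose show() callback receives the attribute. */
struct attr_sketch {
	const char *name;
	int (*show)(const struct attr_sketch *attr, char *buf, size_t len);
};

/* An attribute with private data attached, e.g. one instance per bank. */
struct bank_attr_sketch {
	struct attr_sketch attr;
	int bank;
};

/* Recover the enclosing structure from a pointer to its 'attr' member. */
#define to_bank_attr(a) \
	((const struct bank_attr_sketch *)((const char *)(a) - \
		offsetof(struct bank_attr_sketch, attr)))

/* One show function shared by every bank attribute. */
int bank_show(const struct attr_sketch *attr, char *buf, size_t len)
{
	return snprintf(buf, len, "%d\n", to_bank_attr(attr)->bank);
}
```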
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Introduce a new dma attribute DMA_ATTR_WEAK_ORDERING to use weak ordering
on DMA mappings in the Cell processor. Add the code to the Cell's IOMMU
implementation to use this code.
Dynamic mappings can be weakly or strongly ordered on an individual basis
but the fixed mapping has to be either completely strong or completely weak.
This is currently decided by a kernel boot option (pass iommu_fixed=weak
for a weakly ordered fixed linear mapping, strongly ordered is the default).
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Update iommu_alloc() to take the struct dma_attrs and pass them on to
tce_build(). This change propagates down to the tce_build functions of
all the platforms.
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch adds support for the power button on future IBM cell blades.
It actually doesn't shut down the machine. Instead it exposes an
input device /dev/input/event0 to userspace which sends KEY_POWER
if power button has been pressed.
haldaemon actually recognizes the button, so a platform-independent acpid
replacement should handle it correctly.
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch adds a config option for the sysreset_hack used for
IBM Cell blades. The code is moved from pervasive.c into ras.c and
gets its own init method.
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch adds a cpufreq governor that takes the number of running spus
into account. It's very similar to the ondemand governor, but not as complex.
Instead of hacking spu load into the ondemand governor it might be easier to
have cpufreq accepting multiple governors per cpu in future.
Don't know if this is the right way, but it would keep the governors simple.
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Dave Jones <davej@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
It is okay for both _PAGE_GUARDED and _PAGE_COHERENT (G and M) to be set
in the same pte. In fact, even if that were not the case, there doesn't
seem to be any place where G is set without also setting I (_PAGE_NO_CACHE),
so the test for I is sufficient as a condition to clear _PAGE_COHERENT
when filling the hash table.
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Make cell_dma_dev_setup_iommu() return a pointer to the struct iommu_table
(or NULL if no table can be found) rather than putting this pointer into
dev->archdata.dma_data (let the caller do that), and rename this function
to cell_get_iommu_table() to reflect this change.
This will allow us to get the iommu table for a device that doesn't have
the table in the archdata.
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
As the nr_active counter also includes spus waiting for syscalls to return,
we need a separate counter that only counts spus that are currently running
on the spu side. This counter shall be used by a cpufreq governor that targets
a frequency dependent on the number of running spus.
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently, the .ctx debug file in spu context directories is always
present.
We'd prefer to prevent users from relying on this file, so add a
"debug" mount option to spufs. The .ctx file will only be added to
the context directories when this option is present.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Populate the size member of a few context files. Leave out files that
have different semantics with read vs mmap, or contain a
variable-length hex string.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, spufs never specifies the i_size for the files in context
directories, so stat() always reports 0-byte files.
This change allows the spufs_dir_(nosched_)contents arrays to
specify a file size. This allows stat() to report correct file sizes,
and makes SEEK_END work.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
An spu context shouldn't get an extra tick if the time slice code
couldn't find something else to run. This means contexts that are not
within spu_run (ie, SPU_SCHED_SPU_RUN is cleared) will not receive
extra ticks while we have no other contexts waiting.
Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Add a ctxt file to spufs that shows spu context information that is used
in scheduling. This info can be used for debugging spufs scheduler
issues, and to isolate between application and spufs problems as it
shows a lot of state such as priorities and dispatch counts.
This file contains internal spufs state and is subject to change at any
time, and therefore no applications should depend on it. The file is
intended for the use of spufs kernel developers.
Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
We need to disable ptcal before starting a new kernel after a crash,
in order to avoid overwriting data in the kdump kernel.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This converts ppc to use the new helpers for smp_call_function() and
friends, and adds support for smp_call_function_single().
ppc loses the timeout functionality of smp_call_function_mask() with
this change, as the generic code does not provide that.
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
There is a delay in the transition to the stopped state for class 2
interrupts. In some cases, the controlling thread detects the state of
the spu as running, and goes back to sleep resulting in a hung
application as the event is missed.
This change detects the stop condition and re-generates the wakeup event
after a context save.
Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Time slicing can occur at the same time as spu exception handling
resulting in the wakeup of the wrong thread.
This change uses the spu's register_lock to enforce synchronization
between bind/unbind and spu exception handling so that they are
mutually exclusive.
Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
According to the CBEA, the SPU dsisr is not updated for class 0
exceptions.
spu_stopped() is testing the dsisr that was passed to it from the class
0 exception handler, so we return a false positive here.
This patch cleans up the interrupt handler and erroneous tests in
spu_stopped. It also removes the fields from the csa since they are not
needed to process class 0 events.
Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
If the spu is stopping (ie, the SPU_STATUS_RUNNING bit is still set),
re-read the register to get the final stopped value.
Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
When I changed irq_alloc_host() to take an of_node
(52964f87c6: "Add an optional
device_node pointer to the irq_host"), I botched the reference
counting semantics.
Stephen pointed out that it's irq_alloc_host()'s business if
it needs to take an additional reference to the device_node,
the caller shouldn't need to care.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
If we do the call to irq_of_parse_and_map() first, then we don't
need to worry about freeing the irq_host.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds some debugging code to the Axon MSI driver. It creates a
file per MSIC in /sys/kernel/debug/powerpc, which allows the user to
trigger a fake MSI interrupt by writing to the file.
This can be used to test some of the MSI generation path. In
particular, that the MSIC recognises a write to the MSI address,
generates an interrupt and writes the MSI packet into the ring buffer.
All the code is inside #ifdef DEBUG so it causes no harm unless it's
enabled.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Fix following warnings:
WARNING: arch/powerpc/platforms/cell/built-in.o(.devinit.text+0x9c): Section mismatch in reference from the function .cell_setup_phb() to the function .init.text:.iowa_register_bus()
WARNING: arch/powerpc/platforms/cell/built-in.o(.devinit.text+0xa4): Section mismatch in reference from the function .cell_setup_phb() to the function .init.text:.io_workaround_init()
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Signed-off-by: Paul Mackerras <paulus@samba.org>
With CONFIG_VIRT_CPU_ACCOUNTING disabled, I got the following error:
linux-2.6/arch/powerpc/platforms/cell/spufs/file.c: In function 'spu_switch_log_notify':
linux-2.6/arch/powerpc/platforms/cell/spufs/file.c:2542: error: implicit declaration of function 'get_tb'
make[4]: *** [arch/powerpc/platforms/cell/spufs/file.o] Error 1
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Paul Mackerras <paulus@samba.org>
If victim (not ctx) is in spu_run, add victim to rq.
Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We need to acquire the parent i_mutex with I_MUTEX_PARENT to keep
lockdep happy.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
We should not requeue the victim context in find_victim if the owner is
not in spu_run. First, it's not needed, because leaving the context on
the spu is an optimization; second, it's harmful, because it means the
owner could re-enter spu_run when the context is on the runqueue and
trip the BUG_ON in __spu_update_sched_info.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Creating a spufs context or gang using spu_create should send an inotify
event so that things like performance monitors have an easy way to find
out about newly created contexts.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, page fault handlers don't issue a mfc restart if the context
switch pending flag is set, which can leave us with a hanging DMA after
a context restore.
This patch introduces a fault pending flag that is set by the fault
handler and read by the context switch code, so that the latter can add
the restart bit at the right spot, after it has successfully saved the
state of the mfc control register.
Signed-off-by: Luke Browning <lukebr@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
SPU class 0 & 1 exceptions may occur in parallel, so we may end up
overwriting csa.dsisr.
This change adds dedicated fields for each class to the spu and the spu
context so that fault data is not overwritten.
Signed-off-by: Luke Browning <lukebr@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, we re-route SPU interrupts to the current cpu, which may be
on a remote node. In the case of time slicing, all spu interrupts will
end up routed to the same cpu, where the spusched_tick occurs.
This change routes mfc interrupts to the cpu where the controlling
thread last ran, provided that cpu is on the same node as the spu
(otherwise don't reroute interrupts).
This should improve performance and provide a more predictable
environment for processing spu exceptions. In the past we have seen
concurrent delivery of spu exceptions to two cpus. This eliminates that
concern.
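The routing rule can be sketched as below. The topology helper and the
constant CPUS_PER_NODE_SKETCH are hypothetical (a real system would ask
the kernel for the cpu-to-node mapping); only the decision itself follows
the description above.

```c
#include <assert.h>

/* Hypothetical topology: CPUS_PER_NODE_SKETCH consecutive cpus per node. */
#define CPUS_PER_NODE_SKETCH 4

int cpu_to_node_sketch(int cpu) { return cpu / CPUS_PER_NODE_SKETCH; }

/* Returns the cpu to route the SPU's interrupts to, or -1 to leave the
 * current routing untouched. */
int pick_irq_cpu_sketch(int last_ran_cpu, int spu_node)
{
	if (cpu_to_node_sketch(last_ran_cpu) == spu_node)
		return last_ran_cpu;
	return -1;
}
```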
Signed-off-by: Luke Browning <lukebr@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
synchronize_irq() provides the serialization for
SPU_CONTEXT_SWITCH_PENDING which is read with a simple load. This
routine guarantees that the relevant interrupt handlers are not running,
so that the next time they do run they will see the updated
memory value.
This must be done correctly so that exception handling code does not
restart the mfc in the middle of a context switch while we are trying
to atomically stop it and save state.
Signed-off-by: Luke Browning <lukebr@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
There's currently no way to tell if spu_process_callback has
returned with the state mutex held, as -EINTR may be returned
by either the syscall or the spu_acquire fail case.
Instead, just do a non-interruptible mutex_lock here.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, we update the SPU master run control bit (ie,
spu_enable_spu) in spufs_run_spu before we grab the context mutex. This
can result in races with other processes accessing this context's
resources.
This change moves the spu_enable_spu to after we have acquired the
context lock.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
We currently have two issues with the MFC save code:
* save_mfc_decr doesn't handle a transition of 1 -> 0 of the Ds bit
* The Q bit may be stale in the CSA
This change fixes the first issue by clearing the relevant bits from
the MFC_CNTL value in the CSA before or-ing in the updated status.
Also, we add the Q bit to the updated status.
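The clear-then-or step looks roughly like this. The bit positions and
names (SK_DS_BIT, SK_Q_BIT) are hypothetical placeholders and do not
reflect the real MFC_CNTL layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions standing in for the Ds and Q bits. */
#define SK_DS_BIT	(UINT64_C(1) << 0)
#define SK_Q_BIT	(UINT64_C(1) << 1)
#define SK_STATUS_MASK	(SK_DS_BIT | SK_Q_BIT)

/* Replace the status bits in the saved value with the current hardware
 * ones, leaving all other saved bits alone. */
uint64_t save_status_sketch(uint64_t saved, uint64_t hw)
{
	saved &= ~SK_STATUS_MASK;	/* drop stale bits so 1 -> 0 is recorded */
	saved |= hw & SK_STATUS_MASK;
	return saved;
}
```

Or-ing alone can only ever set bits, which is why a 1 -> 0 transition was
lost before the clearing step was added.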
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, we can introduce invalid entries into the MFC queues:
 1) context starts a DMA
 2) context gets scheduled out during a DMA
    - kernel saves MFC queue to CSA
    - kernel saves 0x0 in csa->mfc_control_RW
 3) context gets scheduled in
    - csa->mfc_control[Q] ('queues empty') isn't set, so DMA queues are
      restored from the CSA
 4) context's DMA is completed
 5) context gets scheduled out again, no DMA occurring this time
    - kernel sees that MFC_CNTL[Q] ('queues empty') is set, so doesn't
      touch saved queue data in CSA
    - kernel saves 0x0 in csa->mfc_control_RW
 6) context gets scheduled in
    - csa->mfc_control[Q] ('queues empty') isn't set (we saved it as 0!),
      so DMA queues are restored from the CSA
In this last restore, we've restored the queue status from step 2,
which is now invalid.
This change makes save_mfc_cntl() closer to the save/restore sequence,
as specified in the CBE handbook.
With changes from Luke Browning.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
When we issue a MFC purge request, we may inadvertently clear the
suspended status.
This change adds the MFC_CNTL_SUSPEND_MASK when we issue a purge
request, so that the suspend bit is masked out.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
We may currently lose interrupts during SPE context switch, as we alter
the INT_Route register. Because the IIC uses a per-thread priority
status, changing the interrupt routing to a different thread means that
the IRQ is no longer masked by the priority status, so we end up with
two fasteoi IRQ handlers executing for the one irq_desc. The fasteoi
handler doesn't handle multiple IRQs, so drops the second one.
Fix this by using our own flow handler. This is based on
handle_edge_irq, but issues an eoi after IRQs are handled, and doesn't
do any mask/unmasking.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
The sputrace module contained a trace entry for spu_acquire_saved, but
this marker was not placed anywhere. Fix this by adding a marker to the
routine.
Signed-off-by: Julio M. Merino Vidal <jmerino@ac.upc.edu>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Fix a typo in the marker for the find_victim function, which prevented
it from being traced. It previously read find_vitim.
Signed-off-by: Julio M. Merino Vidal <jmerino@ac.upc.edu>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
The sputrace module contained a reference to a marker for
destroy_spu_context, but this marker did not appear in the code. Fix
this by adding a marker in the function.
Signed-off-by: Julio M. Merino Vidal <jmerino@ac.upc.edu>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
The markers facility defines the marker parameters to be of the form
'name %format'. Add parameter names to sputrace, to specify the context
and %spu parameters, instead of just specifying the '%format' part.
Signed-off-by: Julio M. Merino Vidal <jmerino@ac.upc.edu>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
There are userspace instrumentation tools that need to monitor spu
context switches. This patch adds a new file called 'switch_log' to
each spufs context directory that can be used to monitor the context
switches.
Context switch in/out and exit from spu_run are monitored after the
file was first opened and can be read from it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Use proc_create()/proc_create_data() to make sure that ->proc_fops and ->data
are set up before gluing the PDE to the main tree.
Add correct ->owner to proc_fops to fix reading/module unloading race.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds support for PCI Express port on Celleb. I/O space of this
PCI Express port is not mapped in memory space. So we use the
io-workaround mechanism to make accesses indirect.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves miscellaneous files for Beat into platforms/cell/.
All files in this patch are used by celleb-beat only.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves SPU support code on Beat into platforms/cell/.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves files for mmu and iommu on Beat into platforms/cell/.
All files in this patch are used by celleb-beat only.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves files for Beat hvcall interfaces into platforms/cell/.
All files in this patch are used by celleb-beat only.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves the SCC (Super Companion Chip) related code for celleb
into platforms/cell/.
All files in this patch are used by celleb-beat and celleb-native
commonly.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves the base code for celleb support into platforms/cell/.
All files in this patch are used by celleb-beat and celleb-native
commonly.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Now, we can use generic io-workarounds mechanism and the workaround
code for spider-pci. This changes Celleb PCI code to use spider-pci
code.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This splits cell io-workaround code into spider-pci dependent code and
a generic part, and also moves io-workarounds initialization into
cell_setup_phb.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Replace two open-coded occurrences of the of_get_next_parent() logic.
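The logic in question, as understood here, is "take a reference on the
parent, drop the reference on the node passed in". A user-space sketch
with a hypothetical refcounted node type (node_sketch), not the real
struct device_node:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted tree node. */
struct node_sketch {
	struct node_sketch *parent;
	int refcount;
};

/* Return the parent with a reference taken, and drop the caller's
 * reference on the node that was passed in. */
struct node_sketch *get_next_parent_sketch(struct node_sketch *node)
{
	struct node_sketch *parent;

	if (node == NULL)
		return NULL;
	parent = node->parent;
	if (parent != NULL)
		parent->refcount++;
	node->refcount--;
	return parent;
}

/* Walk to the root holding exactly one reference at a time.
 * The caller must already hold a reference on 'node'. */
int depth_sketch(struct node_sketch *node)
{
	int depth = 0;

	while ((node = get_next_parent_sketch(node)) != NULL)
		depth++;
	return depth;
}
```

With the helper, a walk towards the root becomes a one-line loop and the
reference handling cannot be forgotten at an individual call site.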
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (202 commits)
[POWERPC] Fix compile breakage for 64-bit UP configs
[POWERPC] Define copy_siginfo_from_user32
[POWERPC] Add compat handler for PTRACE_GETSIGINFO
[POWERPC] i2c: Fix build breakage introduced by OF helpers
[POWERPC] Optimize fls64() on 64-bit processors
[POWERPC] irqtrace support for 64-bit powerpc
[POWERPC] Stacktrace support for lockdep
[POWERPC] Move stackframe definitions to common header
[POWERPC] Fix device-tree locking vs. interrupts
[POWERPC] Make pci_bus_to_host()'s struct pci_bus * argument const
[POWERPC] Remove unused __max_memory variable
[POWERPC] Simplify xics direct/lpar irq_host setup
[POWERPC] Use pseries_setup_i8259_cascade() in pseries_mpic_init_IRQ()
[POWERPC] Turn xics_setup_8259_cascade() into a generic pseries_setup_i8259_cascade()
[POWERPC] Move xics_setup_8259_cascade() into platforms/pseries/setup.c
[POWERPC] Use asm-generic/bitops/find.h in bitops.h
[POWERPC] 83xx: mpc8315 - fix USB UTMI Host setup
[POWERPC] 85xx: Fix the size of qe muram for MPC8568E
[POWERPC] 86xx: mpc86xx_hpcn - Temporarily accept old dts node identifier.
[POWERPC] 86xx: mark functions static, other minor cleanups
...
None of these files use any of the functionality promised by
asm/semaphore.h. It's possible that they rely on it dragging in some
unrelated header file, but I can't build all these files, so we'll have
to fix any build failures as they come up.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
At present, ppu-gdb can't trace SPU information from a core dump generated
by the kernel. While the core dump notes have correct contents, they
have the wrong names, as the file descriptors used to generate the note
names are off by one. For an application that opens an SPE context as fd 3,
the current core dump code will generate notes like:
SPU/4/mem
SPU/4/regs
etc.
This confuses GDB, which knows it is looking for SPE context 3 (from
parsing the spu_context_run system call arguments), and cannot find
any notes that match context 3.
This change corrects the file descriptor counting, so that the fd is
incremented only after we've written the note name.
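As a rough user-space illustration of the ordering (not the spufs code;
next_context_fd() and spu_note_name() are hypothetical names invented for
this sketch), the descriptor used for naming is captured before the search
cursor moves on:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the context iterator: report the descriptor
 * of the context that was just found, and only then advance the cursor
 * so that the next search starts one past it.
 */
static int next_context_fd(int *cursor)
{
	int found = *cursor;

	(*cursor)++;
	return found;
}

/* Build a note name such as "SPU/3/mem" for the given descriptor. */
static int spu_note_name(char *buf, size_t len, int fd, const char *file)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "SPU/%d/%s", fd, file);
}
```

With a context open on fd 3, every note for that context is named from the
returned value (3), while the cursor is already at 4 for the next lookup.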
Signed-off-by: Gerhard Stenzel <stenzel@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
During the context save process, we currently save the MFC command
channel after purging the MFC queues. This causes a systemsim warning,
as the command channel may be in an unknown state after the purge.
This change does the save before purging the MFC queues.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
During spu_process callback, we release then acquire the SPU, but keep a
pointer to the local store memory. Since the context may have been
scheduled out during the callback, the ls pointer may become invalid.
This change reacquires the pointer to the context local store after
spu_acquire()-ing, so that it isn't invalidated by a context switch.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
All of the single-value files in spufs are terminated by a newline,
except for signal1_type and signal2_type.
This change adds a trailing newline to these two files.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
The PCI bridge representing the PCIE root complex on Axon contains
device BARs for a memory range and ROM that define inbound accesses.
This confuses the kernel resource management code -- the resources
need to be hidden when Axon is a host bridge.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The cell IOMMU code to parse the dma-ranges properties, used for the fixed
mapping, was broken in two ways for some devices.
Firstly it didn't cope with empty dma-ranges properties. An empty property
implies no translation so can be safely skipped.
The code also wrongly assumed it would be looking at PCI devices, and hard
coded the number of address and size cells.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
At present, we can hit the BUG_ON in __spu_update_sched_info by reading
the regs file of a context between two calls to spu_run. The
spu_release_saved called by spufs_regs_read() is resulting in the (now
non-runnable) context being placed back on the run queue, so the next
call to spu_run ends up in the bug condition.
This change uses the SPU_SCHED_SPU_RUN flag to only reschedule a context
if it's still in spu_run().
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
commit 4ef11014 introduced a usage of SCHED_IDLE to detect when
a context is within spu_run.
Instead of SCHED_IDLE (which has another meaning), add a flag to
sched_flags to tell if a context should be running.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
The only tricky part is we need to adjust the PTE insertion loop to
cater for holes in the page table. The PTEs for each segment start on
a 4K boundary, so with 16M pages we have 16 PTEs per segment and then
a gap to the next 4K page boundary.
It might be possible to allocate the PTEs for each segment separately,
saving the memory currently filling the gaps. However we'd need to
check that's OK with the hardware, and that it actually saves memory.
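A minimal sketch of the indexing this implies, assuming 256MB segments
(16 pages of 16M each, as above) and 8-byte PTEs. The function name and
constants are made up for illustration and are not the ones used in the
patch:

```c
#include <assert.h>
#include <stdint.h>

#define SEGMENT_SHIFT	28u	/* assumed: 256MB segments, 16 x 16M pages */
#define PTE_SIZE	8u	/* assumed: one 64-bit word per PTE */
#define PTAB_ALIGN	4096u	/* each segment's PTEs start on a 4K boundary */

/* Byte offset into the page table of the PTE for IO page number 'page'. */
static uint64_t pte_byte_offset(uint64_t page, unsigned int page_shift)
{
	uint64_t ptes_per_segment, stride;

	assert(page_shift <= SEGMENT_SHIFT);
	ptes_per_segment = 1ull << (SEGMENT_SHIFT - page_shift);

	/* A segment's PTEs take up at least one 4K block; the rest is a hole. */
	stride = ptes_per_segment * PTE_SIZE;
	if (stride < PTAB_ALIGN)
		stride = PTAB_ALIGN;

	return (page / ptes_per_segment) * stride +
	       (page % ptes_per_segment) * PTE_SIZE;
}
```

With page_shift = 24 the sixteen PTEs of a segment occupy 128 bytes, and
page 16 (the first page of the next segment) lands at byte offset 4096.
With 4K pages the stride equals the space actually used, so there is no gap.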
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Make some preliminary changes to cell_iommu_alloc_ptab() to allow it to
take the page size as a parameter rather than assuming IOMMU_PAGE_SIZE.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
We use n_pte_pages to calculate the stride through the page tables, but
we also use it to set the NPPT value in the segment table entry. That is
defined as the number of 4K pages per segment, so we should calculate
it as such regardless of the IOMMU page size.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Currently the cell IOMMU code allocates the entire IOMMU page table in a
contiguous chunk. This is nice and tidy, but for machines with larger
amounts of RAM the page table allocation can fail due to it simply being
too large.
So split the segment table and page table setup routine, and arrange to
have the dynamic and fixed page tables allocated separately.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
There's no need to allocate the pad page unless we're going to actually
use it - so move the allocation to where we know we're going to use it.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The cell IOMMU code no longer needs to save the pte_offset variable
separately, it is incorporated into tbl->it_offset.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The cell IOMMU tce build and free routines use pte_offset to convert
the index passed from the generic IOMMU code into a page table offset.
This takes into account the SPIDER_DMA_OFFSET which sets the top bit
of every DMA address.
However it doesn't cater for the IOMMU window starting at a non-zero
address, as the base of the window is not incorporated into pte_offset
at all.
As it turns out tbl->it_offset already contains the value we need, it
takes into account the base of the window and also pte_offset. So use
it instead!
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
It's called the fixed mapping, not the static mapping.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Ulrich Weigand found back in November that the hardware watchpoints on
cell were not working:
http://ozlabs.org/pipermail/linuxppc-dev/2007-November/046135.html
This patch sets them during initialization.
Signed-off-by: Jens Osterkamp <jens@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The spu_runcntl_RW register is restored within spu_restore function.
So, at the end of spu_bind_context, the SPU context is not just loaded,
but running.
This change corrects the state switch to account the time as USER.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
There is a potential race between flushes of the entire SLB in the MFC
and the point where new entries are being established. The problem is
that we might put a ESID entry into the MFC SLB when the VSID entry has
just been cleared by the global flush.
This can be circumvented by holding the register_lock throughout both
the flushing and the creation of SLB entries.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
When we replace an SLB entry in the MFC after using up all the available
entries, there is a short window in which an incorrect entry is marked
as valid.
The problem is that the 'valid' bit is stored in the ESID, which is
always written after the VSID. Overwriting the VSID first will make the
original ESID entry point to the new VSID, which means that any
concurrent DMA accessing the old ESID ends up being redirected to the
new virtual address. A few cycles later, we write the new ESID and
everything is fine again.
That race can be closed by writing a zero entry to the ESID first, which
makes sure that the VSID is not accessed until we write the new ESID.
Note that we don't actually need to invalidate the SLB entry using the
invalidation register, which would also flush any ERAT entries for that
segment, because the segment translation does not become invalid but is
only removed from the SLB cache.
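The write ordering can be sketched as follows. This is a plain-memory model
only: struct slb_slot and slb_replace() are hypothetical, the position of the
valid bit is an assumption, and real code would go through MMIO accessors
with the appropriate barriers rather than volatile stores:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLB_ESID_V	(1ull << 27)	/* assumed position of the valid bit */

/* Hypothetical model of one MFC SLB slot. */
struct slb_slot {
	volatile uint64_t esid;
	volatile uint64_t vsid;
};

static void slb_replace(struct slb_slot *slot, uint64_t esid, uint64_t vsid)
{
	assert(slot != NULL);

	slot->esid = 0;			/* 1. invalidate: the VSID is no longer used */
	slot->vsid = vsid;		/* 2. now safe to change the translation */
	slot->esid = esid | SLB_ESID_V;	/* 3. publish the new, valid entry */
}
```

At no point between steps 1 and 3 does a valid ESID refer to a VSID that
belongs to a different segment.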
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
There is a small race between the context save procedure
and the SPU interrupt handling, where we expect all interrupt
processing to have finished after disabling them, while
an interrupt is still being processed on another CPU.
The obvious fix is to call synchronize_irq() after disabling
the interrupts at the start of the context save procedure
to make sure we never access the SPU any more during an
ongoing save or even after that.
Thanks to Benjamin Herrenschmidt for pointing this out.
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, we get the following output from sputrace:
[5.097935954] 1606: spufs_ps_nopfn__enter (thread = 1605, spu = -1)
[5.097958164] 1606: spufs_ps_nopfn__insert (thread = 1605, spu = 15)
[5.097973529] 1607: spufs_ps_nopfn__enter (thread = 1605, spu = -1)
[5.097989174] 1607: spufs_ps_nopfn__insert (thread = 1605, spu = 14)
Which leads me to believe that 160[67] is the current thread ID, and
1605 is the context backing the psmap.
However, the 'current' and 'owner' tids are reversed - the 'current'
tid is on the right. This change puts the current thread ID in the
left-hand column instead, and renames the right to 'ctxthread'.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
At present, we have a situation where a context with no owner is
re-scheduled by spu_forget:
Thread 1: reading regs file         Thread 2: context owner
                                    spu_forget()
                                      - ctx->owner = NULL
                                      - set SPU_SCHED_WAS_ACTIVE
spu_acquire_saved()
  - context is in saved state
spu_release_saved()
  - SPU_SCHED_WAS_ACTIVE is set,
    so spu_activate() the context,
    which now has no owner
In spu_forget(), we shouldn't be requesting a re-schedule by setting
SPU_SCHED_WAS_ACTIVE. This change removes the set_bit in spu_forget(),
so that spu_release_saved() doesn't reinsert this destroyed context on
to the run queue.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
We have a small window where a spu context may be destroyed while
we're servicing a page fault (from another thread) to the context's
problem state mapping.
After we up_read() the mmap_sem, it's possible that the context is
destroyed by its owning thread, and so the later references to ctx
are invalid. This can manifest as a deadlock on the (now free()-ed)
context state mutex.
This change adds a reference to the context before we release the
mmap_sem, so that the context cannot be destroyed.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
At present, the __spufs_trap_data_map and __spu_trap_data_seq functions
exit if spu->flags has the SPU_CONTEXT_SWITCH_ACTIVE set. This was
resulting in spurious returns from these functions, as they may be
legitimately called when we have this bit set.
We only use it in these two sanity checks, so this change removes the
flag completely. This fixes hangs in the page-fault path of SPE apps.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
2.6.25 has a regression where we can starve the scheduler by creating
(N_SPES+1) contexts, then running them one at a time.
The final context will never be run, as the other contexts are loaded on
the SPEs, none of which are reported as free (ie, spu->alloc_state !=
SPU_FREE), so spu_get_idle() doesn't give us a spu to run on. Because
all of the contexts are stopped, none are descheduled by the scheduler
tick, as spusched_tick returns if spu_stopped(ctx).
This change replaces the spu_stopped() check with checking for SCHED_IDLE
in ctx->policy. We set a context's policy to SCHED_IDLE when we're not
in spu_run(). We also favour SCHED_IDLE contexts when looking for contexts
to unbind, but leave their timeslice intact for later resumption.
This patch fixes the following test in the spufs-testsuite:
tests/20-scheduler/02-yield-starvation
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
[POWERPC] Remove unused CONFIG_WANT_DEVICE_TREE
[POWERPC] Cell RAS: Remove DEBUG, and add license and copyright
[POWERPC] hvc_rtas_init() must be __init
[POWERPC] free_property() must not be __init
[POWERPC] vdso_do_func_patch{32,64}() must be __init
[POWERPC] Remove generated files on make clean
[POWERPC] Fix arch/ppc compilation - add typedef for pgtable_t
[POWERPC] Wire up new timerfd syscalls
[POWERPC] PS3: Update sys-manager button events
[POWERPC] PS3: Sys-manager code cleanup
[POWERPC] PS3: Use system reboot on restart
[POWERPC] PS3: Fix bootwrapper hang bug
[POWERPC] PS3: Fix reading pm interval in logical performance monitor
[POWERPC] PS3: Fix setting bookmark in logical performance monitor
[POWERPC] Fix DEBUG_PREEMPT warning when warning
* Add path_put() functions for releasing a reference to the dentry and
vfsmount of a struct path in the right order
* Switch from path_release(nd) to path_put(&nd->path)
* Rename dput_path() to path_put_conditional()
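For illustration, the release order can be modelled in user space like this.
The struct definitions are minimal stand-ins, not the kernel's, and
release_seq and the released_at fields exist only to record the order of
the two puts:

```c
#include <assert.h>
#include <stddef.h>

static int release_seq;	/* illustration only: orders the two puts */

/* Minimal stand-ins for the kernel types. */
struct dentry   { int d_count;   int released_at; };
struct vfsmount { int mnt_count; int released_at; };

struct path {
	struct vfsmount *mnt;
	struct dentry *dentry;
};

static void dput(struct dentry *d)
{
	d->d_count--;
	d->released_at = ++release_seq;
}

static void mntput(struct vfsmount *m)
{
	m->mnt_count--;
	m->released_at = ++release_seq;
}

/* Drop the dentry reference first, then the mount it lives on. */
static void path_put(struct path *path)
{
	assert(path != NULL);
	dput(path->dentry);
	mntput(path->mnt);
}
```

A caller that previously did path_release(nd) would, after the conversion,
call path_put(&nd->path) and get the same ordering in one place.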
[akpm@linux-foundation.org: fix cifs]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: <linux-fsdevel@vger.kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the central patch of a cleanup series. In most cases there is no good
reason why someone would want to use a dentry for itself. This series reflects
that fact and embeds a struct path into nameidata.
Together with the other patches of this series
- it enforces the correct order of getting/releasing the reference count on
<dentry,vfsmount> pairs
- it prepares the VFS for stacking support since it is essential to have a
struct path in every place where the stack can be traversed
- it reduces the overall code size:
without patch series:
   text    data     bss      dec     hex filename
5321639  858418  715768  6895825  6938d1 vmlinux
with patch series:
   text    data     bss      dec     hex filename
5320026  858418  715768  6894212  693284 vmlinux
This patch:
Switch from nd->{dentry,mnt} to nd->path.{dentry,mnt} everywhere.
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix cifs]
[akpm@linux-foundation.org: fix smack]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
arch/powerpc/platforms/cell/ras.c still has DEBUG #defined, which is no
longer necessary. Disable it - this disables two pr_debugs().
While we're there this file should have a copyright notice and license,
so add both.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
RCU style multiple probes support for the Linux Kernel Markers. Common case
(one probe) is still fast and does not require dynamic allocation or a
supplementary pointer dereference on the fast path.
- Move preempt disable from the marker site to the callback.
Since we now have an internal callback, move the preempt disable/enable to the
callback instead of the marker site.
Since the callback change is done asynchronously (passing from a handler that
supports arguments to a handler that does not set up the arguments if no
arguments are passed), we can safely update it even if it is outside the
preempt disable section.
- Move probe arm to probe connection. Now, a connected probe is automatically
armed.
Remove MARK_MAX_FORMAT_LEN, unused.
This patch modifies the Linux Kernel Markers API : it removes the probe
"arm/disarm" and changes the probe function prototype : it now expects a
va_list * instead of a "...".
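A simplified user-space sketch of why a va_list pointer suits multiple
probes: the marker site can restart the argument list for each connected
probe. The probe_fn prototype, the fixed-size table and all names below are
assumptions made for this example and do not reproduce the markers API; the
RCU protection of the probe array is not modelled at all.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Probe prototype in the style described: a va_list pointer, not "...". */
typedef void (*probe_fn)(void *private_data, const char *fmt, va_list *args);

#define MAX_PROBES 4
static probe_fn probes[MAX_PROBES];	/* hypothetical fixed-size probe table */
static void *probe_data[MAX_PROBES];
static int nr_probes;

/* Connect a probe; in this model a connected probe is immediately armed. */
static int probe_register(probe_fn fn, void *data)
{
	assert(fn != NULL);
	if (nr_probes == MAX_PROBES)
		return -1;
	probes[nr_probes] = fn;
	probe_data[nr_probes] = data;
	nr_probes++;
	return 0;
}

/* Marker site: restart the argument list for every connected probe. */
static void marker_call(const char *fmt, ...)
{
	va_list args;
	int i;

	for (i = 0; i < nr_probes; i++) {
		va_start(args, fmt);
		probes[i](probe_data[i], fmt, &args);
		va_end(args);
	}
}

/* Example probe: add the first int argument to *private_data. */
static void sum_probe(void *private_data, const char *fmt, va_list *args)
{
	(void)fmt;
	*(long *)private_data += va_arg(*args, int);
}
```

Two probes registered with separate counters each see the same arguments on
every marker_call(), which is the behaviour a second tracer needs.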
If we want to have more than one probe connected to a marker at a given
time (LTTng, blktrace, or SystemTap) then we need this patch. Without it,
connecting a second probe handler to a marker will fail.
It allows us, for instance, to do interesting combinations:
do standard tracing with LTTng and, eventually, compute statistics
with SystemTap, or have a special trigger on an event that would call
a SystemTap script which would stop flight recorder tracing.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Mike Mason <mmlnx@us.ibm.com>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: David Smith <dsmith@redhat.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-2.6.25' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
[POWERPC] Add arch-specific walk_memory_remove() for 64-bit powerpc
[POWERPC] Enable hotplug memory remove for 64-bit powerpc
[POWERPC] Add remove_memory() for 64-bit powerpc
[POWERPC] Make cell IOMMU fixed mapping printk more useful
[POWERPC] Fix potential cell IOMMU bug when switching back to default DMA ops
[POWERPC] Don't enable cell IOMMU fixed mapping if there are no dma-ranges
[POWERPC] Fix cell IOMMU null pointer explosion on old firmwares
[POWERPC] spufs: Fix timing dependent false return from spufs_run_spu
[POWERPC] spufs: No need to have a runnable SPU for libassist update
[POWERPC] spufs: Update SPU_Status[CISHP] in backing runcntl write
[POWERPC] spufs: Fix state_mutex leaks
[POWERPC] Disable G5 NAP mode during SMU commands on U3
Add a .show_options super operation to spufs.
Use generic_show_options() and save the complete option string in
spufs_fill_super().
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
simple_attr_close implements ->release so it should be named accordingly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: <stefano.brivio@polimi.it>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg KH <greg@kroah.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sometimes simple attributes might need to return an error, e.g. for
acquiring a mutex interruptibly. In fact we have that situation in
spufs already, which is the original user of the simple attributes. This
patch merges the temporarily forked attributes in spufs back into the
main ones and allows them to return errors.
[akpm@linux-foundation.org: build fix]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: <stefano.brivio@polimi.it>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg KH <greg@kroah.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently the cell IOMMU fixed mapping just printks that it's been setup,
which is not particularly useful. Much more interesting is the address
ranges for the different windows. This adds one line to dmesg on a blade.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
If we get a 64-bit dma mask we switch to the fixed ops and call
cell_dma_dev_setup(). If the driver then switches back to a 32-bit dma
mask for any reason we don't call cell_dma_dev_setup() again, which
has the potential to leave bogus data in dev->archdata.dma_data.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
In order for the cell IOMMU fixed mapping to work we need "dma-ranges"
properties in the device tree. If there are none then there's no point
enabling the fixed mapping support.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The cell IOMMU fixed mapping support has a null pointer bug if you run
it on older firmwares that don't contain the "dma-ranges" properties.
Fix it and convert to using of_get_next_parent() while we're there.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Stop bits are only valid when the running bit is not set. Status bits
carry over from one invocation of spufs_run_spu() to another, so the
RUNNING bit gets added to the previous state of the register which may
have been a remote library call. In this case, it looks like another
library routine should be invoked, but the spe is actually running.
This fixes a problem with a testcase that exercises the scheduler.
Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We don't need to update the libassist statistic with the context in a
runnable state, so do it after spu_disable_spu().
Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently, the kernel may fail to restart a SPE context which
has stopped and been swapped out.
This changes spu_backing_runcntl_write to emulate the real
SPU_Status register exactly. When the SPU Run Control register
is written with SPU_RunCntl[Run] set to '1', the physical SPU
automatically sets SPU_Status[R] and clears SPU_Status[CISHP].
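The emulated behaviour amounts to something like the sketch below. The bit
values and names are assumptions chosen for the example rather than taken
from the patch, and only the Run = 1 case described above is modelled:

```c
#include <assert.h>
#include <stdint.h>

/* Bit values are assumptions for this sketch. */
#define STATUS_R	0x01u	/* running */
#define STATUS_P	0x02u	/* stopped by stop-and-signal */
#define STATUS_H	0x04u	/* stopped by halt */
#define STATUS_S	0x10u	/* single step */
#define STATUS_I	0x20u	/* invalid instruction */
#define STATUS_C	0x40u	/* invalid channel */
#define STATUS_CISHP	(STATUS_C | STATUS_I | STATUS_S | STATUS_H | STATUS_P)

#define RUNCNTL_RUN	0x1u

/* Return the saved status word after a write of 'val' to run control. */
static uint32_t backing_runcntl_write(uint32_t status, uint32_t val)
{
	if (val & RUNCNTL_RUN) {
		status &= ~STATUS_CISHP;	/* stop reasons no longer apply */
		status |= STATUS_R;		/* the context counts as running */
		assert(!(status & STATUS_CISHP));
	}
	/* Other writes are outside the scope of this sketch. */
	return status;
}
```

A context that was saved with a stop reason set therefore comes back with
only the running bit set once it is restarted.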
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Fix various state_mutex leaks. The worst one was introduced by the
interruptible state_mutex conversion but there've been a few before
too. Notably spufs_wait now returns without the state_mutex held
when returning an error, which actually cleans up some code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
I got this warning from gcc:
arch/powerpc/platforms/cell/axon_msi.c:118: warning: 'tmp' may be used uninitialized in this function
Which turns out to be a false positive, but pointed out that it was
possible for the error path in find_msi_translator() to do an extra
of_node_put on a node. This fixes it by localising the ref counting
a bit. As a side effect, the warning goes away.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
There's a brown-paper-bag bug in axon_msi, we pass the address of our
FIFO directly to the hardware, without DMA mapping it. This leads to
DMA exceptions if you enable MSI & the IOMMU.
The fix is to correctly DMA map the fifo, dma_alloc_coherent() does
what we want - and we need to track the virt & phys addresses.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Now that we create of_platform devices earlier on cell, we can make the
axon_msi driver an of_platform driver. This makes the code cleaner in
several ways, and most importantly means we have a struct device.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently cell publishes OF devices at device_initcall() time, which
means the earliest a driver can bind to a device is also device_initcall()
time. We have a driver we want to register before other devices, so
publish the devices at subsys_initcall() time.
This should not cause any behaviour change for existing drivers, as they
are still bound at device_initcall() time.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Reference count for the "neighbor" spu context was not
being correctly decremented after usage.
So, contexts used as reference during SPU affinity setup
were not being deallocated, leading to a memory leak.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently we only catch debug events through the 0x3fff status;
spufs_run_spu doesn't handle single-step SPE events.
This change adds a handler for conditions where the SPE is stopped due
to single-step-mode.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds markers to two important points in the spufs code and a new
module (sputrace.ko) that allows reading these out through a proc file.
Long-term I'd rather see something like lttng extended to use the spufs
instrumentation, but for now I think this is a good enough quick
solution. We'll probably want to add various additional events in addition
to the ones I have already.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch adds support for setting up a fixed IOMMU mapping on certain
cell machines. For 64-bit devices this avoids the performance overhead of
mapping and unmapping pages at runtime. 32-bit devices are unable to use
the fixed mapping.
The fixed mapping is established at boot, and maps all of physical memory
1:1 into device space at some offset. On machines with < 30 GB of memory
we setup the fixed mapping immediately above the normal IOMMU window.
For example a machine with 4GB of memory would end up with the normal
IOMMU window from 0-2GB and the fixed mapping window from 2GB to 6GB. In
this case a 64-bit device wishing to DMA to 1GB would be told to DMA to
3GB, plus any offset required by firmware. The firmware offset is encoded
in the "dma-ranges" property.
On machines with 30GB or more of memory, we are unable to place the fixed
mapping above the normal IOMMU window as we would run out of address space.
Instead we move the normal IOMMU window to coincide with the hash page
table, this region does not need to be part of the fixed mapping as no
device should ever be DMA'ing to it. We then setup the fixed mapping
from 0 to 32GB.
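The address arithmetic in the examples above can be sketched as follows.
This is an illustration only: the helper names are hypothetical, the 2GB
window size is taken from the 4GB example, and the relocation of the normal
window over the hash page table on large machines is not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define GB			(1ull << 30)
#define DYNAMIC_WINDOW_SIZE	(2 * GB)	/* normal IOMMU window in the example */
#define FIXED_LIMIT		(30 * GB)	/* threshold quoted above */

/* Device-space base of the fixed window for a machine with mem_size bytes. */
static uint64_t fixed_window_base(uint64_t mem_size)
{
	/* Small machines: directly above the normal window; large ones: at 0. */
	return mem_size < FIXED_LIMIT ? DYNAMIC_WINDOW_SIZE : 0;
}

/* Address a 64-bit device is told to DMA to in order to reach 'phys'. */
static uint64_t fixed_dma_addr(uint64_t phys, uint64_t mem_size,
			       uint64_t fw_offset)
{
	assert(phys < mem_size);
	return fixed_window_base(mem_size) + phys + fw_offset;
}
```

For the 4GB machine, fixed_dma_addr(1 * GB, 4 * GB, 0) gives 3GB, matching
the example; fw_offset stands for whatever "dma-ranges" encodes.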
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Split out the ioid fetching and checking logic so we can use it elsewhere
in a subsequent patch.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add support to cell_iommu_setup_page_tables() for handling two windows,
the dynamic window and the fixed window. A fixed window size of 0
indicates that there is no fixed window at all.
Currently there are no callers who pass a non-zero fixed window, but the
upcoming fixed IOMMU mapping patch will change that.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Split the IOMMU logic out from cell_dma_dev_setup() into a separate
function. If we're not using dma_direct_ops or dma_iommu_ops we don't
know what the hell's going on, so BUG.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Split cell_iommu_setup_hardware() into two parts. Split the page table
setup into cell_iommu_setup_page_tables() and the bits that kick the
hardware into cell_iommu_enable_hardware().
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Split out the logic that allocates a struct iommu into a separate
function. This can fail; however, the calling code has never cared, so
just return if we can't allocate an iommu.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently the IOMMU code allocates one page for the segment table, that
isn't safe if we have more than 132 GB of RAM.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Rather than using the global variable, have cell use its own variable
to store the direct DMA offset.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Store the direct_dma_offset in each device's dma_data in the case
where we're using the direct DMA ops.
We need to make sure we setup the ppc_md.pci_dma_dev_setup() callback
if we're using a non-zero offset.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
All kobjects require a dynamically allocated name now. We no longer
need to keep track if the name is statically assigned, we can just
unconditionally free() all kobject names on cleanup.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>