We currently have two issues with the MFC save code:
* save_mfc_decr doesn't handle a transition of 1 -> 0 of the Ds bit
* The Q bit may be stale in the CSA
This change fixes the first issue by clearing the relevant bits from
the MFC_CNTL value in the CSA before or-ing in the updated status.
Also, we add the Q bit to the updated status.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, we can introduce invalid entries into the MFC queues:
1) context starts a DMA
2) context gets scheduled out during a DMA
- kernel saves MFC queue to CSA
- kernel saves 0x0 in csa->mfc_control_RW
3) context gets scheduled in
- csa->mfc_control[Q] ('queues empty') isn't set, so DMA queues are
restored from the CSA
4) context's DMA is completed
5) context gets scheduled out again, no DMA occuring this time
- kernel sees that MFC_CNTL[Q] ('queues empty') is set, so doesn't
touch saved queue data in CSA
- kernel saves 0x0 in csa->mfc_control_RW
6) context gets scheduled in
- csa->mfc_control[Q] ('queues empty') isn't set (we saved is as 0!),
so DMA queues are restored from the CSA
In this last restore, we've restored the queue status from step 2,
which are now invalid.
This change makes save_mfc_cntl() closer to the save/restore sequence,
as specified in the CBE handbook.
With changes from Luke Browning.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
When we issue a MFC purge request, we may inadvertantly clear the
suspended status.
This change adds the MFC_CNTL_SUSPEND_MASK when we issue a purge
request, so that the suspend bit is masked out.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
The sputrace module contained a trace entry for spu_acquire_saved, but
this marker was not placed anywhere. Fix this by adding a marker to the
routine.
Signed-off-by: Julio M. Merino Vidal <jmerino@ac.upc.edu>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Fix a typo in the marker for the find_victim function, which prevented
it from being traced. It previously read find_vitim.
Signed-off-by: Julio M. Merino Vidal <jmerino@ac.upc.edu>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
The sputrace module contained a reference to a marker for
destroy_spu_context, but this marker did not appear in the code. Fix
this by adding a marker in the function.
Signed-off-by: Julio M. Merino Vidal <jmerino@ac.upc.edu>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
The markers facility defines the marker parameters to be of the form
'name %format'. Add parameter names to sputrace, to specify the context
and %spu paramerters, instead of just specifying the '%format' part.
Signed-off-by: Julio M. Merino Vidal <jmerino@ac.upc.edu>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
There are userspace instrumentation tools that need to monitor spu
context switches. This patch adds a new file called 'switch_log' to
each spufs context directory that can be used to monitor the context
switches.
Context switch in/out and exit from spu_run are monitored after the
file was first opened and can be read from it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Use proc_create()/proc_create_data() to make sure that ->proc_fops and ->data
be setup before gluing PDE to main tree.
Add correct ->owner to proc_fops to fix reading/module unloading race.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (202 commits)
[POWERPC] Fix compile breakage for 64-bit UP configs
[POWERPC] Define copy_siginfo_from_user32
[POWERPC] Add compat handler for PTRACE_GETSIGINFO
[POWERPC] i2c: Fix build breakage introduced by OF helpers
[POWERPC] Optimize fls64() on 64-bit processors
[POWERPC] irqtrace support for 64-bit powerpc
[POWERPC] Stacktrace support for lockdep
[POWERPC] Move stackframe definitions to common header
[POWERPC] Fix device-tree locking vs. interrupts
[POWERPC] Make pci_bus_to_host()'s struct pci_bus * argument const
[POWERPC] Remove unused __max_memory variable
[POWERPC] Simplify xics direct/lpar irq_host setup
[POWERPC] Use pseries_setup_i8259_cascade() in pseries_mpic_init_IRQ()
[POWERPC] Turn xics_setup_8259_cascade() into a generic pseries_setup_i8259_cascade()
[POWERPC] Move xics_setup_8259_cascade() into platforms/pseries/setup.c
[POWERPC] Use asm-generic/bitops/find.h in bitops.h
[POWERPC] 83xx: mpc8315 - fix USB UTMI Host setup
[POWERPC] 85xx: Fix the size of qe muram for MPC8568E
[POWERPC] 86xx: mpc86xx_hpcn - Temporarily accept old dts node identifier.
[POWERPC] 86xx: mark functions static, other minor cleanups
...
None of these files use any of the functionality promised by
asm/semaphore.h. It's possible that they rely on it dragging in some
unrelated header file, but I can't build all these files, so we'll have
fix any build failures as they come up.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
At present, ppu-gdb can't trace spu infomation with coredump generated
by the kernel. While the core dumps notes have correct contents, they
have the wrong names, as the file descriptors used to generate the note
names are off-by-one. An application that opens a SPE context as fd 3,
the current core dump code will generate notes like:
SPU/4/mem
SPU/4/regs
etc.
This confuses GDB, which knows it is looking for SPE context 3 (from
parsing the spu_context_run system call arguments), and cannot find
any notes that match context 3.
This change corrects the file descriptor counting, to only increment
the fd until after we've written the note name.
Signed-off-by: Gerhard Stenzel <stenzel@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
During the context save process, we currently save the MFC command
channel after purging the MFC queues. This causes a systemsim warning,
as the command channel may be in an unknown state after the purge.
This change does the save before purging the MFC queues.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
During spu_process callback, we release then acquire the SPU, but keep a
pointer to the local store memory. Since the context may have been
scheduled out during the callback, the ls pointer may become invalid.
This change reacquires the pointer to the context local store after
spu_acquire()-ing, so that it isn't invalidated by a context switch.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
All of the single-value files in spufs are terminated by a newline,
except for signal1_type and signal2_type.
This change adds a trailing newline to these two files.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
At present, we can hit the BUG_ON in __spu_update_sched_info by reading
the regs file of a context between two calls to spu_run. The
spu_release_saved called by spufs_regs_read() is resulting in the (now
non-runnable) context being placed back on the run queue, so the next
call to spu_run ends up in the bug condition.
This change uses the SPU_SCHED_SPU_RUN flag to only reschedule a context
if it's still in spu_run().
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
commit 4ef11014 introduced a usage of SCHED_IDLE to detect when
a context is within spu_run.
Instead of SCHED_IDLE (which has other meaning), add a flag to
sched_flags to tell if a context should be running.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
The spu_runcntl_RW register is restored within spu_restore function.
So, at the end of spu_bind_context, the SPU context is not just loaded,
but running.
This change corrects the state switch to account the time as USER.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
There is a small race between the context save procedure
and the SPU interrupt handling, where we expect all interrupt
processing to have finished after disabling them, while
an interrupt is still being processed on another CPU.
The obvious fix is to call synchronize_irq() after disabling
the interrupts at the start of the context save procedure
to make sure we never access the SPU any more during an
ongoing save or even after that.
Thanks to Benjamin Herrenschmidt for pointing this out.
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, we get the following output from sputrace:
[5.097935954] 1606: spufs_ps_nopfn__enter (thread = 1605, spu = -1)
[5.097958164] 1606: spufs_ps_nopfn__insert (thread = 1605, spu = 15)
[5.097973529] 1607: spufs_ps_nopfn__enter (thread = 1605, spu = -1)
[5.097989174] 1607: spufs_ps_nopfn__insert (thread = 1605, spu = 14)
Which leads me to believe that 160[67] is the current thread ID, and
1605 is the context backing the psmap.
However, the 'current' and 'owner' tids are reversed - the 'current'
tid is on the right. This change puts the current thread ID in the
left-hand column instead, and renames the right to 'ctxthread'.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
At present, we have a situation where a context with no owner is
re-scheduled by spu_forget:
Thread 1: reading regs file Thread 2: context owner
spu_forget()
- ctx->owner = NULL
- set SPU_SCHED_WAS_ACTIVE
spu_acquire_saved()
- context is in saved state
spu_release_saved()
- SPU_SCHED_WAS_ACTIVE is set,
so spu_activate() the context,
which now has no owner
In spu_forget(), we shouldn't be requesting a re-schedule by setting
SPU_SCHED_WAS_ACTIVE. This change removes the set_bit in spu_forget(),
so that spu_release_saved() doesn't reinsert this destroyed context on
to the run queue.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
We have a small window where a spu context may be destroyed while
we're servicing a page fault (from another thread) to the context's
problem state mapping.
After we up_read() the mmap_sem, it's possible that the context is
destroyed by its owning thread, and so the later references to ctx
are invalid. This can maifest as a deadlock on the (now free()-ed)
context state mutex.
This change adds a reference to the context before we release the
mmap_sem, so that the context cannot be destroyed.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
At present, the __spufs_trap_data_map and __spu_trap_data_seq functions
exit if spu->flags has the SPU_CONTEXT_SWITCH_ACTIVE set. This was
resulting in suprious returns from these functions, as they may be
legitimately called when we have this bit set.
We only use it in these two sanity checks, so this change removes the
flag completely. This fixes hangs in the page-fault path of SPE apps.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
2.6.25 has a regression where we can starve the scheduler by creating
(N_SPES+1) contexts, then running them one at a time.
The final context will never be run, as the other contexts are loaded on
the SPEs, none of which are repoted as free (ie, spu->alloc_state !=
SPU_FREE), so spu_get_idle() doesn't give us a spu to run on. Because
all of the contexts are stopped, none are descheduled by the scheduler
tick, as spusched_tick returns if spu_stopped(ctx).
This change replaces the spu_stopped() check with checking for SCHED_IDLE
in ctx->policy. We set a context's policy to SCHED_IDLE when we're not
in spu_run(). We also favour SCHED_IDLE contexts when looking for contexts
to unbind, but leave their timeslice intact for later resumption.
This patch fixes the following test in the spufs-testsuite:
tests/20-scheduler/02-yield-starvation
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
* Add path_put() functions for releasing a reference to the dentry and
vfsmount of a struct path in the right order
* Switch from path_release(nd) to path_put(&nd->path)
* Rename dput_path() to path_put_conditional()
[akpm@linux-foundation.org: fix cifs]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: <linux-fsdevel@vger.kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the central patch of a cleanup series. In most cases there is no good
reason why someone would want to use a dentry for itself. This series reflects
that fact and embeds a struct path into nameidata.
Together with the other patches of this series
- it enforced the correct order of getting/releasing the reference count on
<dentry,vfsmount> pairs
- it prepares the VFS for stacking support since it is essential to have a
struct path in every place where the stack can be traversed
- it reduces the overall code size:
without patch series:
text data bss dec hex filename
5321639 858418 715768 6895825 6938d1 vmlinux
with patch series:
text data bss dec hex filename
5320026 858418 715768 6894212 693284 vmlinux
This patch:
Switch from nd->{dentry,mnt} to nd->path.{dentry,mnt} everywhere.
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix cifs]
[akpm@linux-foundation.org: fix smack]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
RCU style multiple probes support for the Linux Kernel Markers. Common case
(one probe) is still fast and does not require dynamic allocation or a
supplementary pointer dereference on the fast path.
- Move preempt disable from the marker site to the callback.
Since we now have an internal callback, move the preempt disable/enable to the
callback instead of the marker site.
Since the callback change is done asynchronously (passing from a handler that
supports arguments to a handler that does not setup the arguments is no
arguments are passed), we can safely update it even if it is outside the
preempt disable section.
- Move probe arm to probe connection. Now, a connected probe is automatically
armed.
Remove MARK_MAX_FORMAT_LEN, unused.
This patch modifies the Linux Kernel Markers API : it removes the probe
"arm/disarm" and changes the probe function prototype : it now expects a
va_list * instead of a "...".
If we want to have more than one probe connected to a marker at a given
time (LTTng, or blktrace, ssytemtap) then we need this patch. Without it,
connecting a second probe handler to a marker will fail.
It allow us, for instance, to do interesting combinations :
Do standard tracing with LTTng and, eventually, to compute statistics
with SystemTAP, or to have a special trigger on an event that would call
a systemtap script which would stop flight recorder tracing.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Mike Mason <mmlnx@us.ibm.com>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: David Smith <dsmith@redhat.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-2.6.25' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
[POWERPC] Add arch-specific walk_memory_remove() for 64-bit powerpc
[POWERPC] Enable hotplug memory remove for 64-bit powerpc
[POWERPC] Add remove_memory() for 64-bit powerpc
[POWERPC] Make cell IOMMU fixed mapping printk more useful
[POWERPC] Fix potential cell IOMMU bug when switching back to default DMA ops
[POWERPC] Don't enable cell IOMMU fixed mapping if there are no dma-ranges
[POWERPC] Fix cell IOMMU null pointer explosion on old firmwares
[POWERPC] spufs: Fix timing dependent false return from spufs_run_spu
[POWERPC] spufs: No need to have a runnable SPU for libassist update
[POWERPC] spufs: Update SPU_Status[CISHP] in backing runcntl write
[POWERPC] spufs: Fix state_mutex leaks
[POWERPC] Disable G5 NAP mode during SMU commands on U3
Add a .show_options super operation to spufs.
Use generic_show_options() and save the complete option string in
spufs_fill_super().
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
simple_attr_close implementes ->release so it should be named accordingly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: <stefano.brivio@polimi.it>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg KH <greg@kroah.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sometimes simple attributes might need to return an error, e.g. for
acquiring a mutex interruptibly. In fact we have that situation in
spufs already which is the original user of the simple attributes. This
patch merged the temporarily forked attributes in spufs back into the
main ones and allows to return errors.
[akpm@linux-foundation.org: build fix]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: <stefano.brivio@polimi.it>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg KH <greg@kroah.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stop bits are only valid when the running bit is not set. Status bits
carry over from one invocation of spufs_run_spu() to another, so the
RUNNING bit gets added to the previous state of the register which may
have been a remote library call. In this case, it looks like another
library routine should be invoked, but the spe is actually running.
This fixes a problem with a testcase that exercises the scheduler.
Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We don't need to update the libassist statistic with the context in a
runnable state, so do it after spu_disable_spu().
Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently, the kernel may fail to restart a SPE context which
has stopped and been swapped out.
This changes spu_backing_runcntl_write to emulate the real
SPU_Status register exactly. When the SPU Run Control register
is written with SPU_RunCntl[Run] set to '1', the physical SPU
automatically sets SPU_Status[R] and clears SPU_Status[CISHP].
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Fix various state_mutex leaks. The worst one was introduced by the
interrutible state_mutex conversion but there've been a few before
too. Notably spufs_wait now returns without the state_mutex held
when returning an error, which actually cleans up some code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Reference count for the "neighbor" spu context was not
being correctly decremented after usage.
So, contexts used as reference during SPU affinity setup
were not being deallocated, leading to a memory leak.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently we only catch debug events through the 0x3fff status;
spufs_run_spu doesn't handle single-step SPE events.
This change adds a handler for conditions where the SPE is stopped due
to single-step-mode.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds markers two important points in the spufs code and a new
module (sputrace.ko) that allows reading these out through a proc file.
Long-term I'd rather see something like lttng extended to use the spufs
instrumentation, but for now I think this is a good enough quick
solution. We'll probably want to add various addition event in addition
to that ones I have already.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Commit aed3a8c9bb introduced a
definition of notify_spus_active in .../cell/spu_syscalls.c, and
another definition under #ifndef MODULE in .../cell/spufs/sched.c.
The latter is not necessary and causes the build to fail when
CONFIG_SPU_FS=y, so this removes it. It also removes the export
of do_notify_spus_active, which is unnecessary.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
This removes an OProfile dependency on the spufs module. This
dependency was causing a problem for multiplatform systems that are
built with support for Oprofile on Cell but try to load the oprofile
module on a non-Cell system.
Signed-off-by: Bob Nelson <rrnelson@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Based on an original patch from Arnd Bergmann
<arnd.bergmann@de.ibm.com>
If there's no entry in the mailbox, then a read on the _info file will
return data from an uninitialised variable.
This change returns EOF if there's no mailbox info available instead.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This fixes the behavior of spufs when a spu tries a DMA operation
based on a wrong / unavailable address.
Instead of just generating a SIGBUS signal, spufs now
generates a SIGSEGV signal and restarts the problematic DMA operation
after the execution of the application's signal handler. This allows
applications to employ user-level paging systems.
Although the restart_dma function is called before the application's
signal handler, the operation is not actually performed at this time,
since the spu context is already stopped. The operation only takes
place when spu_run is restarted (which happens automatically).
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The original spusched_timer was designed to take effect only when
a context is waiting in the runqueue.
This change adds an additional lower-freq timer has been added to
purely handle the spu_load updates. The new timer will be triggered
per LOAD_FREQ ticks.
Signed-off-by: Aegis Lin <aegislin@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Make most places that use spu_acquire/spu_acquire_saved interruptible,
this allows getting out of the spufs code when e.g. pressing ctrl+c.
There are a few places where we get called e.g. from spufs teardown
routines were we can't simply err out so these are left with a comment.
For now I've also not touched the poll routines because it's open what
libspe would expect in terms of interrupted system calls.
Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The simple attr macros currently used by spufs can't deal with the
handlers returning errors, which is required to make the state_mutex
interruptible. This adds a local copy that allows for an error
return from the get/set handlers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Change spufs_spu_run so that the context is queued directly to the
scheduler and the controlling thread advances directly to spufs_wait()
for spe errors and exceptions.
nosched contexts are treated the same as before.
Fixes from Christoph Hellwig <hch@lst.de>
Signed-off-by: Luke Browning <lukebr@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This changes the spu context switch code to not write to reserved bits
of spu interrupt status register.
The architecture book says the reserved fields should be set to zero.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Need to re-check priority after dropping lock. Otherwise, a
more favored context may be preempted.
Signed-off-by: Luke Browning <lukebr@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This cleans up spu_run_init so that it does all of the spu
initialization for spufs_run_spu. It initializes the spu context as
much as possible before it activates the spu and writes the runcntl
register.
Signed-off-by: Luke Browning <lukebr@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Based on original patches from
Arnd Bergmann <arnd.bergman@de.ibm.com>; and
Luke Browning <lukebr@linux.vnet.ibm.com>
Currently, spu contexts need to be loaded to the SPU in order to take
class 0 and class 1 exceptions.
This change makes the actual interrupt-handlers much simpler (ie,
set the exception information in the context save area), and defers the
handling code to the spufs_handle_class[01] functions, called from
spufs_run_spu.
This should improve the concurrency of the spu scheduling leading to
greater SPU utilization when SPUs are overcommited.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add a few #defines for the class 0, 1 and 2 interrupt status bits, and
use them instead of magic numbers when we're setting or checking for
these interrupts.
Also, add a #define for the class 2 mailbox threshold interrupt mask.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When doing a poll on the mbox stat file of a swapped-out context, we
clear the class 0 interrupt status, rather than the class 2 interrupt
status.
This change corrects the poll operation to clear the correct interrupt.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This change encapsulates the spu_privcntl_RW register so that it can
be written through backing ops. This is necessary so that spu contexts
can be initialized and queued to the scheduler in spufs_run_spu.
Signed-off-by: Luke Browning <lukebr@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This change disables the logic that faults-in spu contexts under the
covers from the page fault handler. When a fault requires a runnable
context, the handler will block until the context is scheduled by
other means.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently, part of the spufs code (switch.o, lscsa_alloc.o and fault.o)
is compiled directly into the kernel.
This change moves these components of spufs into the kernel.
The lscsa and switch objects are fairly straightforward to move in.
For the fault.o module, we split the fault-handling code into two
parts: a/p/p/c/spu_fault.c and a/p/p/c/spufs/fault.c. The former is for
the in-kernel spu_handle_mm_fault function, and we move the rest of the
fault-handling code into spufs.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Fix a few typos in the spufs scheduler comments
Signed-off-by: Julio M. Merino Vidal <jmerino@ac.upc.edu>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add platform specific SPU run control routines to the spufs. The current
spufs implementation uses the SPU master run control bit (MFC_SR1[S]) to
control SPE execution, but the PS3 hypervisor does not support the use of
this feature.
This change adds the run control wrapper routies spu_enable_spu() and
spu_disable_spu(). The bare metal routines use the master run control
bit, and the PS3 specific routines use the priv2 run control register.
An outstanding enhancement for the PS3 would be to add a guard to check
for incorrect access to the spu problem state when the spu context is
disabled. This check could be implemented with a flag added to the spu
context that would inhibit mapping problem state pages, and a routine
to unmap spu problem state pages. When the spu is enabled with
ps3_enable_spu() the flag would be set allowing pages to be mapped,
and when the spu is disabled with ps3_disable_spu() the flag would be
cleared and mapped problem state pages would be unmapped.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently, we have a possibilty that the SLBs setup during context
switch don't cover the entirety of the necessary lscsa and code
regions, if these regions cross a segment boundary.
This change checks the start and end of each region, and inserts a SLB
entry for each, if unique. We also remove the assumption that the
spu_save_code and spu_restore_code reside in the same segment, by using
the specific code array for save and restore.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Add a function spu_64k_pages_available(), so that we can abstract the
explicity use of mmu_psize_defs() in lssca_alloc.c
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Currently, the SPU context switch code (spufs/switch.c) sets up the
SPU's SLBs directly, which requires some low-level mm stuff.
This change moves the kernel SLB setup to spu_base.c, by exposing
a function spu_setup_kernel_slbs() to do this setup. This allows us
to remove the low-level mm code from switch.c, making it possible
to later move switch.c to the spufs module.
Also, add a struct spu_slb for the cases where we need to deal with
SLB entries.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
We can currently cause an oops by repeatedly creating and destroying
contexts, while doing getdents() calls on the "/spu" directory.
This is due to the context's top-level dentry remaining hashed while
the context is being destroyed.
Fix this by unhashing the context's dentry with the
dentry->d_inode->i_mutex held. This way, we'll hit the check for
d_unhashed in dentry_readdir, and won't be included in the
list of subdirs for /spu.
test: spufs-testsuite:tests/01-spu_create/07-destroy-vs-readdir-race
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* Convert files to UTF-8.
* Also correct some people's names
(one example is Eißfeldt, which was found in a source file.
Given that the author used an ß at all in a source file
indicates that the real name has in fact a 'ß' and not an 'ss',
which is commonly used as a substitute for 'ß' when limited to
7bit.)
* Correct town names (Goettingen -> Göttingen)
* Update Eberhard Mönkeberg's address (http://lkml.org/lkml/2007/1/8/313)
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Slab constructors currently have a flags parameter that is never used. And
the order of the arguments is opposite to other slab functions. The object
pointer is placed before the kmem_cache pointer.
Convert
ctor(void *object, struct kmem_cache *s, unsigned long flags)
to
ctor(struct kmem_cache *s, void *object)
throughout the kernel
[akpm@linux-foundation.org: coupla fixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This makes the kernel use 1TB segments for all kernel mappings and for
user addresses of 1TB and above, on machines which support them
(currently POWER5+, POWER6 and PA6T).
We detect that the machine supports 1TB segments by looking at the
ibm,processor-segment-sizes property in the device tree.
We don't currently use 1TB segments for user addresses < 1T, since
that would effectively prevent 32-bit processes from using huge pages
unless we also had a way to revert to using 256MB segments. That
would be possible but would involve extra complications (such as
keeping track of which segment size was used when HPTEs were inserted)
and is not addressed here.
Parts of this patch were originally written by Ben Herrenschmidt.
Signed-off-by: Paul Mackerras <paulus@samba.org>
The commit 8b6f50ef1d seems to have
been affected by a mismerge of a duplicate patch
(d054b36ffd) - both the
spufs_dir_contents and spufs_dir_nosched_contents have been given
write-only signal notification files.
This change reverts the spufs_dir_contents array to use the
readable signal notification file implementation.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
find_victim can dereference a NULL pointer when iterating over the list
of victim spus because list_mutex only guarantees spu->ct to be stable,
but of course not to be non-NULL.
Also fix find_victim to not call spu_unbind_context without list_mutex
because that violates the above guarantee.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch adds DEFINE_SPUFS_ATTRIBUTE(), a wrapper around
DEFINE_SIMPLE_ATTRIBUTE which does the specified locking for the get
routine for us.
Unfortunately we need two get routines (a locked and unlocked version) to
support the coredump code. This hides one of those (the locked version)
inside the macro foo.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently the spu coredump code doesn't respect the ulimit, it should.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Rework spufs_coredump_extra_notes_write() to check for and return errors.
If we're coredumping to a pipe we can't trust file->f_pos, we need to
maintain the foffset value passed to us. The cleanest way to do this is
to have the low level write routine increment foffset when we've
successfully written.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Because spufs might be built as a module, we can't have other parts of the
kernel calling directly into it, we need stub routines that check first if the
module is loaded.
Currently we have two structures which hold callbacks for these stubs, the
syscalls are in spufs_calls and the coredump calls are in spufs_coredump_calls.
In both cases the logic for registering/unregistering is essentially the same,
so we can simplify things by combining the two.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The SPUFS attribute get routines take a void * because the generic attribute
code doesn't know what sort of data it's passing around.
However our internal __spufs_get_foo() routines can take a spu_context *
directly, which saves plonking it in and out of a void * again.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The spufs_coredump_read array is NULL terminated, and we also store the size.
We only need one or the other, and the other arrays in file.c are NULL
terminated, so do that.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The routine to dump the local store, __spufs_mem_read(), does not take the
spu_lslr_RW value into account - so we shouldn't check it when we're
calculating the size either.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Unfortunately GDB expects some of the SPU coredump values to be identical
in format to what is found in spufs. This means we need to dump some of
the values as ASCII strings, not the actual values.
Because we don't know what the values will be, we always print the values
with the format "0x%.16lx", that way we know the result will be 19 bytes.
do_coredump_read() doesn't take a __user buffer, so remove the annotation,
and because we know that it's safe to just snprintf() directly to it.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The spufs_coredump_reader array contains the size of the data that will be
returned by the read routine. Currently these are specified as literals,
and though some are obvious, sizeof(u32) == 4, others are not, 69 * 8 == ???
Instead, use sizeof() whatever type is returned by each routine, or in
the case of spufs_mem_read() the #define LS_SIZE.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
It makes sense to stop the SPU processes as soon as possible. Also if we
dont acquire_saved() I think there's a possibility that the value in
csa.priv2.spu_lslr_RW won't be accurate.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Remove the ctx_info struct entirely, and also the ctx_info_list. This
fixes a race where two processes can clobber each other's ctx_info structs.
Instead of using the list, we just repeat the search through the file
descriptor table.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Extract the logic for searching through the file descriptors for spu contexts
into a separate routine, coredump_next_context(), so we can use it elsewhere
in future. In the process we flatten the for loop, and move the NOSCHED test
into coredump_next_context().
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Based on an original patch from Masato Noguchi
<Masato.Noguchi@jp.sony.com>.
We're currently not restoring the SPE decrementer as specified by the
CBE handbook. This change fixes our implementation to match, and makes
the function read more like the docs.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
At present, a built-in spufs will not use the spufs_calls callbacks, but
directly call sys_spu_create. This saves us an indirect branch, but
means we have duplicated functions - one for CONFIG_SPU_FS=y and one for
=m.
This change unifies the spufs syscall path, and provides access to the
spufs_calls structure through a get/put pair. At present, the only user
of the spufs_calls structure is spu_syscalls.c, but this will facilitate
adding the coredump calls later.
Everyone likes numbers, right? Here's a before/after comparison with
CONFIG_SPU_FS=y, doing spu_create(); close(); 64k times.
Before:
[jk@cell ~]$ time ./spu_create
performing 65536 spu_create calls
real 0m24.075s
user 0m0.146s
sys 0m23.925s
After:
[jk@cell ~]$ time ./spu_create
performing 65536 spu_create calls
real 0m24.777s
user 0m0.141s
sys 0m24.631s
So, we're adding around 11us per syscall, at the benefit of having
only one syscall path.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Affinity reference point location (gang->aff_ref_spu) is reset
when the whole gang is descheduled. However, the last member of
a gang can be descheduled while we are trying to schedule another
member of the gang. This was leading to a race condition, and
the code was using gang->aff_ref_spu in an unsafe manner.
By holding the gang->aff_mutex a little bit longer, and increment
gang->aff_sched_count (which controls when gang->aff_ref_spu
should be reset) a little bit earlier, the problem is fixed.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
According to the comment in spufs_init_isolated_loader(), the isolated
loader should be aligned on a 16 byte boundary.
ARCH_{KMALLOC,SLAB}_MINALIGN is not defined so only 8 byte alignment is
guaranteed.
This enforces alignment via __get_free_pages.
Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Based on an initial patch from Sebastian Siewior
<sebastian@breakpoint.cc>
spu_harvest isn't used, remove it.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
do_spu_create doesn't need the asmlinkage qualifier; remove it.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
There are a few symbols used only in one file within spufs; this change
makes them static where suitable.
Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This fixes a major bug which was happening when a SPU thread advances
its execution right after being restored to a SPU. A potentially
outdated NPC value was being (re)written to the SPU.
So, spu_run_init, in this case, was either not doing anything relevant,
or breaking the execution of the SPU thread.
This fixes a common problem of losing a mailbox write when it was done
to a saved context.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When a process writes into the inbound spu mailbox (wbox) while the
context is saved, we accidentally break the contents of the mb_stat_R
register by clearing other entries of the mailbox status register. This
can cause the user side to hang.
This change fixes the problem by only altering the appropriate bits
of the mailbox status register during a backing-store write.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch fixes affinity reference point placement, which was not being
done in some situations, after the introduction of node_allowed() calls.
The previously used parameter, 'ctx', is just the iterator of the
previous list_for_each_entry_reverse loop, and its value might be
invalid at the end of the loop. Also, the right context to seek
for information when defining the reference ctx location
_is_ the reference ctx.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently we calculate the first timeslice for every context
incorrectly - alloc_spu_context calls spu_set_timeslice before we set
ctx->prio so we always calculate the longest possible timeslice for the
lowest possible priority.
This patch makes sure to update the schedule-related fields before
calculating the timeslice and also makes sure we update the timeslice for
a non-running context when entering spu_run so a priority change affects
the context as soon as possible.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We currently initialize cbe_spu_info[].spus in both init_spu_base and
spu_sched_init. The initialise in spu_sched_init clears the SPU list,
so we end up with no physical SPUs. Because of this, the spu_run
syscall will block forever.
This change removes the unnecessary initialization in spu_sched_init.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
spufs.h now has two enums for the sched_flags leading to identical
values for SPU_SCHED_WAS_ACTIVE and SPU_SCHED_NOTIFY_ACTIVE. Merge
them into a single enum as they were in the IBM development tree.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix a mismerge in commit 8b6f50ef1d:
"spufs: make signal-notification files readonly for NOSCHED contexts",
where structs got duplicated.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reading from the signal{1,2} files requires a spu_acquire_saved, so make these
files write-only for contexts created with SPU_CREATE_NOSCHED.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This sorts out the various lists and related locks in the spu code.
In detail:
- the per-node free_spus and active_list are gone. Instead struct spu
gained an alloc_state member telling whether the spu is free or not
- the per-node spus array is now locked by a per-node mutex, which
takes over from the global spu_lock and the per-node active_mutex
- the spu_alloc* and spu_free function are gone as the state change is
now done inline in the spufs code. This allows some more sharing of
code for the affinity vs normal case and more efficient locking
- some little refactoring in the affinity code for this locking scheme
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
From: Maynard Johnson <mpjohn@us.ibm.com>
This patch updates the existing arch/powerpc/oprofile/op_model_cell.c
to add in the SPU profiling capabilities. In addition, a 'cell' subdirectory
was added to arch/powerpc/oprofile to hold Cell-specific SPU profiling code.
Exports spu_set_profile_private_kref and spu_get_profile_private_kref which
are used by OProfile to store private profile information in spufs data
structures.
Also incorporated several fixes from other patches (rrn). Check pointer
returned from kzalloc. Eliminated unnecessary cast. Better error
handling and cleanup in the related area. 64-bit unsigned long parameter
was being demoted to 32-bit unsigned int and eventually promoted back to
unsigned long.
Signed-off-by: Carl Love <carll@us.ibm.com>
Signed-off-by: Maynard Johnson <mpjohn@us.ibm.com>
Signed-off-by: Bob Nelson <rrnelson@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
From: Maynard Johnson <mpjohn@us.ibm.com>
This patch adds to the capability of spu_switch_event_register so that
the caller is also notified of currently active SPU tasks.
Exports spu_switch_event_register and spu_switch_event_unregister so
that OProfile can get access to the notifications provided.
Signed-off-by: Maynard Johnson <mpjohn@us.ibm.com>
Signed-off-by: Carl Love <carll@us.ibm.com>
Signed-off-by: Bob Nelson <rrnelson@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
This patch makes the scheduller honor affinity information for each
context being scheduled. If the context has no affinity information,
behaviour is unchanged. If there are affinity information, context is
schedulled to be run on the exact spu recommended by the affinity
placement algorithm.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This patch provides the spu affinity placement logic for the spufs scheduler.
Each time a gang is going to be scheduled, the placement of a reference
context is defined. The placement of all other contexts with affinity from
the gang is defined based on this reference context location and on a
precomputed displacement offset.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This patch adds support for additional flags at spu_create, which relate
to the establishment of affinity between contexts and contexts to memory.
A fourth, optional, parameter is supported. This parameter represent
a affinity neighbor of the context being created, and is used when defining
SPU-SPU affinity.
Affinity is represented as a doubly linked list of spu_contexts.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Addition of a spufs-global "cbe_info" array. Each entry contains information
about one Cell/B.E. node, namelly:
* list of spus (both free and busy spus are in this list);
* list of free spus (replacing the static spu_list from spu_base.c)
* number of spus;
* number of reserved (non scheduleable) spus.
SPE affinity implementation actually requires only access to one spu per
BE node (since it implements its own pointer to walk through the other spus
of the ring) and the number of scheduleable spus (n_spus - non_sched_spus)
However having this more general structure can be useful for other
functionalities, concentrating per-cbe statistics / data.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
spu_sched->bitmap has MAX_PRIO(=140) width in bits.However, since
ff80a77f20, sched_find_first_bit()
only supports 100-bit bitmaps.
Thus, spu_sched->bitmap should be treated by generic find_first_bit().
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
From: Sebastian Siewior <cbe-oss-dev@ml.breakpoint.cc>
The 'file' argument is unused in spufs_run_spu(). This change removes
it.
Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
The SPU decrementer should be restored after the LSCSA DMA has
completed.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
No need to halt the SPE decrementer at context restore step 47, it will
be done in step 7.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
At save step 8, the mfc control register in the CSA should be written
_only_ with Sc and Sm bits (at least MFC_CNTL[Dh] should be set to 0)
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
The decr_status in the LSCSA is valid only in the sequence of context
restore. Thus, it's nonsense to read and/or write it through spufs.
This patch changes decr_status node to access MFC_CNTL[Ds] in the CSA.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
The decr_status in the LSCSA is confusedly used as two meanings:
* SPU decrementer was running
* SPU decrementer was wrapped as a result of adjust
and the code to set decr_status is missing.
This patch fixes these problems by using the decr_status argument as a
set of flags. This requires a rebuild of the shipped spu_restore code.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
The following steps are not needed in the SPE context save/restore
paths:
Save Step 12: save_mfc_decr()
save suspend_time to CSA (It will be done by step 14)
save ch 7 (decrementer value will be saved in LSCSA by spe-side step 10)
Restore Step 59: restore_ch_part1()
restore ch 1 (it will be done by spe-side step 15)
This change removes the unnecessary steps.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Based on a fix from Masato Noguchi <Masato.Noguchi@jp.sony.com>.
Remove the (incorrect) array size declarations in the spufs channel
arrays, and use ARRAY_SIZE rather than hardcoded values.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Currently a process is removed from the physical spu when spu_acquire_saved
is saved but never put back. This patch adds a new spu_release_saved
that is to be paired with spu_acquire_saved and put the process back if
it has been in RUNNABLE state before.
Niether Jeremy not be are entirely happy about this exact patch because
it adds another spu_activate call outside of the owner thread, but I
feel this is the best short-term fix we can come up with.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This patch exports per-context statistics in spufs as long as spu
statistics in sysfs.
It was formed by merging:
"spufs: add spu stats in sysfs" From: Christoph Hellwig
"spufs: add stat file to spufs" From: Christoph Hellwig
"spufs: fix libassist accounting" From: Jeremy Kerr
"spusched: fix spu utilization statistics" From: Luke Browning
And some adjustments by myself, after suggestions on cbe-oss-dev.
Having separate patches was making the review process harder
than it should, as we end up integrating spus and ctx statistics
accounting much more than it was on the first implementation.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
In 6cbf93960e64f313f6e247cbca7afaa50e3ee2c we added a WARN_ON for
calling spu_deactivate on contexts created with the SPU_CREATE_NOSCHED
flag. However, all NOSCHED contexts will need to be deactivated when
the context is destroyed, so this gives a spurious warning when any
NOSCHED context is closed.
This change removes the WARN_ON.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Reading from the signal{1,2} files requires a spu_acquire_saved, so
make these files write-only for contexts created with
SPU_CREATE_NOSCHED.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
The current SPU context saving procedure in SPUFS unexpectedly
restarts MFC when halting decrementer, because MFC_CNTL[Dh] is set
without MFC_CNTL[Sm]. This bug causes, for example, saving broken DMA
queues. Here is a patch to fix the problem.
Signed-off-by: Kazunori Asayama <asayama@sm.sony.co.jp>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
WARNING: arch/powerpc/platforms/cell/spufs/spufs.o(.init.text+0x158): Section
mismatch: reference to .exit.text:.spu_sched_exit (between '.init_module' and
'.spu_sched_init')
was introduced by c99c1994a2
This patch removes the warning.
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Slab destructors were no longer supported after Christoph's
c59def9f22 change. They've been
BUGs for both slab and slub, and slob never supported them
either.
This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
83c54070ee broke spufs by incorrectly
updating the code, this patch gets it to compile again.
It's probably still broken due to the scheduler changes, but this
at least makes sure cell kernels can still be built.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Geoff Levand <geoffrey.levand@am.sony.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch completes Linus's wish that the fault return codes be made into
bit flags, which I agree makes everything nicer. This requires requires
all handle_mm_fault callers to be modified (possibly the modifications
should go further and do things like fault accounting in handle_mm_fault --
however that would be for another patch).
[akpm@linux-foundation.org: fix alpha build]
[akpm@linux-foundation.org: fix s390 build]
[akpm@linux-foundation.org: fix sparc build]
[akpm@linux-foundation.org: fix sparc64 build]
[akpm@linux-foundation.org: fix ia64 build]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Bryan Wu <bryan.wu@analog.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Matthew Wilcox <willy@debian.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Still apparently needs some ARM and PPC loving - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The function backing_ops->read_mfc_tagstatus() doesn't return a
correct value because the dma_tagstatus_R register isn't saved in
CSA. This fixes the problem.
Signed-off-by: Kazunori Asayama <asayama@sm.sony.co.jp>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When waiting for I/O events on mfc in an SPU context by using
poll/epoll syscalls, some of the events can be lost because of wrong
order of poll_wait and MFC status checks in the spufs_mfc_poll
function and non-atomic update of tagwait. This fixes the
problem.
Signed-off-by: Kazunori Asayama <asayama@sm.sony.co.jp>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
spu_activate can be called from multiple threads at the same time on
behalf of the same spu context. We need to make sure to only add it
once to avoid runqueue corruption.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Only enable the scheduler tick if we have any context waiting to be
scheduled.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We're currently too permissive with counting libassist calls - fix the
check on the SPE stop-and-signal status.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Provide load average information for spu context. The format
is identical to /proc/loadavg, which is also where a lot of code
and concepts is borrowed from.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The new tid file contains the ID of the thread currently running the
context, if any. This is used so that the new spu-top and spu-ps
tools can find the thread in /proc.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Remove redundant whitespace in arch/powerpc/platforms/cell/spufs/
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
spufs_dir_inode_operations is exactly the same as
simple_dir_inode_operations. Use that instead.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
And last but not least we need to make sure the scheduler tick never
preempts a nosched context.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
spu_deactivate should never be called for nosched contets. Put in
a check so we can print a stacktrace and exit early in case it
happes erroneously.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add a cpus_allowed allowed filed to struct spu_context so that we always
use the cpu mask of the owning thread instead of the one happening to
call into the scheduler. Also use this information in
grab_runnable_context to avoid spurious wakeups.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Update scheduling information on every spu_run to allow for setting
threads to realtime priority just before running them. This requires
some slightly ugly code in spufs_run_spu because we can just update
the information unlocked if the spu is not runnable, but we need to
acquire the active_mutex when it is runnable to protect against
find_victim. This locking scheme requires opencoding
spu_acquire_runnable in spufs_run_spu which actually is a nice cleanup
all by itself.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Print out a few scheduler tuning parameters when we've compiled
with DEBUG defined.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The current timeslice code mixes 'jiffies' up with 'spesched ticks'. This
change correctly defines the number of time slices each SPE contexts is
given, and clarifies the comment.
This brings the default timeslice for SPE contexts into a reasonable
range.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Enable preemptive scheduling for non-RT contexts.
We use the same algorithms as the CPU scheduler to calculate the time
slice length, and for now we also use the same timeslice length as the
CPU scheduler. This might be not enough for good performance and can be
changed after some benchmarking.
Note that currently we do not boost the priority for contexts waiting
on the runqueue for a long time, so contexts with a higher nice value
could starve ones with less priority. This could easily be fixed once
the rework of the spu lists that Luke and I discussed is done.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Get rid of the scheduler workqueues that complicated things a lot to
a dedicated spu scheduler thread that gets woken by a traditional
scheduler tick. By default this scheduler tick runs a HZ * 10, aka
one spu scheduler tick for every 10 cpu ticks.
Currently the tick is not disabled when we have less context than
available spus, but I will implement this later.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add a bit define from book, and replace one hex number with a
symbol, for clarity.
Signed-off-by: Sebastian Siewior <bigeasy@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently it fails with gcc from sdk 2.1 because of a spec change [1].
Maybe we should start using the definitions from spu_mfcio.h.
[1] http://gcc.gnu.org/ml/gcc-patches/2006-11/msg01598.html
Signed-off-by: Sebastian Siewior <bigeasy@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds a "capabilities" file to spu contexts consisting of a
list of linefeed separated capability names. The current exposed
capabilities are "sched" (the context is scheduleable) and
"step" (the context supports single stepping).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>