linux/kernel
Andrew Morton 7c3ab7381e [PATCH] io-accounting: core statistics
The present per-task IO accounting isn't very useful.  It simply counts the
number of bytes passed into read() and write().  So if a process reads 1MB
from an already-cached file, it is accused of having performed 1MB of I/O,
which is wrong.

(David Wright had some comments on the applicability of the present logical IO accounting:

  For billing purposes it is useless but for workload analysis it is very
  useful

  read_bytes/read_calls  average read request size
  write_bytes/write_calls average write request size

  read_bytes/read_blocks ie logical/physical can indicate hit rate or thrashing
  write_bytes/write_blocks  ie logical/physical  guess since pdflush writes can
                                                be missed

  I often look for logical larger than physical to see filesystem cache
  problems.  And the bytes/cpusec can help find applications that are
  dominating the cache and causing slow interactive response from page cache
  contention.

  I want to find the IO intensive applications and make sure they are doing
  efficient IO.  Thus the acctcms(sysV) or csacms command would give the high
  IO commands).

This patchset adds new accounting which tries to be more accurate.  We account
for three things:

reads:

  attempt to count the number of bytes which this process really did cause
  to be fetched from the storage layer.  Done at the submit_bio() level, so it
  is accurate for block-backed filesystems.  I also attempt to wire up NFS and
  CIFS.

writes:

  attempt to count the number of bytes which this process caused to be sent
  to the storage layer.  This is done at page-dirtying time.

  The big inaccuracy here is truncate.  If a process writes 1MB to a file
  and then deletes the file, it will in fact perform no writeout.  But it will
  have been accounted as having caused 1MB of write.

  So...

cancelled_writes:

  account the number of bytes which this process caused to not happen, by
  truncating pagecache.

  We _could_ just subtract this from the process's `write' accounting.  But
  that means that some processes would be reported to have done negative
  amounts of write IO, which is silly.

  So we just report the raw number and punt this decision up to userspace.

Now, we _could_ account for writes at the physical I/O level.  But

- This would require that we track memory-dirtying tasks at the per-page
  level (would require a new pointer in struct page).

- It would mean that IO statistics for a process are usually only available
  long after that process has exitted.  Which means that we probably cannot
  communicate this info via taskstats.

This patch:

Wire up the kernel-private data structures and the accessor functions to
manipulate them.

Cc: Jay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Cc: David Wright <daw@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:55:41 -08:00
..
irq [PATCH] CPEI gets warning at kernel/irq/migration.c:27/move_masked_irq() 2006-12-08 08:28:37 -08:00
power [PATCH] struct seq_operations and struct file_operations constification 2006-12-07 08:39:46 -08:00
time [PATCH] time_adjust cleared before use 2006-10-28 11:30:55 -07:00
.gitignore gitignore: ignore more generated files 2006-01-03 11:35:26 +01:00
acct.c [PATCH] kernel: change uses of f_{dentry, vfsmnt} to use f_path 2006-12-08 08:28:42 -08:00
audit.c [PATCH] Add include/linux/freezer.h and move definitions from sched.h 2006-12-07 08:39:27 -08:00
audit.h [PATCH] audit: AUDIT_PERM support 2006-09-11 13:32:30 -04:00
auditfilter.c [PATCH] kernel core: replace kmalloc+memset with kzalloc 2006-12-07 08:39:41 -08:00
auditsc.c [PATCH] struct path: convert kernel 2006-12-08 08:28:46 -08:00
capability.c [PATCH] pidspace: is_init() 2006-09-29 09:18:12 -07:00
compat.c [PATCH] Create compat_sys_migrate_pages 2006-11-03 12:27:59 -08:00
configs.c [PATCH] struct seq_operations and struct file_operations constification 2006-12-07 08:39:46 -08:00
cpu.c [PATCH] suspend: don't change cpus_allowed for task initiating the suspend 2006-12-07 08:39:28 -08:00
cpuset.c [PATCH] struct path: convert kernel 2006-12-08 08:28:46 -08:00
delayacct.c [PATCH] slab: remove kmem_cache_t 2006-12-07 08:39:25 -08:00
dma.c [PATCH] struct seq_operations and struct file_operations constification 2006-12-07 08:39:46 -08:00
exec_domain.c Remove obsolete #include <linux/config.h> 2006-06-30 19:25:36 +02:00
exit.c [PATCH] session_of_pgrp: kill unnecessary do_each_task_pid(PIDTYPE_PGID) 2006-12-08 08:28:52 -08:00
extable.c [PATCH] symbol_put_addr() locks kernel 2006-05-15 11:20:55 -07:00
fork.c [PATCH] io-accounting: core statistics 2006-12-10 09:55:41 -08:00
futex_compat.c [PATCH] __user annotations: futex 2006-10-10 15:37:22 -07:00
futex.c [PATCH] kernel: change uses of f_{dentry, vfsmnt} to use f_path 2006-12-08 08:28:42 -08:00
hrtimer.c [PATCH] posix-timers: Fix clock_nanosleep() doesn't return the remaining time in compatibility mode 2006-09-29 09:18:15 -07:00
itimer.c [PATCH] hrtimers: remove data field 2006-03-26 08:57:03 -08:00
kallsyms.c [PATCH] move kallsyms data to .rodata 2006-12-08 08:28:37 -08:00
Kconfig.hz [PATCH] HZ: 300Hz support 2006-12-07 08:39:36 -08:00
Kconfig.preempt [PATCH] sched: voluntary kernel preemption 2005-06-25 16:24:45 -07:00
kexec.c Merge branch 'release' of master.kernel.org:/pub/scm/linux/kernel/git/aegl/linux-2.6 2006-12-07 15:39:22 -08:00
kfifo.c [PATCH] memory ordering in __kfifo primitives 2006-09-29 09:18:13 -07:00
kmod.c [PATCH] rename struct namespace to struct mnt_namespace 2006-12-08 08:28:51 -08:00
kprobes.c [PATCH] kprobes: enable booster on the preemptible kernel 2006-12-07 08:39:38 -08:00
ksysfs.c Remove obsolete #include <linux/config.h> 2006-06-30 19:25:36 +02:00
kthread.c WorkStruct: Pass the work_struct pointer instead of context data 2006-11-22 14:55:48 +00:00
latency.c [PATCH] severing module.h->sched.h 2006-12-04 02:00:22 -05:00
lockdep_internals.h [PATCH] lockdep: more chains 2006-12-07 08:39:43 -08:00
lockdep_proc.c [PATCH] struct seq_operations and struct file_operations constification 2006-12-07 08:39:46 -08:00
lockdep.c Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6 2006-12-07 08:59:11 -08:00
Makefile [PATCH] srcu-3: RCU variant permitting read-side blocking 2006-10-04 07:55:30 -07:00
module.c [PATCH] struct seq_operations and struct file_operations constification 2006-12-07 08:39:46 -08:00
mutex-debug.c [PATCH] lockdep: show more details about self-test failures 2006-12-07 08:39:43 -08:00
mutex-debug.h [PATCH] lockdep: better lock debugging 2006-07-03 15:27:01 -07:00
mutex.c [PATCH] lockdep: avoid lockdep warning in md 2006-12-08 08:28:39 -08:00
mutex.h [PATCH] lockdep: prove mutex locking correctness 2006-07-03 15:27:04 -07:00
nsproxy.c [PATCH] to nsproxy 2006-12-08 08:28:52 -08:00
panic.c [PATCH] x86: Clean up x86 NMI sysctls 2006-09-30 01:47:55 +02:00
params.c [PATCH] module_subsys: initialize earlier 2006-09-29 09:18:08 -07:00
pid.c [PATCH] add child reaper to pid_namespace 2006-12-08 08:28:52 -08:00
posix-cpu-timers.c [PATCH] posix-cpu-timers: prevent signal delivery starvation 2006-10-17 08:18:43 -07:00
posix-timers.c [PATCH] slab: remove kmem_cache_t 2006-12-07 08:39:25 -08:00
printk.c [PATCH] add ignore_loglevel boot option 2006-12-07 08:39:47 -08:00
profile.c [PATCH] struct seq_operations and struct file_operations constification 2006-12-07 08:39:46 -08:00
ptrace.c [PATCH] pidspace: is_init() 2006-09-29 09:18:12 -07:00
rcupdate.c [PATCH] rcu: add a prefetch() in rcu_do_batch() 2006-12-07 08:39:40 -08:00
rcutorture.c [PATCH] Add Sparse annotations to SRCU wrapper functions in rcutorture 2006-12-07 08:39:44 -08:00
relay.c [PATCH] kernel: change uses of f_{dentry, vfsmnt} to use f_path 2006-12-08 08:28:42 -08:00
resource.c [PATCH] struct seq_operations and struct file_operations constification 2006-12-07 08:39:46 -08:00
rtmutex_common.h [PATCH] pi-futex: futex_lock_pi/futex_unlock_pi support 2006-06-27 17:32:47 -07:00
rtmutex-debug.c Remove all inclusions of <linux/config.h> 2006-10-04 03:38:54 -04:00
rtmutex-debug.h [PATCH] lockdep: better lock debugging 2006-07-03 15:27:01 -07:00
rtmutex-tester.c [PATCH] Add include/linux/freezer.h and move definitions from sched.h 2006-12-07 08:39:27 -08:00
rtmutex.c [PATCH] clean up and remove some extra spinlocks from rtmutex 2006-09-29 09:18:09 -07:00
rtmutex.h [PATCH] lockdep: better lock debugging 2006-07-03 15:27:01 -07:00
rwsem.c [PATCH] lockdep: prove rwsem locking correctness 2006-07-03 15:27:04 -07:00
sched.c [PATCH] struct seq_operations and struct file_operations constification 2006-12-07 08:39:46 -08:00
seccomp.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
signal.c [PATCH] add child reaper to pid_namespace 2006-12-08 08:28:52 -08:00
softirq.c [PATCH] softirq: remove BUG_ONs which can incorrectly trigger 2006-12-07 08:39:43 -08:00
softlockup.c [PATCH] check return value of cpu_callback 2006-09-29 09:18:14 -07:00
spinlock.c [PATCH] lockdep: spin_lock_irqsave_nested() 2006-11-25 13:28:34 -08:00
srcu.c [PATCH] SRCU: report out-of-memory errors 2006-10-04 07:55:30 -07:00
stacktrace.c [PATCH] lockdep: stacktrace subsystem, core 2006-07-03 15:27:02 -07:00
stop_machine.c [PATCH] stop_machine.c copyright 2006-09-29 09:18:24 -07:00
sys_ni.c [PATCH] Create compat_sys_migrate_pages 2006-11-03 12:27:59 -08:00
sys.c [PATCH] sys_setpgid: eliminate unnecessary do_each_task_pid(PIDTYPE_PGID) 2006-12-08 08:28:52 -08:00
sysctl.c [PATCH] sysctl: remove unused "context" param 2006-12-10 09:55:41 -08:00
taskstats.c [PATCH] taskstats: cleanup reply assembling 2006-12-07 08:39:34 -08:00
time.c [PATCH] NTP: Move all the NTP related code to ntp.c 2006-10-01 00:39:26 -07:00
timer.c [PATCH] kill wall_jiffies 2006-10-01 00:39:27 -07:00
tsacct.c [PATCH] xacct_add_tsk: fix pure theoretical ->mm use-after-free 2006-10-30 12:08:41 -08:00
uid16.c [PATCH] Add more prevent_tail_call() 2006-04-19 16:27:18 -07:00
unwind.c [PATCH] unwinder: move .eh_frame to RODATA 2006-12-07 02:14:19 +01:00
user.c [PATCH] slab: remove kmem_cache_t 2006-12-07 08:39:25 -08:00
utsname.c [PATCH] namespaces: utsname: implement CLONE_NEWUTS flag 2006-10-02 07:57:22 -07:00
wait.c [PATCH] uninline init_waitqueue_head() 2006-07-10 13:24:25 -07:00
workqueue.c [PATCH] WorkStruct: Use direct assignment rather than cmpxchg() 2006-12-09 12:25:08 -08:00