linux

Author	SHA1	Message	Date
Linus Torvalds	6e18933f2b	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6: [PATCH] Build fix for CONFIG_NUMA=y && CONFIG_SMP=n [IA64] fix bootmem regression on Altix	2008-04-25 12:50:00 -07:00
Linus Torvalds	0b79dada97	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-fixes * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-fixes: sched: fix share (re)distribution softlockup: fix NOHZ wakeup seqlock: livelock fix	2008-04-25 12:47:56 -07:00
David Miller	5a9d3225a0	sched: use alloc_bootmem() instead of alloc_bootmem_low() There is no guarantee that there is physical ram below 4GB, and in fact many boxes don't have exactly that. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-25 09:53:06 +02:00
Peter Zijlstra	3f5087a2ba	sched: fix share (re)distribution fix __aggregate_redistribute_shares() related lockup reported by David S. Miller. The problem this code tries to solve is 'accurately' calculating the 'fair' share of the group weight for each cpu. The current code falls back to a global group rebalance in case the sched_domain's span it looks at has no shares, but does have tasks. The reason it gets stuck here, is because its inherently racy - if someone steals the last task after we compute the agg->rq_weight, but before we rebalance, we'll never get out of the loop. We could of course go fix that, but while looking at this issue I found that this 'fallback' wasn't nearly as rare as I'd hoped it to be. In fact its quite common - and given it walks the whole machine, thats very bad. The new approach is simple (why didn't I think of it before?), we set the aggregate shares to the full task group weight, and each larger sched domain that encounters an aggregate shares larger than the weight, clips it (it already re-distributes anyway). This nicely converges to the desired global picture where the sum of all shares equals the task group weight. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-25 00:25:08 +02:00
Mike Travis	03970f065d	[PATCH] Build fix for CONFIG_NUMA=y && CONFIG_SMP=n Regression caused by `434d53b00d` Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2008-04-24 14:40:29 -07:00
Russ Anderson	472613b961	[IA64] fix bootmem regression on Altix A recent change prevents SGI Altix from booting. This patch fixes the problem. The regresson was introduced in commit `434d53b00d` Signed-off-by: Russ Anderson <rja@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2008-04-24 14:21:21 -07:00
Randy Dunlap	73486722b7	kernel-doc: fix sched.c missing parameter Add missing kernel-doc in kernel/sched.c: Warning(linux-2.6.25-git3//kernel/sched.c:7044): No description found for parameter 'span' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-22 13:48:02 -07:00
Ingo Molnar	c24b7c5244	sched: features fix Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:45:00 +02:00
Peter Zijlstra	f00b45c145	sched: /debug/sched_features provide a text based interface to the scheduler features; this saves the 'user' from setting bits using decimal arithmetic. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:45:00 +02:00
Ingo Molnar	06379aba52	sched: add SCHED_FEAT_DEADLINE unused at the moment. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:45:00 +02:00
Peter Zijlstra	8f1bc385cf	sched: fair: weight calculations In order to level the hierarchy, we need to calculate load based on the root view. That is, each task's load is in the same unit. A / \ B 1 / \ 2 3 To compute 1's load we do: weight(1) -------------- rq_weight(A) To compute 2's load we do: weight(2) weight(B) ------------ * ----------- rq_weight(B) rw_weight(A) This yields load fractions in comparable units. The consequence is that it changes virtual time. We used to have: time_{i} vtime_{i} = ------------ weight_{i} vtime = \Sum vtime_{i} = time / rq_weight. But with the new way of load calculation we get that vtime equals time. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:45:00 +02:00
Peter Zijlstra	4a55bd5e97	sched: fair-group: de-couple load-balancing from the rb-trees De-couple load-balancing from the rb-trees, so that I can change their organization. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:45:00 +02:00
Peter Zijlstra	18d95a2832	sched: fair-group: SMP-nice for group scheduling Implement SMP nice support for the full group hierarchy. On each load-balance action, compile a sched_domain wide view of the full task_group tree. We compute the domain wide view when walking down the hierarchy, and readjust the weights when walking back up. After collecting and readjusting the domain wide view, we try to balance the tasks within the task_groups. The current approach is a naively balance each task group until we've moved the targeted amount of load. Inspired by Srivatsa Vaddsgiri's previous code and Abhishek Chandra's H-SMP paper. XXX: there will be some numerical issues due to the limited nature of SCHED_LOAD_SCALE wrt to representing a task_groups influence on the total weight. When the tree is deep enough, or the task weight small enough, we'll run out of bits. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Abhishek Chandra <chandra@cs.umn.edu> CC: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:45:00 +02:00
Hidetoshi Seto	1d3504fcf5	sched, cpuset: customize sched domains, core [rebased for sched-devel/latest] - Add a new cpuset file, having levels: sched_relax_domain_level - Modify partition_sched_domains() and build_sched_domains() to take attributes parameter passed from cpuset. - Fill newidle_idx for node domains which currently unused but might be required if sched_relax_domain_level become higher. - We can change the default level by boot option 'relax_domain_level='. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:45:00 +02:00
Peter Zijlstra	b40b2e8eb5	sched: rt: multi level group constraints multi level rt constraints Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:45:00 +02:00
Peter Zijlstra	f473aa5e02	sched: task_group hierarchy Add the full parent<->child relation thing into task_groups as well. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:45:00 +02:00
Peter Zijlstra	eff766a65c	sched: fix the task_group hierarchy for UID grouping UID grouping doesn't actually have a task_group representing the root of the task_group tree. Add one. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:45:00 +02:00
Dhaval Giani	ec7dc8ac73	sched: allow the group scheduler to have multiple levels This patch makes the group scheduler multi hierarchy aware. [a.p.zijlstra@chello.nl: rt-parts and assorted fixes] Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:44:59 +02:00
Dhaval Giani	354d60c2ff	sched: mix tasks and groups This patch allows tasks and groups to exist in the same cfs_rq. With this change the CFS group scheduling follows a 1/(M+N) model from a 1/(1+N) fairness model where M tasks and N groups exist at the cfs_rq level. [a.p.zijlstra@chello.nl: rt bits and assorted fixes] Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:44:59 +02:00
Ingo Molnar	ea736ed5d3	sched: fix checks Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:44:59 +02:00
Peter Zijlstra	112f53f5d7	sched: old sleeper bonus Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:44:59 +02:00
Mike Travis	cd8ba7cd9b	sched: add new set_cpus_allowed_ptr function Add a new function that accepts a pointer to the "newly allowed cpus" cpumask argument. int set_cpus_allowed_ptr(struct task_struct p, const cpumask_t new_mask) The current set_cpus_allowed() function is modified to use the above but this does not result in an ABI change. And with some compiler optimization help, it may not introduce any additional overhead. Additionally, to enforce the read only nature of the new_mask arg, the "const" property is migrated to sub-functions called by set_cpus_allowed. This silences compiler warnings. Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:44:59 +02:00
Mike Travis	e0982e90cd	init: move setup of nr_cpu_ids to as early as possible Move the setting of nr_cpu_ids from sched_init() to start_kernel() so that it's available as early as possible. Note that an arch has the option of setting it even earlier if need be, but it should not result in a different value than the setup_nr_cpu_ids() function. Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:44:59 +02:00
Mike Travis	4bdbaad33d	sched: remove another cpumask_t variable from stack * Remove another cpumask_t variable from stack that was missed in the last kernel_sched_c updates. Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:44:59 +02:00
Mike Travis	7c16ec585c	cpumask: reduce stack usage in SD_x_INIT initializers * Remove empty cpumask_t (and all non-zero/non-null) variables in SD__INIT macros. Use memset(0) to clear. Also, don't inline the initializer functions to save on stack space in build_sched_domains(). Merge change to include/linux/topology.h that uses the new node_to_cpumask_ptr function in the nr_cpus_node macro into this patch. Depends on: [mm-patch]: asm-generic-add-node_to_cpumask_ptr-macro.patch [sched-devel]: sched: add new set_cpus_allowed_ptr function Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:44:59 +02:00
Mike Travis	c5f59f0833	nodemask: use new node_to_cpumask_ptr function * Use new node_to_cpumask_ptr. This creates a pointer to the cpumask for a given node. This definition is in mm patch: asm-generic-add-node_to_cpumask_ptr-macro.patch * Use new set_cpus_allowed_ptr function. Depends on: [mm-patch]: asm-generic-add-node_to_cpumask_ptr-macro.patch [sched-devel]: sched: add new set_cpus_allowed_ptr function [x86/latest]: x86: add cpus_scnprintf function Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Greg Banks <gnb@melbourne.sgi.com> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:44:59 +02:00
Mike Travis	b53e921ba1	generic: reduce stack pressure in sched_affinity * Modify sched_affinity functions to pass cpumask_t variables by reference instead of by value. * Use new set_cpus_allowed_ptr function. Depends on: [sched-devel]: sched: add new set_cpus_allowed_ptr function Cc: Paul Jackson <pj@sgi.com> Cc: Cliff Wickman <cpw@sgi.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:44:59 +02:00
Mike Travis	f9a86fcbbb	cpuset: modify cpuset_set_cpus_allowed to use cpumask pointer * Modify cpuset_cpus_allowed to return the currently allowed cpuset via a pointer argument instead of as the function return value. * Use new set_cpus_allowed_ptr function. * Cleanup CPU_MASK_ALL and NODE_MASK_ALL uses. Depends on: [sched-devel]: sched: add new set_cpus_allowed_ptr function Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:44:58 +02:00
Mike Travis	434d53b00d	sched: remove fixed NR_CPUS sized arrays in kernel_sched_c * Change fixed size arrays to per_cpu variables or dynamically allocated arrays in sched_init() and sched_init_smp(). (1) static struct sched_entity init_sched_entity_p[NR_CPUS]; (1) static struct cfs_rq init_cfs_rq_p[NR_CPUS]; (1) static struct sched_rt_entity init_sched_rt_entity_p[NR_CPUS]; (1) static struct rt_rq init_rt_rq_p[NR_CPUS]; static struct sched_group *sched_group_nodes_bycpu[NR_CPUS]; (1) - these arrays are allocated via alloc_bootmem_low() Change sched_domain_debug_one() to use cpulist_scnprintf instead of cpumask_scnprintf. This reduces the output buffer required and improves readability when large NR_CPU count machines arrive. * In sched_create_group() we allocate new arrays based on nr_cpu_ids. Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:44:58 +02:00
Dhaval Giani	0297b80339	sched: allow cpuacct stats to be reset Currently the schedstats implementation does not allow the statistics to be reset. This patch aims to allow that. echo 0 > cpuacct.usage resets the usage. Any other value is not allowed and returns -EINVAL. Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:44:58 +02:00
Dhaval Giani	32cd756a80	sched: cleanup cpuacct variable names Change the variable names to the common convention for the cpuacct subsystem. Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:44:58 +02:00
Peter Zijlstra	ac086bc229	sched: rt-group: smp balancing Currently the rt group scheduling does a per cpu runtime limit, however the rt load balancer makes no guarantees about an equal spread of real- time tasks, just that at any one time, the highest priority tasks run. Solve this by making the runtime limit a global property by borrowing excessive runtime from the other cpus once the local limit runs out. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:44:58 +02:00
Peter Zijlstra	d0b27fa778	sched: rt-group: synchonised bandwidth period Various SMP balancing algorithms require that the bandwidth period run in sync. Possible improvements are moving the rt_bandwidth thing into root_domain and keeping a span per rt_bandwidth which marks throttled cpus. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:44:57 +02:00
Ingo Molnar	50df5d6aea	sched: remove sysctl_sched_batch_wakeup_granularity it's unused. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:44:57 +02:00
Ingo Molnar	02e2b83bd2	sched: reenable sync wakeups Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:44:57 +02:00
Ingo Molnar	d25ce4cd49	sched: cache hot buddy Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:44:57 +02:00
Ingo Molnar	1fc8afa4c8	sched: feat affine wakeups Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:44:57 +02:00
Ingo Molnar	b85d066726	sched: introduce SCHED_FEAT_SYNC_WAKEUPS, turn it off turn off sync wakeups by default. They are not needed anymore - the buddy logic should be smart enough to keep the system from overscheduling. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:44:57 +02:00
Guillaume Chazarain	15934a3732	sched: fix rq->clock overflows detection with CONFIG_NO_HZ When using CONFIG_NO_HZ, rq->tick_timestamp is not updated every TICK_NSEC. We check that the number of skipped ticks matches the clock jump seen in __update_rq_clock(). Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:44:57 +02:00
Reynes Philippe	30914a58af	sched: sched.c needs tick.h kernel/sched.c:506: erreur: implicit declaration of function tick_get_tick_sched kernel/sched.c:506: erreur: invalid type argument of -> kernel/sched.c:506: erreur: NOHZ_MODE_INACTIVE undeclared (first use in this function) kernel/sched.c:506: erreur: (Each undeclared identifier is reported only once kernel/sched.c:506: erreur: for each function it appears in.) Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:44:57 +02:00
Ingo Molnar	27ec440779	sched: make cpu_clock() globally synchronous Alexey Zaytsev reported (and bisected) that the introduction of cpu_clock() in printk made the timestamps jump back and forth. Make cpu_clock() more reliable while still keeping it fast when it's called frequently. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:44:57 +02:00
Thomas Gleixner	06d8308c61	NOHZ: reevaluate idle sleep length after add_timer_on() add_timer_on() can add a timer on a CPU which is currently in a long idle sleep, but the timer wheel is not reevaluated by the nohz code on that CPU. So a timer can be delayed for quite a long time. This triggered a false positive in the clocksource watchdog code. To avoid this we need to wake up the idle CPU and enforce the reevaluation of the timer wheel for the next timer event. Add a function, which checks a given CPU for idle state, marks the idle task with NEED_RESCHED and sends a reschedule IPI to notify the other CPU of the change in the timer wheel. Call this function from add_timer_on(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: stable@kernel.org -- include/linux/sched.h \| 6 ++++++ kernel/sched.c \| 43 +++++++++++++++++++++++++++++++++++++++++++ kernel/timer.c \| 10 +++++++++- 3 files changed, 58 insertions(+), 1 deletion(-)	2008-03-26 08:28:55 +01:00
Heiko Carstens	22e52b072d	sched: add arch_update_cpu_topology hook. Will be called each time the scheduling domains are rebuild. Needed for architectures that don't have a static cpu topology. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-03-21 16:43:48 +01:00
Heiko Carstens	9aefd0abd8	sched: add exported arch_reinit_sched_domains() to header file. Needed so it can be called from outside of sched.c. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-03-21 16:43:47 +01:00
Roel Kluin	23e3c3cd2e	sched: remove double unlikely from schedule() Combine two unlikely's Signed-off-by: Roel Kluin <12o3l@tiscali.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-03-21 16:43:47 +01:00
Peter Zijlstra	2070ee01d3	sched: cleanup old and rarely used 'debug' features. TREE_AVG and APPROX_AVG are initial task placement policies that have been disabled for a long while.. time to remove them. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-03-21 16:43:47 +01:00
Ingo Molnar	f540a6080a	sched: wakeup-buddy tasks are cache-hot Wakeup-buddy tasks are cache-hot - this makes it a bit harder for the load-balancer to tear them apart. (but it's still possible, if the load is sufficiently assymetric) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-03-19 04:27:53 +01:00
Ingo Molnar	4ae7d5cefd	sched: improve affine wakeups improve affine wakeups. Maintain the 'overlap' metric based on CFS's sum_exec_runtime - which means the amount of time a task executes after it wakes up some other task. Use the 'overlap' for the wakeup decisions: if the 'overlap' is short, it means there's strong workload coupling between this task and the woken up task. If the 'overlap' is large then the workload is decoupled and the scheduler will move them to separate CPUs more easily. ( Also slightly move the preempt_check within try_to_wake_up() - this has no effect on functionality but allows 'early wakeups' (for still-on-rq tasks) to be correctly accounted as well.) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-03-19 04:27:53 +01:00
Peter Zijlstra	aa2ac25229	sched: fix overload performance: buddy wakeups Currently we schedule to the leftmost task in the runqueue. When the runtimes are very short because of some server/client ping-pong, especially in over-saturated workloads, this will cycle through all tasks trashing the cache. Reduce cache trashing by keeping dependent tasks together by running newly woken tasks first. However, by not running the leftmost task first we could starve tasks because the wakee can gain unlimited runtime. Therefore we only run the wakee if its within a small (wakeup_granularity) window of the leftmost task. This preserves fairness, but does alternate server/client task groups. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-03-15 03:02:50 +01:00
Ingo Molnar	27d1172660	sched: fix calc_delta_mine() lw->weight can be 0 for a short time during bootup. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>	2008-03-15 03:02:50 +01:00

1 2 3 4 5 ...

584 Commits