Olof Johansson stated the following:
Comparing a va_list with NULL is bogus. It's supposed to be treated like
an opaque type and only be manipulated with va_* accessors.
Olof noticed that this code broke the ARM builds:
kernel/trace/trace.c: In function 'trace_array_vprintk':
kernel/trace/trace.c:1364: error: invalid operands to binary == (have 'va_list' and 'void *')
kernel/trace/trace.c: In function 'tracing_mark_write':
kernel/trace/trace.c:3349: error: incompatible type for argument 3 of 'trace_vprintk'
This patch partly reverts c13d2f7c32 and
re-installs the original mark_printk() mechanism.
Reported-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Carsten Emde <C.Emde@osadl.org>
LKML-Reference: <4B1BAB74.104@osadl.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
There is a case where the graph tracer might get confused and omits
displaying of a single record. This applies mostly with the trace_pipe
since it is unlikely that the trace_seq buffer will overflow with the
trace file.
As the function_graph tracer goes through the trace entries keeping a
pointer to the current record:
current -> func1 ENTRY
func2 ENTRY
func2 RETURN
func1 RETURN
When an function ENTRY is encountered, it moves the pointer to the
next entry to check if the function is a nested or leaf function.
func1 ENTRY
current -> func2 ENTRY
func2 RETURN
func1 RETURN
If the rest of the writing of the function fills the trace_seq buffer,
then the trace_pipe read will ignore this entry. The next read will
Now start at the current location, but the first entry (func1) will
be discarded.
This patch keeps a copy of the current entry in the iterator private
storage and will keep track of when the trace_seq buffer fills. When
the trace_seq buffer fills, it will reuse the copy of the entry in the
next iteration.
[
This patch has been largely modified by Steven Rostedt in order to
clean it up and simplify it. The original idea and concept was from
Jirka and for that, this patch will go under his name to give him
the credit he deserves. But because this was modify by Steven Rostedt
anything wrong with the patch should be blamed on Steven.
]
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1259067458-27143-1-git-send-email-jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The trace_seq buffer might fill up, and right now one needs to check the
return value of each printf into the buffer to check for that.
Instead, have the buffer keep track of whether it is full or not, and
reject more input if it is full or would have overflowed with an input
that wasn't added.
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
If the seq_read fills the buffer it will call s_start again on the next
itertation with the same position. This causes a problem with the
function_graph tracer because it consumes the iteration in order to
determine leaf functions.
What happens is that the iterator stores the entry, and the function
graph plugin will look at the next entry. If that next entry is a return
of the same function and task, then the function is a leaf and the
function_graph plugin calls ring_buffer_read which moves the ring buffer
iterator forward (the trace iterator still points to the function start
entry).
The copying of the trace_seq to the seq_file buffer will fail if the
seq_file buffer is full. The seq_read will not show this entry.
The next read by userspace will cause seq_read to again call s_start
which will reuse the trace iterator entry (the function start entry).
But the function return entry was already consumed. The function graph
plugin will think that this entry is a nested function and not a leaf.
To solve this, the trace code now checks the return status of the
seq_printf (trace_print_seq). If the writing to the seq_file buffer
fails, we set a flag in the iterator (leftover) and we do not reset
the trace_seq buffer. On the next call to s_start, we check the leftover
flag, and if it is set, we just reuse the trace_seq buffer and do not
call into the plugin print functions.
Before this patch:
2) | fput() {
2) | __fput() {
2) 0.550 us | inotify_inode_queue_event();
2) | __fsnotify_parent() {
2) 0.540 us | inotify_dentry_parent_queue_event();
After the patch:
2) | fput() {
2) | __fput() {
2) 0.550 us | inotify_inode_queue_event();
2) 0.548 us | __fsnotify_parent();
2) 0.540 us | inotify_dentry_parent_queue_event();
[
Updated the patch to fix a missing return 0 from the trace_print_seq()
stub when CONFIG_TRACING is disabled.
Reported-by: Ingo Molnar <mingo@elte.hu>
]
Reported-by: Jiri Olsa <jolsa@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
This fixes a cut and paste error that had pipe_close get called
if pipe_open was defined (not pipe_close).
Reported-by: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
LKML-Reference: <20091209153204.F4CD.A69D9226@jp.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* 'bkl-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sys: Remove BKL from sys_reboot
pm_qos: clean up racy global "name" variable
pm_qos: remove BKL
When we define the common event fields in kprobe, we invert the error
handling and return immediately in case of success. Then we omit
to define specific kprobes fields (ip and nargs), and specific
kretprobes fields (func, ret_ip, nargs). And we only define them
when we fail to create common fields.
The most visible consequence is that we can't create filter for
k(ret)probes specific fields.
This patch re-invert the success/error handling to fix it.
Reported-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <1260263815-5167-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The normalized values are also recalculated in case the scaling factor
changes.
This patch updates the internally used scheduler tuning values that are
normalized to one cpu in case a user sets new values via sysfs.
Together with patch 2 of this series this allows to let user configured
values scale (or not) to cpu add/remove events taking place later.
Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1259579808-11357-4-git-send-email-ehrhardt@linux.vnet.ibm.com>
[ v2: fix warning ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
As scaling now takes place on all kind of cpu add/remove events a user
that configures values via proc should be able to configure if his set
values are still rescaled or kept whatever happens.
As the comments state that log2 was just a second guess that worked the
interface is not just designed for on/off, but to choose a scaling type.
Currently this allows none, log and linear, but more important it allwos
us to keep the interface even if someone has an even better idea how to
scale the values.
Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1259579808-11357-3-git-send-email-ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Based on Peter Zijlstras patch suggestion this enables recalculation of
the scheduler tunables in response of a change in the number of cpus. It
also adds a max of eight cpus that are considered in that scaling.
Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1259579808-11357-2-git-send-email-ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
83f9ac removed a call to effective_prio() in wake_up_new_task(), which
leads to tasks running at MAX_PRIO.
This is caused by the idle thread being set to MAX_PRIO before forking
off init. O(1) used that to make sure idle was always preempted, CFS
uses check_preempt_curr_idle() for that so we can savely remove this bit
of legacy code.
Reported-by: Mike Galbraith <efault@gmx.de>
Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1259754383.4003.610.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When setting the weight for a per-cpu task-group, we have to put in a
phantom weight when there is no work on that cpu, otherwise we'll not
service that cpu when new work gets placed there until we again update
the per-cpu weights.
We used to add these phantom weights to the total, so that the idle
per-cpu shares don't get inflated, this however causes the non-idle
parts to get deflated, causing unexpected weight distibutions.
Reverse this, so that the non-idle shares are correct but the idle
shares are inflated.
Reported-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Tested-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1257934048.23203.76.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
As Nick pointed out, and realized by myself when doing:
sched: Fix balance vs hotplug race
the patch:
sched: for_each_domain() vs RCU
is wrong, sched_domains are freed after synchronize_sched(), which
means disabling preemption is enough.
Reported-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
WAKEUP_RUNNING was an experiment, not sure why that ever ended up being
merged...
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Streamline the wakeup preemption code a bit, unifying the preempt path
so that they all do the same.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If a RT task is woken up while a non-RT task is running,
check_preempt_wakeup() is called to check whether the new task can
preempt the old task. The function returns quickly without going deeper
because it is apparent that a RT task can always preempt a non-RT task.
In this situation, check_preempt_wakeup() always calls update_curr() to
update vruntime value of the currently running task. However, the
function call is unnecessary and redundant at that moment because (1) a
non-RT task can always be preempted by a RT task regardless of its
vruntime value, and (2) update_curr() will be called shortly when the
context switch between two occurs.
By moving update_curr() in check_preempt_wakeup(), we can avoid
redundant call to update_curr(), slightly reducing the time taken to
wake up RT tasks.
Signed-off-by: Jupyung Lee <jupyung@gmail.com>
[ Place update_curr() right before the wake_preempt_entity() call, which
is the only thing that relies on the updated vruntime ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1258451500-6714-1-git-send-email-jupyung@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently we try to do task placement in wake_up_new_task() after we do
the load-balance pass in sched_fork(). This yields complicated semantics
in that we have to deal with tasks on different RQs and the
set_task_cpu() calls in copy_process() and sched_fork()
Rename ->task_new() to ->task_fork() and call it from sched_fork()
before the balancing, this gives the policy a clear point to place the
task.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Since set_task_clock() doesn't rely on rq->clock anymore we can simplyfy
the mess in ttwu().
Optimize things a bit by not fiddling with the IRQ state there.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
set_task_cpu() should be rq invariant and only touch task state, it
currently fails to do so, which opens up a few races, since not all
callers hold both rq->locks.
Remove the relyance on rq->clock, as any site calling set_task_cpu()
should also do a remote clock update, which should ensure the observed
time between these two cpus is monotonic, as per
kernel/sched_clock.c:sched_clock_remote().
Therefore we can simply remove the clock_offset bits and be happy.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Since we've had a much saner debugfs interface to this, remove the
sysctl one.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
[ v2: build fix ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
sched_rr_get_param calls
task->sched_class->get_rr_interval(task) without protection
against a concurrent sched_setscheduler() call which modifies
task->sched_class.
Serialize the access with task_rq_lock(task) and hand the rq
pointer into get_rr_interval() as it's needed at least in the
sched_fair implementation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <alpine.LFD.2.00.0912090930120.3089@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
sched_getaffinity() is not protected against a concurrent
modification of the tasks affinity.
Serialize the access with task_rq_lock(task).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20091208202026.769251187@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In current code, children task will allocate memory for
'child->perf_event_ctxp' if the parent is counted, we can
do it only if the parent allowed children inherit it.
It can save memory and reduce overhead.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <4B1F19A8.5040805@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Clean up the code a bit:
- define 'perf_cpu_context' variable with 'static'
- use kzalloc() instead of kmalloc() and memset()
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <4B1F194D.7080306@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently, when ptrace needs to modify a breakpoint, like disabling
it, changing its address, type or len, it calls
modify_user_hw_breakpoint(). This latter will perform the heavy and
racy task of unregistering the old breakpoint and registering a new
one.
This is racy as someone else might steal the reserved breakpoint
slot under us, which is undesired as the breakpoint is only
supposed to be modified, sometimes in the middle of a debugging
workflow. We don't want our slot to be stolen in the middle.
So instead of unregistering/registering the breakpoint, just
disable it while we modify its breakpoint fields and re-enable it
after if necessary.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prasad <prasad@linux.vnet.ibm.com>
LKML-Reference: <1260347148-5519-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'timers-for-linus-ntp' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
ntp: Provide compability defines (You say MOD_NANO, I say ADJ_NANO)
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
genirq: do not execute DEBUG_SHIRQ when irq setup failed
* 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
timers, init: Limit the number of per cpu calibration bootup messages
posix-cpu-timers: optimize and document timer_create callback
clockevents: Add missing include to pacify sparse
x86: vmiclock: Fix printk format
x86: Fix printk format due to variable type change
sparc: fix printk for change of variable type
clocksource/events: Fix fallout of generic code changes
nohz: Allow 32-bit machines to sleep for more than 2.15 seconds
nohz: Track last do_timer() cpu
nohz: Prevent clocksource wrapping during idle
nohz: Type cast printk argument
mips: Use generic mult/shift factor calculation for clocks
clocksource: Provide a generic mult/shift factor calculation
clockevents: Use u32 for mult and shift factors
nohz: Introduce arch_needs_cpu
nohz: Reuse ktime in sub-functions of tick_check_idle.
time: Remove xtime_cache
time: Implement logarithmic time accumulation
* 'for-2.6.33' of git://git.kernel.dk/linux-2.6-block: (113 commits)
cfq-iosched: Do not access cfqq after freeing it
block: include linux/err.h to use ERR_PTR
cfq-iosched: use call_rcu() instead of doing grace period stall on queue exit
blkio: Allow CFQ group IO scheduling even when CFQ is a module
blkio: Implement dynamic io controlling policy registration
blkio: Export some symbols from blkio as its user CFQ can be a module
block: Fix io_context leak after failure of clone with CLONE_IO
block: Fix io_context leak after clone with CLONE_IO
cfq-iosched: make nonrot check logic consistent
io controller: quick fix for blk-cgroup and modular CFQ
cfq-iosched: move IO controller declerations to a header file
cfq-iosched: fix compile problem with !CONFIG_CGROUP
blkio: Documentation
blkio: Wait on sync-noidle queue even if rq_noidle = 1
blkio: Implement group_isolation tunable
blkio: Determine async workload length based on total number of queues
blkio: Wait for cfq queue to get backlogged if group is empty
blkio: Propagate cgroup weight updation to cfq groups
blkio: Drop the reference to queue once the task changes cgroup
blkio: Provide some isolation between groups
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
PM: Add flag for devices capable of generating run-time wake-up events
PM / Runtime: Remove unnecessary braces in __pm_runtime_set_status()
PM / Runtime: Make documentation of runtime_idle() agree with the code
PM / Runtime: Ensure timer_expires is nonzero in pm_schedule_suspend()
PM / Runtime: Use deferred_resume flag in pm_request_resume
PM / Runtime: Export the PM runtime workqueue
PM / Runtime: Fix lockdep warning in __pm_runtime_set_status()
PM / Hibernate: Swap, use KERN_CONT
PM / Hibernate: Shift remaining code from swsusp.c to hibernate.c
PM / Hibernate: Move swap functions to kernel/power/swap.c.
PM / freezer: Don't get over-anxious while waiting
* 'kvm-updates/2.6.33' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (84 commits)
KVM: VMX: Fix comparison of guest efer with stale host value
KVM: s390: Fix prefix register checking in arch/s390/kvm/sigp.c
KVM: Drop user return notifier when disabling virtualization on a cpu
KVM: VMX: Disable unrestricted guest when EPT disabled
KVM: x86 emulator: limit instructions to 15 bytes
KVM: s390: Make psw available on all exits, not just a subset
KVM: x86: Add KVM_GET/SET_VCPU_EVENTS
KVM: VMX: Report unexpected simultaneous exceptions as internal errors
KVM: Allow internal errors reported to userspace to carry extra data
KVM: Reorder IOCTLs in main kvm.h
KVM: x86: Polish exception injection via KVM_SET_GUEST_DEBUG
KVM: only clear irq_source_id if irqchip is present
KVM: x86: disallow KVM_{SET,GET}_LAPIC without allocated in-kernel lapic
KVM: x86: disallow multiple KVM_CREATE_IRQCHIP
KVM: VMX: Remove vmx->msr_offset_efer
KVM: MMU: update invlpg handler comment
KVM: VMX: move CR3/PDPTR update to vmx_set_cr3
KVM: remove duplicated task_switch check
KVM: powerpc: Fix BUILD_BUG_ON condition
KVM: VMX: Use shared msr infrastructure
...
Trivial conflicts due to new Kconfig options in arch/Kconfig and kernel/Makefile
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1815 commits)
mac80211: fix reorder buffer release
iwmc3200wifi: Enable wimax core through module parameter
iwmc3200wifi: Add wifi-wimax coexistence mode as a module parameter
iwmc3200wifi: Coex table command does not expect a response
iwmc3200wifi: Update wiwi priority table
iwlwifi: driver version track kernel version
iwlwifi: indicate uCode type when fail dump error/event log
iwl3945: remove duplicated event logging code
b43: fix two warnings
ipw2100: fix rebooting hang with driver loaded
cfg80211: indent regulatory messages with spaces
iwmc3200wifi: fix NULL pointer dereference in pmkid update
mac80211: Fix TX status reporting for injected data frames
ath9k: enable 2GHz band only if the device supports it
airo: Fix integer overflow warning
rt2x00: Fix padding bug on L2PAD devices.
WE: Fix set events not propagated
b43legacy: avoid PPC fault during resume
b43: avoid PPC fault during resume
tcp: fix a timewait refcnt race
...
Fix up conflicts due to sysctl cleanups (dead sysctl_check code and
CTL_UNNUMBERED removed) in
kernel/sysctl_check.c
net/ipv4/sysctl_net_ipv4.c
net/ipv6/addrconf.c
net/sctp/sysctl.c
* git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6: (43 commits)
security/tomoyo: Remove now unnecessary handling of security_sysctl.
security/tomoyo: Add a special case to handle accesses through the internal proc mount.
sysctl: Drop & in front of every proc_handler.
sysctl: Remove CTL_NONE and CTL_UNNUMBERED
sysctl: kill dead ctl_handler definitions.
sysctl: Remove the last of the generic binary sysctl support
sysctl net: Remove unused binary sysctl code
sysctl security/tomoyo: Don't look at ctl_name
sysctl arm: Remove binary sysctl support
sysctl x86: Remove dead binary sysctl support
sysctl sh: Remove dead binary sysctl support
sysctl powerpc: Remove dead binary sysctl support
sysctl ia64: Remove dead binary sysctl support
sysctl s390: Remove dead sysctl binary support
sysctl frv: Remove dead binary sysctl support
sysctl mips/lasat: Remove dead binary sysctl support
sysctl drivers: Remove dead binary sysctl support
sysctl crypto: Remove dead binary sysctl support
sysctl security/keys: Remove dead binary sysctl support
sysctl kernel: Remove binary sysctl logic
...
get_user_pages() must be called with mmap_sem held.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: stable@kernel.org
Cc: Andrew Morton <akpm@linuxfoundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20091208121942.GA21298@basil.fritz.box>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Today's linux-next build failed with:
kernel/hw_breakpoint.c:86: error: 'task_bp_pinned' redeclared as different kind of symbol
...
Caused by commit dd17c8f729 ("percpu:
remove per_cpu__ prefix") from the percpu tree interacting with
commit 56053170ea ("hw-breakpoints:
Fix task-bound breakpoint slot allocation") from the tip tree.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <20091208182515.bb6dda4a.sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
An ftrace plugin can add a pipe_open interface when the user opens
trace_pipe. But if the plugin allocates something within the pipe_open
it can not free it because there exists no pipe_close. The hook to
the trace file open has a corresponding close. The closing of the
trace_pipe file should also have a corresponding close.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Whatever the context nature of a breakpoint, we always perform the
following constraint checks before allocating it a slot:
- Check the number of pinned breakpoint bound the concerned cpus
- Check the max number of task-bound breakpoints that are belonging
to a task.
- Add both and see if we have a reamining slot for the new breakpoint
This is the right thing to do when we are about to register a cpu-only
bound breakpoint. But not if we are dealing with a task bound
breakpoint. What we want in this case is:
- Check the number of pinned breakpoint bound the concerned cpus
- Check the number of breakpoints that already belong to the task
in which the breakpoint to register is bound to.
- Add both
This fixes a regression that makes the "firefox -g" command fail to
register breakpoints once we deal with a secondary thread.
Reported-by: Walt <w41ter@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Since (e761b77: cpu hotplug, sched: Introduce cpu_active_map and redo
sched domain managment) we have cpu_active_mask which is suppose to rule
scheduler migration and load-balancing, except it never (fully) did.
The particular problem being solved here is a crash in try_to_wake_up()
where select_task_rq() ends up selecting an offline cpu because
select_task_rq_fair() trusts the sched_domain tree to reflect the
current state of affairs, similarly select_task_rq_rt() trusts the
root_domain.
However, the sched_domains are updated from CPU_DEAD, which is after the
cpu is taken offline and after stop_machine is done. Therefore it can
race perfectly well with code assuming the domains are right.
Cure this by building the domains from cpu_active_mask on
CPU_DOWN_PREPARE.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Commit acc3f5d7ca ("cpumask:
Partition_sched_domains takes array of cpumask_var_t") changed
the function signature of generate_sched_domains() for the
CONFIG_SMP=y case, but forgot to update the corresponding
function for the CONFIG_SMP=n case, causing:
kernel/cpuset.c:2073: warning: passing argument 1 of 'generate_sched_domains' from incompatible pointer type
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <alpine.DEB.2.00.0912062038070.5693@ayla.of.borg>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch (as1306) exports the PM runtime workqueue for use by
loadable modules.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Use KERN_CONT in save_image() for printks, so that anybody won't
try to add a loglevel.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Shift the remaining declaration of the variable in_suspend and the
function swsusp_show_speed from swsusp.c to hibernate.c, and delete
swsusp.c.
Signed-off-by: Nigel Cunningham <nigel@tuxonice.net>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Move hibernation code's functions for allocating and freeing swap
from swsusp.c to swap.c, which is where you'd expect to find them.
Signed-off-by: Nigel Cunningham <nigel@tuxonice.net>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Fix min, max times in /proc/lock_stats
(1) When collecting lock hold and wait times, if the current minimum
time is zero, it will be replaced by the next time.
(2) When aggregating minimum and maximum lock hold and wait times
accross cpus, the values are added, instead of selecting the
minimum and maximum.
Signed-off-by: Frank Rowand <frank.rowand@am.sony.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4B05BBAE.2050005@am.sony.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
struct perf_event::event callback was called when a breakpoint
triggers. But this is a rather opaque callback, pretty
tied-only to the breakpoint API and not really integrated into perf
as it triggers even when we don't overflow.
We prefer to use overflow_handler() as it fits into the perf events
rules, being called only when we overflow.
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
Drop the callback and task parameters from modify_user_hw_breakpoint().
For now we have no user that need to modify a breakpoint to the point
of changing its handler or its task context.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
* 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (40 commits)
tracing: Separate raw syscall from syscall tracer
ring-buffer-benchmark: Add parameters to set produce/consumer priorities
tracing, function tracer: Clean up strstrip() usage
ring-buffer benchmark: Run producer/consumer threads at nice +19
tracing: Remove the stale include/trace/power.h
tracing: Only print objcopy version warning once from recordmcount
tracing: Prevent build warning: 'ftrace_graph_buf' defined but not used
ring-buffer: Move access to commit_page up into function used
tracing: do not disable interrupts for trace_clock_local
ring-buffer: Add multiple iterations between benchmark timestamps
kprobes: Sanitize struct kretprobe_instance allocations
tracing: Fix to use __always_unused attribute
compiler: Introduce __always_unused
tracing: Exit with error if a weak function is used in recordmcount.pl
tracing: Move conditional into update_funcs() in recordmcount.pl
tracing: Add regex for weak functions in recordmcount.pl
tracing: Move mcount section search to front of loop in recordmcount.pl
tracing: Fix objcopy revision check in recordmcount.pl
tracing: Check absolute path of input file in recordmcount.pl
tracing: Correct the check for number of arguments in recordmcount.pl
...
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
tracing: Fix trace_marker output
tracing: Fix event format export
tracing: Fix return value of tracing_stats_read()
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits)
rcu: Make RCU's CPU-stall detector be default
rcu: Add expedited grace-period support for preemptible RCU
rcu: Enable fourth level of TREE_RCU hierarchy
rcu: Rename "quiet" functions
rcu: Re-arrange code to reduce #ifdef pain
rcu: Eliminate unneeded function wrapping
rcu: Fix grace-period-stall bug on large systems with CPU hotplug
rcu: Eliminate __rcu_pending() false positives
rcu: Further cleanups of use of lastcomp
rcu: Simplify association of forced quiescent states with grace periods
rcu: Accelerate callback processing on CPUs not detecting GP end
rcu: Mark init-time-only rcu_bootup_announce() as __init
rcu: Simplify association of quiescent states with grace periods
rcu: Rename dynticks_completed to completed_fqs
rcu: Enable synchronize_sched_expedited() fastpath
rcu: Remove inline from forward-referenced functions
rcu: Fix note_new_gpnum() uses of ->gpnum
rcu: Fix synchronization for rcu_process_gp_end() uses of ->completed counter
rcu: Prepare for synchronization fixes: clean up for non-NO_HZ handling of ->completed counter
rcu: Cleanup: balance rcu_irq_enter()/rcu_irq_exit() calls
...
* 'core-printk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
ratelimit: Make suppressed output messages more useful
printk: Remove ratelimit.h from kernel.h
ratelimit: Fix/allow use in atomic contexts
ratelimit: Use per ratelimit context locking
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
mutex: Fix missing conditions to build mutex_spin_on_owner()
mutex: Better control mutex adaptive spinning config
locking, task_struct: Reduce size on TRACE_IRQFLAGS and 64bit
locking: Use __[SPIN|RW]_LOCK_UNLOCKED in [spin|rw]_lock_init()
locking: Remove unused prototype
locking: Reduce ifdefs in kernel/spinlock.c
locking: Make inlining decision Kconfig based
With CLONE_IO, parent's io_context->nr_tasks is incremented, but never
decremented whenever copy_process() fails afterwards, which prevents
exit_io_context() from calling IO schedulers exit functions.
Give a task_struct to exit_io_context(), and call exit_io_context() instead of
put_io_context() in copy_process() cleanup path.
Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
commit 8ec1e0ebe26087bfc5c0394ada5feb5758014fc8
Author: Patrick McHardy <kaber@trash.net>
Date: Thu Dec 3 12:16:35 2009 +0100
ipv4: add sysctl to accept packets with local source addresses
Change fib_validate_source() to accept packets with a local source address when
the "accept_local" sysctl is set for the incoming inet device. Combined with the
previous patches, this allows to communicate between multiple local interfaces
over the wire.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
We don't need to build mutex_spin_on_owner() if we have
CONFIG_DEBUG_MUTEXES or CONFIG_HAVE_DEFAULT_NO_SPIN_MUTEXES as
it won't be used under such configs.
Use CONFIG_MUTEX_SPIN_ON_OWNER as it gathers all the necessary
checks before building it.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1259783357-8542-2-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Introduce CONFIG_MUTEX_SPIN_ON_OWNER so that we can centralize
in a single place the conditions that determine its definition
and use.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1259783357-8542-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Enable a fourth level of rcu_node hierarchy for TREE_RCU and
TREE_PREEMPT_RCU. This is for stress-testing and experiemental
purposes only, although in theory this would enable 16,777,216
CPUs on 64-bit systems, though only 1,048,576 CPUs on 32-bit
systems. Normal experimental use of this fourth level will
normally set CONFIG_RCU_FANOUT=2, requiring a 16-CPU system,
though the more adventurous (and more fortunate) experimenters
may wish to chose CONFIG_RCU_FANOUT=3 for 81-CPU systems or even
CONFIG_RCU_FANOUT=4 for 256-CPU systems.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <12597846161257-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The number of "quiet" functions has grown recently, and the
names are no longer very descriptive. The point of all of these
functions is to do some portion of the task of reporting a
quiescent state, so rename them accordingly:
o cpu_quiet() becomes rcu_report_qs_rdp(), which reports a
quiescent state to the per-CPU rcu_data structure. If this
turns out to be a new quiescent state for this grace period,
then rcu_report_qs_rnp() will be invoked to propagate the
quiescent state up the rcu_node hierarchy.
o cpu_quiet_msk() becomes rcu_report_qs_rnp(), which reports
a quiescent state for a given CPU (or possibly a set of CPUs)
up the rcu_node hierarchy.
o cpu_quiet_msk_finish() becomes rcu_report_qs_rsp(), which
reports a full set of quiescent states to the global rcu_state
structure.
o task_quiet() becomes rcu_report_unblock_qs_rnp(), which reports
a quiescent state due to a task exiting an RCU read-side critical
section that had previously blocked in that same critical section.
As indicated by the new name, this type of quiescent state is
reported up the rcu_node hierarchy (using rcu_report_qs_rnp()
to do so).
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <12597846163698-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On the parisc architecture we face for each and every loaded kernel module
this kernel "badness warning":
sysfs: cannot create duplicate filename '/module/ac97_bus/sections/.text'
Badness at fs/sysfs/dir.c:487
Reason for that is, that on parisc all kernel modules do have multiple
.text sections due to the usage of the -ffunction-sections compiler flag
which is needed to reach all jump targets on this platform.
An objdump on such a kernel module gives:
Sections:
Idx Name Size VMA LMA File off Algn
0 .note.gnu.build-id 00000024 00000000 00000000 00000034 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
1 .text 00000000 00000000 00000000 00000058 2**0
CONTENTS, ALLOC, LOAD, READONLY, CODE
2 .text.ac97_bus_match 0000001c 00000000 00000000 00000058 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
3 .text 00000000 00000000 00000000 000000d4 2**0
CONTENTS, ALLOC, LOAD, READONLY, CODE
...
Since the .text sections are empty (size of 0 bytes) and won't be
loaded by the kernel module loader anyway, I don't see a reason
why such sections need to be listed under
/sys/module/<module_name>/sections/<section_name> either.
The attached patch does solve this issue by not exporting section
names which are empty.
This fixes bugzilla http://bugzilla.kernel.org/show_bug.cgi?id=14703
Signed-off-by: Helge Deller <deller@gmx.de>
CC: rusty@rustcorp.com.au
CC: akpm@linux-foundation.org
CC: James.Bottomley@HansenPartnership.com
CC: roland@redhat.com
CC: dave@hiauly1.hia.nrc.ca
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a real fix for problem of utime/stime values decreasing
described in the thread:
http://lkml.org/lkml/2009/11/3/522
Now cputime is accounted in the following way:
- {u,s}time in task_struct are increased every time when the thread
is interrupted by a tick (timer interrupt).
- When a thread exits, its {u,s}time are added to signal->{u,s}time,
after adjusted by task_times().
- When all threads in a thread_group exits, accumulated {u,s}time
(and also c{u,s}time) in signal struct are added to c{u,s}time
in signal struct of the group's parent.
So {u,s}time in task struct are "raw" tick count, while
{u,s}time and c{u,s}time in signal struct are "adjusted" values.
And accounted values are used by:
- task_times(), to get cputime of a thread:
This function returns adjusted values that originates from raw
{u,s}time and scaled by sum_exec_runtime that accounted by CFS.
- thread_group_cputime(), to get cputime of a thread group:
This function returns sum of all {u,s}time of living threads in
the group, plus {u,s}time in the signal struct that is sum of
adjusted cputimes of all exited threads belonged to the group.
The problem is the return value of thread_group_cputime(),
because it is mixed sum of "raw" value and "adjusted" value:
group's {u,s}time = foreach(thread){{u,s}time} + exited({u,s}time)
This misbehavior can break {u,s}time monotonicity.
Assume that if there is a thread that have raw values greater
than adjusted values (e.g. interrupted by 1000Hz ticks 50 times
but only runs 45ms) and if it exits, cputime will decrease (e.g.
-5ms).
To fix this, we could do:
group's {u,s}time = foreach(t){task_times(t)} + exited({u,s}time)
But task_times() contains hard divisions, so applying it for
every thread should be avoided.
This patch fixes the above problem in the following way:
- Modify thread's exit (= __exit_signal()) not to use task_times().
It means {u,s}time in signal struct accumulates raw values instead
of adjusted values. As the result it makes thread_group_cputime()
to return pure sum of "raw" values.
- Introduce a new function thread_group_times(*task, *utime, *stime)
that converts "raw" values of thread_group_cputime() to "adjusted"
values, in same calculation procedure as task_times().
- Modify group's exit (= wait_task_zombie()) to use this introduced
thread_group_times(). It make c{u,s}time in signal struct to
have adjusted values like before this patch.
- Replace some thread_group_cputime() by thread_group_times().
This replacements are only applied where conveys the "adjusted"
cputime to users, and where already uses task_times() near by it.
(i.e. sys_times(), getrusage(), and /proc/<PID>/stat.)
This patch have a positive side effect:
- Before this patch, if a group contains many short-life threads
(e.g. runs 0.9ms and not interrupted by ticks), the group's
cputime could be invisible since thread's cputime was accumulated
after adjusted: imagine adjustment function as adj(ticks, runtime),
{adj(0, 0.9) + adj(0, 0.9) + ....} = {0 + 0 + ....} = 0.
After this patch it will not happen because the adjustment is
applied after accumulated.
v2:
- remove if()s, put new variables into signal_struct.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Spencer Candland <spencer@bluehost.com>
Cc: Americo Wang <xiyou.wangcong@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
LKML-Reference: <4B162517.8040909@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- Remove if({u,s}t)s because no one call it with NULL now.
- Use cputime_{add,sub}().
- Add ifndef-endif for prev_{u,s}time since they are used
only when !VIRT_CPU_ACCOUNTING.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Spencer Candland <spencer@bluehost.com>
Cc: Americo Wang <xiyou.wangcong@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
LKML-Reference: <4B1624C7.7040302@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Anton Blanchard wrote:
> We allocate and zero cpu_isolated_map after the isolcpus
> __setup option has run. This means cpu_isolated_map always
> ends up empty and if CPUMASK_OFFSTACK is enabled we write to a
> cpumask that hasn't been allocated.
I introduced this regression in 49557e6203 (sched: Fix
boot crash by zalloc()ing most of the cpu masks).
Use the bootmem allocator if they set isolcpus=, otherwise
allocate and zero like normal.
Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: peterz@infradead.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@kernel.org>
LKML-Reference: <200912021409.17013.rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Anton Blanchard <anton@samba.org>
Instead of using per_cpu(..., raw_smp_processor_id()), use
__get_cpu_var(...).
Signed-off-by: Avi Kivity <avi@redhat.com>
LKML-Reference: <1259578491-4589-1-git-send-email-avi@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
498657a478 incorrectly assumed
that preempt wasn't disabled around context_switch() and thus
was fixing imaginary problem. It also broke KVM because it
depended on ->sched_in() to be called with irq enabled so that
it can do smp calls from there.
Revert the incorrect commit and add comment describing different
contexts under with the two callbacks are invoked.
Avi: spotted transposed in/out in the added comment.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Avi Kivity <avi@redhat.com>
Cc: peterz@infradead.org
Cc: efault@gmx.de
Cc: rusty@rustcorp.com.au
LKML-Reference: <1259726212-30259-2-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In the CONFIG_PERF_USE_VMALLOC case, perf_mmap_data_free() only
schedules the cleanup of the perf_mmap_data struct. In that
case we have to wait until the work has been done before we free
data.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: <stable@kernel.org>
LKML-Reference: <1259697901-1747-1-git-send-email-krh@bitplanet.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
After duplications are removed, syscall_name_to_nr() is unused.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Jason Baron <jbaron@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B14D2A6.6060803@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
use only one prof_sysenter_enable() instead of
prof_sysenter_enable_##sname()
use only one prof_sysenter_disable() instead of
prof_sysenter_disable_##sname()
use only one prof_sysexit_enable() instead of
prof_sysexit_enable_##sname()
use only one prof_sysexit_disable() instead of
prof_sysexit_disable_##sname()
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Jason Baron <jbaron@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B14D2A1.8060304@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
use only one init_syscall_trace instead of
many init_enter_##sname()/init_exit_##sname()
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Jason Baron <jbaron@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B14D29B.6090708@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add syscall_nr field to struct syscall_metadata,
it helps us to get syscall number easier.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Jason Baron <jbaron@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B14D293.6090800@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
use ->enter_event->id instead of ->enter_id
use ->exit_event->id instead of ->exit_id
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Jason Baron <jbaron@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B14D288.7030001@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Set event_enter_##sname->data to its metadata,
it makes codes simpler.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Jason Baron <jbaron@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B14D282.7050709@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Commits 3d7a641 ("SLOW_WORK: Wait for outstanding work items belonging to a
module to clear") introduced some code to make sure that all of a module's
slow-work items were complete before that module was removed, and commit
3bde31a ("SLOW_WORK: Allow a requeueable work item to sleep till the thread is
needed") further extended that, breaking it in the process if CONFIG_MODULES=n:
CC kernel/slow-work.o
kernel/slow-work.c: In function 'slow_work_execute':
kernel/slow-work.c:313: error: 'slow_work_thread_processing' undeclared (first use in this function)
kernel/slow-work.c:313: error: (Each undeclared identifier is reported only once
kernel/slow-work.c:313: error: for each function it appears in.)
kernel/slow-work.c: In function 'slow_work_wait_for_items':
kernel/slow-work.c:950: error: 'slow_work_unreg_sync_lock' undeclared (first use in this function)
kernel/slow-work.c:951: error: 'slow_work_unreg_wq' undeclared (first use in this function)
kernel/slow-work.c:961: error: 'slow_work_unreg_work_item' undeclared (first use in this function)
kernel/slow-work.c:974: error: 'slow_work_unreg_module' undeclared (first use in this function)
kernel/slow-work.c:977: error: 'slow_work_thread_processing' undeclared (first use in this function)
make[1]: *** [kernel/slow-work.o] Error 1
Fix this by:
(1) Extracting the bits of slow_work_execute() that are contingent on
CONFIG_MODULES, and the bits that should be, into inline functions and
placing them into the #ifdef'd section that defines the relevant variables
and adding stubs for moduleless kernels. This allows the removal of some
#ifdefs.
(2) #ifdef'ing out the contents of slow_work_wait_for_items() in moduleless
kernels.
The four functions related to handling module unloading synchronisation (and
their associated variables) could be offloaded into a separate .c file, but
each function is only used once and three of them are tiny, so doing so would
prevent them from being inlined.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix a memory leak case in create_trace_probe(). When an argument
is too long (> MAX_ARGSTR_LEN), it just jumps to error path. In
that case tp->args[i].name is not released.
This also fixes a bug to check kstrdup()'s return value.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: K.Prasad <prasad@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091201001919.10235.56455.stgit@harusame>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
fork() clones all thread_info flags, including
TIF_USER_RETURN_NOTIFY; if the new task is first scheduled on a cpu
which doesn't have user return notifiers set, this causes user
return notifiers to trigger without any way of clearing itself.
This is easy to trigger with a forky workload on the host in
parallel with kvm, resulting in a cpu in an endless loop on the
verge of returning to userspace.
Fix by dropping the TIF_USER_RETURN_NOTIFY immediately after fork.
Signed-off-by: Avi Kivity <avi@redhat.com>
LKML-Reference: <1259505288-16559-1-git-send-email-avi@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
"symbol_name+0" is not so friendly.
It makes the output longer.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B0CEBCB.7080309@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>