When !CONFIG_SMP, cpu_stop functions weren't defined at all which
could lead to build failures if UP code uses cpu_stop facility. Add
dummy cpu_stop implementation for UP. The waiting variants execute
the work function directly with preempt disabled and
stop_one_cpu_nowait() schedules a workqueue work.
Makefile and ifdefs around stop_machine implementation are updated to
accomodate CONFIG_SMP && !CONFIG_STOP_MACHINE case.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Ingo Molnar <mingo@elte.hu>
* 'v4l_for_2.6.34' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6:
V4L/DVB: pxa_camera: move fifo reset direct before dma start
V4L/DVB: video: testing unsigned for less than 0
V4L/DVB: mx1-camera: compile fix
V4L/DVB: budget: Oops: "BUG: unable to handle kernel NULL pointer dereference"
V4L/DVB: ngene: Workaround for stuck DiSEqC pin
V4L/DVB: saa7146: fix regression of the av7110/budget-av driver
V4L/DVB: v4l: fix config dependencies: mxb and saa7191 are V4L2 drivers, not V4L1
V4L/DVB: feature-removal: announce videotext.h removal
V4L/DVB: V4L - vpfe capture - fix for kernel crash
V4L/DVB: gspca: make usb id 0461:0815 get handled by the right driver
V4L/DVB: gspca - stv06xx: Remove the 046d:08da from the stv06xx driver
V4L/DVB: gspca - sn9c20x: Correct onstack wait_queue_head declaration
V4L/DVB: saa7146: fix up bytesperline if it is an impossible value
V4L/DVB: V4L: vpfe_capture - free ccdc_lock when memory allocation fails
V4L/DVB: V4L - Makfile:Removed duplicate entry of davinci
V4L/DVB: omap24xxcam: potential buffer overflow
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
rcu: create rcu_my_thread_group_empty() wrapper
memcg: css_id() must be called under rcu_read_lock()
cgroup: Check task_lock in task_subsys_state()
sched: Fix an RCU warning in print_task()
cgroup: Fix an RCU warning in alloc_css_id()
cgroup: Fix an RCU warning in cgroup_path()
KEYS: Fix an RCU warning in the reading of user keys
KEYS: Fix an RCU warning
Add group scheduling transactional APIs to struct pmu.
These APIs will be implemented in arch code, based on Peter's idea as
below.
> the idea behind hw_perf_group_sched_in() is to not perform
> schedulability tests on each event in the group, but to add the group
> as a whole and then perform one test.
>
> Of course, when that test fails, you'll have to roll-back the whole
> group again.
>
> So start_txn (or a better name) would simply toggle a flag in the pmu
> implementation that will make pmu::enable() not perform the
> schedulablilty test.
>
> Then commit_txn() will perform the schedulability test (so note the
> method has to have a !void return value.
>
> This will allow us to use the regular
> kernel/perf_event.c::group_sched_in() and all the rollback code.
> Currently each hw_perf_group_sched_in() implementation duplicates all
> the rolllback code (with various bugs).
->start_txn:
Start group events scheduling transaction, set a flag to make
pmu::enable() not perform the schedulability test, it will be performed
at commit time.
->commit_txn:
Commit group events scheduling transaction, perform the group
schedulability as a whole
->cancel_txn:
Stop group events scheduling transaction, clear the flag so
pmu::enable() will perform the schedulability test.
Reviewed-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1272002160.5707.60.camel@minggr.sh.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Rename perf_event_attr::precise to perf_event_attr::precise_ip and
widen it to 2 bits. This new field describes the required precision of
the PERF_SAMPLE_IP field:
0 - SAMPLE_IP can have arbitrary skid
1 - SAMPLE_IP must have constant skid
2 - SAMPLE_IP requested to have 0 skid
3 - SAMPLE_IP must have 0 skid
And modify the Intel PEBS code accordingly. The PEBS implementation
now supports up to precise_ip == 2, where we perform the IP fixup.
Also s/PERF_RECORD_MISC_EXACT/&_IP/ to clarify its meaning, this bit
should be set for each PERF_SAMPLE_IP field known to match the actual
instruction triggering the event.
This new scheme allows for a PEBS mode that uses the buffer for more
than a single event.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephane Eranian <eranian@google.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Both Stephane and Corey reported that PERF_FORMAT_GROUP didn't work
as expected if the task the counters were attached to quit before
the read() call.
The cause is that we unconditionally destroy the grouping when we
remove counters from their context. Fix this by only doing this when
we free the counter itself.
Reported-by: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1273160566.5605.404.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
struct rq isn't visible outside of sched.o so its near useless to
expose the pointer, also there are no users of it, so remove it.
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1272997616.1642.207.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It's unused and buggy in its current form, since it can place a bo
in the reserved state without removing it from lru lists.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
An earlier regression fix for the mxb driver (V4L/DVB: saa7146_vv: fix
regression where v4l2_device was registered too late) caused a new
regression in the av7110 driver.
Reverted the old fix and fixed the problem in the mxb driver instead.
Tested on mxb and budget-av cards.
The real problem is that the saa7146 framework has separate probe()
and attach() driver callbacks which should be rolled into one. This
is now done for the mxb driver, but others should do the same. Lack
of hardware makes this hard to do, though. I hope to get hold of some
hexium cards and then I can try to improve the framework to prevent
this from happening again.
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Currently migration_thread is serving three purposes - migration
pusher, context to execute active_load_balance() and forced context
switcher for expedited RCU synchronize_sched. All three roles are
hardcoded into migration_thread() and determining which job is
scheduled is slightly messy.
This patch kills migration_thread and replaces all three uses with
cpu_stop. The three different roles of migration_thread() are
splitted into three separate cpu_stop callbacks -
migration_cpu_stop(), active_load_balance_cpu_stop() and
synchronize_sched_expedited_cpu_stop() - and each use case now simply
asks cpu_stop to execute the callback as necessary.
synchronize_sched_expedited() was implemented with private
preallocated resources and custom multi-cpu queueing and waiting
logic, both of which are provided by cpu_stop.
synchronize_sched_expedited_count is made atomic and all other shared
resources along with the mutex are dropped.
synchronize_sched_expedited() also implemented a check to detect cases
where not all the callback got executed on their assigned cpus and
fall back to synchronize_sched(). If called with cpu hotplug blocked,
cpu_stop already guarantees that and the condition cannot happen;
otherwise, stop_machine() would break. However, this patch preserves
the paranoid check using a cpumask to record on which cpus the stopper
ran so that it can serve as a bisection point if something actually
goes wrong theree.
Because the internal execution state is no longer visible,
rcu_expedited_torture_stats() is removed.
This patch also renames cpu_stop threads to from "stopper/%d" to
"migration/%d". The names of these threads ultimately don't matter
and there's no reason to make unnecessary userland visible changes.
With this patch applied, stop_machine() and sched now share the same
resources. stop_machine() is faster without wasting any resources and
sched migration users are much cleaner.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: Josh Triplett <josh@freedesktop.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Reimplement stop_machine using cpu_stop. As cpu stoppers are
guaranteed to be available for all online cpus,
stop_machine_create/destroy() are no longer necessary and removed.
With resource management and synchronization handled by cpu_stop, the
new implementation is much simpler. Asking the cpu_stop to execute
the stop_cpu() state machine on all online cpus with cpu hotplug
disabled is enough.
stop_machine itself doesn't need to manage any global resources
anymore, so all per-instance information is rolled into struct
stop_machine_data and the mutex and all static data variables are
removed.
The previous implementation created and destroyed RT workqueues as
necessary which made stop_machine() calls highly expensive on very
large machines. According to Dimitri Sivanich, preventing the dynamic
creation/destruction makes booting faster more than twice on very
large machines. cpu_stop resources are preallocated for all online
cpus and should have the same effect.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Implement a simplistic per-cpu maximum priority cpu monopolization
mechanism. A non-sleeping callback can be scheduled to run on one or
multiple cpus with maximum priority monopolozing those cpus. This is
primarily to replace and unify RT workqueue usage in stop_machine and
scheduler migration_thread which currently is serving multiple
purposes.
Four functions are provided - stop_one_cpu(), stop_one_cpu_nowait(),
stop_cpus() and try_stop_cpus().
This is to allow clean sharing of resources among stop_cpu and all the
migration thread users. One stopper thread per cpu is created which
is currently named "stopper/CPU". This will eventually replace the
migration thread and take on its name.
* This facility was originally named cpuhog and lived in separate
files but Peter Zijlstra nacked the name and thus got renamed to
cpu_stop and moved into stop_machine.c.
* Better reporting of preemption leak as per Peter's suggestion.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Some RCU-lockdep splat repairs need to know whether they are running
in a single-threaded process. Unfortunately, the thread_group_empty()
primitive is defined in sched.h, and can induce #include hell. This
commit therefore introduces a rcu_my_thread_group_empty() wrapper that
is defined in rcupdate.c, thus avoiding the need to include sched.h
everywhere.
Signed-off-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
ICMP protocol unreachable handling completely disregarded
the fact that the user may have locked the socket. It proceeded
to destroy the association, even though the user may have
held the lock and had a ref on the association. This resulted
in the following:
Attempt to release alive inet socket f6afcc00
=========================
[ BUG: held lock freed! ]
-------------------------
somenu/2672 is freeing memory f6afcc00-f6afcfff, with a lock still held
there!
(sk_lock-AF_INET){+.+.+.}, at: [<c122098a>] sctp_connect+0x13/0x4c
1 lock held by somenu/2672:
#0: (sk_lock-AF_INET){+.+.+.}, at: [<c122098a>] sctp_connect+0x13/0x4c
stack backtrace:
Pid: 2672, comm: somenu Not tainted 2.6.32-telco #55
Call Trace:
[<c1232266>] ? printk+0xf/0x11
[<c1038553>] debug_check_no_locks_freed+0xce/0xff
[<c10620b4>] kmem_cache_free+0x21/0x66
[<c1185f25>] __sk_free+0x9d/0xab
[<c1185f9c>] sk_free+0x1c/0x1e
[<c1216e38>] sctp_association_put+0x32/0x89
[<c1220865>] __sctp_connect+0x36d/0x3f4
[<c122098a>] ? sctp_connect+0x13/0x4c
[<c102d073>] ? autoremove_wake_function+0x0/0x33
[<c12209a8>] sctp_connect+0x31/0x4c
[<c11d1e80>] inet_dgram_connect+0x4b/0x55
[<c11834fa>] sys_connect+0x54/0x71
[<c103a3a2>] ? lock_release_non_nested+0x88/0x239
[<c1054026>] ? might_fault+0x42/0x7c
[<c1054026>] ? might_fault+0x42/0x7c
[<c11847ab>] sys_socketcall+0x6d/0x178
[<c10da994>] ? trace_hardirqs_on_thunk+0xc/0x10
[<c1002959>] syscall_call+0x7/0xb
This was because the sctp_wait_for_connect() would aqcure the socket
lock and then proceed to release the last reference count on the
association, thus cause the fully destruction path to finish freeing
the socket.
The simplest solution is to start a very short timer in case the socket
is owned by user. When the timer expires, we can do some verification
and be able to do the release properly.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When more than one header is included under CREATE_TRACE_POINTS
the DECLARE_TRACE() macro is not defined back to its original meaning
and the second include will fail to initialize the TRACE_EVENT()
and DECLARE_TRACE() correctly.
To fix this the tracepoint.h file moves the define of DECLARE_TRACE()
out of the #ifdef _LINUX_TRACEPOINT_H protection (just like the
define of the TRACE_EVENT()). This way the define_trace.h will undef
the DECLARE_TRACE() at the end and allow new headers to start
from scratch.
This patch also requires fixing the include/events/napi.h
It currently uses DECLARE_TRACE() and should be converted to a TRACE_EVENT()
format. But I'll leave that change to the authors of that file.
But since the napi.h file depends on using the CREATE_TRACE_POINTS
and does not define its own DEFINE_TRACE() it must use the define_trace.h
method instead.
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
In perverse acpi implementations the isa irqs are not identity mapped
to the first 16 gsi. Furthermore at least the extended interrupt
resource capability may return gsi's and not isa irqs. So since
what we get from acpi is a gsi teach acpi_get_overrride_irq to
operate on a gsi instead of an isa_irq.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <1269936436-7039-2-git-send-email-ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
There are a number of cases where the current code makes the assumption
that isa irqs identity map to the first 16 acpi global system intereupts.
In most instances that assumption is correct as that is the required
behaviour in dual i8259 mode and the default behavior in ioapic mode.
However there are some systems out there that take advantage of acpis
interrupt remapping for the isa irqs to have a completely different
mapping of isa_irq to gsi.
Introduce acpi_isa_irq_to_gsi to perform this mapping explicitly in the
code that needs it. Initially this will be just the current assumed
identity mapping to ensure it's introduction does not cause regressions.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <1269936436-7039-1-git-send-email-ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Fix function prototype visibility issues when compiling for non-x86
architectures. Tested with crosstool
(ftp://ftp.kernel.org/pub/tools/crosstool/) with alpha, ia64 and sparc
targets.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20100503130736.GD26107@aftab>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Expand task_subsys_state()'s rcu_dereference_check() to include the full
locking rule as documented in Documentation/cgroups/cgroups.txt by adding
a check for task->alloc_lock being held.
This fixes an RCU false positive when resuming from suspend. The warning
comes from freezer cgroup in cgroup_freezing_or_frozen().
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The ftrace.h file contains several functions as macros when the
functions are disabled due to config options. This patch converts
most of them to static inlines.
There are two exceptions:
register_ftrace_function() and unregister_ftrace_function()
This is because their parameter "ops" must not be evaluated since
code using the function is allowed to #ifdef out the creation of
the parameter.
This also fixes an error caused by recent changes:
kernel/trace/trace_irqsoff.c: In function 'start_irqsoff_tracer':
kernel/trace/trace_irqsoff.c:571: error: expected expression before 'do'
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Using a single list for all userspace devices leads to a dead lock
on multiplexed buses in some circumstances (mux chip instantiated
from userspace). This is solved by using a separate list for each
bus segment.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Michael Lawnick <ml.lawnick@gmx.de>
Fix this one:
include/net/sock.h: error: two or more data types in declaration specifiers
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The breakpoint generic layer assumes that archs always know in advance
the static number of address registers available to host breakpoints
through the HBP_NUM macro.
However this is not true for every archs. For example Arm needs to get
this information dynamically to handle the compatiblity between
different versions.
To solve this, this patch proposes to drop the static HBP_NUM macro
and let the arch provide the number of available slots through a
new hw_breakpoint_slots() function. For archs that have
CONFIG_HAVE_MIXED_BREAKPOINTS_REGS selected, it will be called once
as the number of registers fits for instruction and data breakpoints
together.
For the others it will be called first to get the number of
instruction breakpoint registers and another time to get the
data breakpoint registers, the targeted type is given as a
parameter of hw_breakpoint_slots().
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: K. Prasad <prasad@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Ingo Molnar <mingo@elte.hu>
There are two outstanding fashions for archs to implement hardware
breakpoints.
The first is to separate breakpoint address pattern definition
space between data and instruction breakpoints. We then have
typically distinct instruction address breakpoint registers
and data address breakpoint registers, delivered with
separate control registers for data and instruction breakpoints
as well. This is the case of PowerPc and ARM for example.
The second consists in having merged breakpoint address space
definition between data and instruction breakpoint. Address
registers can host either instruction or data address and
the access mode for the breakpoint is defined in a control
register. This is the case of x86 and Super H.
This patch adds a new CONFIG_HAVE_MIXED_BREAKPOINTS_REGS config
that archs can select if they belong to the second case. Those
will have their slot allocation merged for instructions and
data breakpoints.
The others will have a separate slot tracking between data and
instruction breakpoints.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: K. Prasad <prasad@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
For more clearance what the functions actually do,
usb_buffer_alloc() is renamed to usb_alloc_coherent()
usb_buffer_free() is renamed to usb_free_coherent()
They should only be used in code which really needs DMA coherency.
[added compatibility macros so we can convert things easier - gregkh]
Signed-off-by: Daniel Mack <daniel@caiaq.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Pedro Ribeiro <pedrib@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
nfs: fix memory leak in nfs_get_sb with CONFIG_NFS_V4
nfs: fix some issues in nfs41_proc_reclaim_complete()
NFS: Ensure that nfs_wb_page() waits for Pg_writeback to clear
NFS: Fix an unstable write data integrity race
nfs: testing for null instead of ERR_PTR()
NFS: rsize and wsize settings ignored on v4 mounts
NFSv4: Don't attempt an atomic open if the file is a mountpoint
SUNRPC: Fix a bug in rpcauth_prune_expired
Ok, version 4
Change Notes:
1) Minor cleanups, from Vlads notes
Summary:
Hey-
Recently, it was reported to me that the kernel could oops in the
following way:
<5> kernel BUG at net/core/skbuff.c:91!
<5> invalid operand: 0000 [#1]
<5> Modules linked in: sctp netconsole nls_utf8 autofs4 sunrpc iptable_filter
ip_tables cpufreq_powersave parport_pc lp parport vmblock(U) vsock(U) vmci(U)
vmxnet(U) vmmemctl(U) vmhgfs(U) acpiphp dm_mirror dm_mod button battery ac md5
ipv6 uhci_hcd ehci_hcd snd_ens1371 snd_rawmidi snd_seq_device snd_pcm_oss
snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_ac97_codec snd soundcore
pcnet32 mii floppy ext3 jbd ata_piix libata mptscsih mptsas mptspi mptscsi
mptbase sd_mod scsi_mod
<5> CPU: 0
<5> EIP: 0060:[<c02bff27>] Not tainted VLI
<5> EFLAGS: 00010216 (2.6.9-89.0.25.EL)
<5> EIP is at skb_over_panic+0x1f/0x2d
<5> eax: 0000002c ebx: c033f461 ecx: c0357d96 edx: c040fd44
<5> esi: c033f461 edi: df653280 ebp: 00000000 esp: c040fd40
<5> ds: 007b es: 007b ss: 0068
<5> Process swapper (pid: 0, threadinfo=c040f000 task=c0370be0)
<5> Stack: c0357d96 e0c29478 00000084 00000004 c033f461 df653280 d7883180
e0c2947d
<5> 00000000 00000080 df653490 00000004 de4f1ac0 de4f1ac0 00000004
df653490
<5> 00000001 e0c2877a 08000800 de4f1ac0 df653490 00000000 e0c29d2e
00000004
<5> Call Trace:
<5> [<e0c29478>] sctp_addto_chunk+0xb0/0x128 [sctp]
<5> [<e0c2947d>] sctp_addto_chunk+0xb5/0x128 [sctp]
<5> [<e0c2877a>] sctp_init_cause+0x3f/0x47 [sctp]
<5> [<e0c29d2e>] sctp_process_unk_param+0xac/0xb8 [sctp]
<5> [<e0c29e90>] sctp_verify_init+0xcc/0x134 [sctp]
<5> [<e0c20322>] sctp_sf_do_5_1B_init+0x83/0x28e [sctp]
<5> [<e0c25333>] sctp_do_sm+0x41/0x77 [sctp]
<5> [<c01555a4>] cache_grow+0x140/0x233
<5> [<e0c26ba1>] sctp_endpoint_bh_rcv+0xc5/0x108 [sctp]
<5> [<e0c2b863>] sctp_inq_push+0xe/0x10 [sctp]
<5> [<e0c34600>] sctp_rcv+0x454/0x509 [sctp]
<5> [<e084e017>] ipt_hook+0x17/0x1c [iptable_filter]
<5> [<c02d005e>] nf_iterate+0x40/0x81
<5> [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
<5> [<c02e0c7f>] ip_local_deliver_finish+0xc6/0x151
<5> [<c02d0362>] nf_hook_slow+0x83/0xb5
<5> [<c02e0bb2>] ip_local_deliver+0x1a2/0x1a9
<5> [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
<5> [<c02e103e>] ip_rcv+0x334/0x3b4
<5> [<c02c66fd>] netif_receive_skb+0x320/0x35b
<5> [<e0a0928b>] init_stall_timer+0x67/0x6a [uhci_hcd]
<5> [<c02c67a4>] process_backlog+0x6c/0xd9
<5> [<c02c690f>] net_rx_action+0xfe/0x1f8
<5> [<c012a7b1>] __do_softirq+0x35/0x79
<5> [<c0107efb>] handle_IRQ_event+0x0/0x4f
<5> [<c01094de>] do_softirq+0x46/0x4d
Its an skb_over_panic BUG halt that results from processing an init chunk in
which too many of its variable length parameters are in some way malformed.
The problem is in sctp_process_unk_param:
if (NULL == *errp)
*errp = sctp_make_op_error_space(asoc, chunk,
ntohs(chunk->chunk_hdr->length));
if (*errp) {
sctp_init_cause(*errp, SCTP_ERROR_UNKNOWN_PARAM,
WORD_ROUND(ntohs(param.p->length)));
sctp_addto_chunk(*errp,
WORD_ROUND(ntohs(param.p->length)),
param.v);
When we allocate an error chunk, we assume that the worst case scenario requires
that we have chunk_hdr->length data allocated, which would be correct nominally,
given that we call sctp_addto_chunk for the violating parameter. Unfortunately,
we also, in sctp_init_cause insert a sctp_errhdr_t structure into the error
chunk, so the worst case situation in which all parameters are in violation
requires chunk_hdr->length+(sizeof(sctp_errhdr_t)*param_count) bytes of data.
The result of this error is that a deliberately malformed packet sent to a
listening host can cause a remote DOS, described in CVE-2010-1173:
http://cve.mitre.org/cgi-bin/cvename.cgi?name=2010-1173
I've tested the below fix and confirmed that it fixes the issue. We move to a
strategy whereby we allocate a fixed size error chunk and ignore errors we don't
have space to report. Tested by me successfully
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (27 commits)
sfc: Change falcon_probe_board() to fail for unsupported boards
sfc: Always close net device at the end of a disabling reset
sfc: Wait at most 10ms for the MC to finish reading out MAC statistics
sctp: Fix oops when sending queued ASCONF chunks
sctp: fix to calc the INIT/INIT-ACK chunk length correctly is set
sctp: per_cpu variables should be in bh_disabled section
sctp: fix potential reference of a freed pointer
sctp: avoid irq lock inversion while call sk->sk_data_ready()
Revert "tcp: bind() fix when many ports are bound"
net/usb: add sierra_net.c driver
cdc_ether: fix autosuspend for mbm devices
bluetooth: handle l2cap_create_connless_pdu() errors
gianfar: Wait for both RX and TX to stop
ipheth: potential null dereferences on error path
smc91c92_cs: spin_unlock_irqrestore before calling smc_interrupt()
drivers/usb/net/kaweth.c: add device "Allied Telesyn AT-USB10 USB Ethernet Adapter"
bnx2: Update version to 2.0.9.
bnx2: Prevent "scheduling while atomic" warning with cnic, bonding and vlan.
bnx2: Fix lost MSI-X problem on 5709 NICs.
cxgb3: Wait longer for control packets on initialization
...
sk->sk_data_ready() of sctp socket can be called from both BH and non-BH
contexts, but the default sk->sk_data_ready(), sock_def_readable(), can
not be used in this case. Therefore, we have to make a new function
sctp_data_ready() to grab sk->sk_data_ready() with BH disabling.
=========================================================
[ INFO: possible irq lock inversion dependency detected ]
2.6.33-rc6 #129
---------------------------------------------------------
sctp_darn/1517 just changed the state of lock:
(clock-AF_INET){++.?..}, at: [<c06aab60>] sock_def_readable+0x20/0x80
but this lock took another, SOFTIRQ-unsafe lock in the past:
(slock-AF_INET){+.-...}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
1 lock held by sctp_darn/1517:
#0: (sk_lock-AF_INET){+.+.+.}, at: [<cdfe363d>] sctp_sendmsg+0x23d/0xc00 [sctp]
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
coda: move backing-dev.h kernel include inside __KERNEL__
mtd: ensure that bdi entries are properly initialized and registered
Move mtd_bdi_*mappable to mtdcore.c
btrfs: convert to using bdi_setup_and_register()
Catch filesystems lacking s_bdi
drbd: Terminate a connection early if sending the protocol fails
drbd: fix memory leak
Fix JFFS2 sync silent failure
smbfs: add bdi backing to mount session
ncpfs: add bdi backing to mount session
exofs: add bdi backing to mount session
ecryptfs: add bdi backing to mount session
coda: add bdi backing to mount session
cifs: add bdi backing to mount session
afs: add bdi backing to mount session.
9p: add bdi backing to mount session
bdi: add helper function for doing init and register of a bdi for a file system
block: ensure jiffies wrap is handled correctly in blk_rq_timed_out_timer
When performing a non-consuming read, a synchronize_sched() is
performed once for every cpu which is actively tracing.
This is very expensive, and can make it take several seconds to open
up the 'trace' file with lots of cpus.
Only one synchronize_sched() call is actually necessary. What is
desired is for all cpus to see the disabling state change. So we
transform the existing sequence:
for_each_cpu() {
ring_buffer_read_start();
}
where each ring_buffer_start() call performs a synchronize_sched(),
into the following:
for_each_cpu() {
ring_buffer_read_prepare();
}
ring_buffer_read_prepare_sync();
for_each_cpu() {
ring_buffer_read_start();
}
wherein only the single ring_buffer_read_prepare_sync() call needs to
do the synchronize_sched().
The first phase, via ring_buffer_read_prepare(), allocates the 'iter'
memory and increments ->record_disabled.
In the second phase, ring_buffer_read_prepare_sync() makes sure this
->record_disabled state is visible fully to all cpus.
And in the final third phase, the ring_buffer_read_start() calls reset
the 'iter' objects allocated in the first phase since we now know that
none of the cpus are adding trace entries any more.
This makes openning the 'trace' file nearly instantaneous on a
sparc64 Niagara2 box with 128 cpus tracing.
Signed-off-by: David S. Miller <davem@davemloft.net>
LKML-Reference: <20100420.154711.11246950.davem@davemloft.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Lists of DMA channels and slaves are not changed, make them constant. Besides,
SH7724 channel and slave configuration of both DMA controllers is identical,
remove the extra copy of the configuration data.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
noop_backing_dev_info is used only as a flag to mark filesystems that
don't have any backing store, like tmpfs, procfs, spufs, etc.
Signed-off-by: Joern Engel <joern@logfs.org>
Changed the BUG_ON() to a WARN_ON(). Note that adding dirty inodes
to the noop_backing_dev_info is not legal and will not result in
them being flushed, but we already catch this condition in
__mark_inode_dirty() when checking for a registered bdi.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
If a futex key happens to be located within a huge page mapped
MAP_PRIVATE, get_futex_key() can go into an infinite loop waiting for a
page->mapping that will never exist.
See https://bugzilla.redhat.com/show_bug.cgi?id=552257 for more details
about the problem.
This patch makes page->mapping a poisoned value that includes
PAGE_MAPPING_ANON mapped MAP_PRIVATE. This is enough for futex to
continue but because of PAGE_MAPPING_ANON, the poisoned value is not
dereferenced or used by futex. No other part of the VM should be
dereferencing the page->mapping of a hugetlbfs page as its page cache is
not on the LRU.
This patch fixes the problem with the test case described in the bugzilla.
[akpm@linux-foundation.org: mel cant spel]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Darren Hart <darren@dvhart.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This cleans up a few of the complaints of __generic_block_fiemap. I've
fixed all the typing stuff, used inline functions instead of macros,
gotten rid of a couple of variables, and made sure the size and block
requests are all block aligned. It also fixes a problem where sometimes
FIEMAP_EXTENT_LAST wasn't being set properly.
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>