* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
Revert "net: Make accesses to ->br_port safe for sparse RCU"
mce: convert to rcu_dereference_index_check()
net: Make accesses to ->br_port safe for sparse RCU
vfs: add fs.h to define struct file
lockdep: Add an in_workqueue_context() lockdep-based test function
rcu: add __rcu API for later sparse checking
rcu: add an rcu_dereference_index_check()
tree/tiny rcu: Add debug RCU head objects
mm: remove all rcu head initializations
fs: remove all rcu head initializations, except on_stack initializations
powerpc: remove all rcu head initializations
* 'kms-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
kgdb,docs: Update the kgdb docs to include kms
drm_fb_helper: Preserve capability to use atomic kms
i915: when kgdb is active display compression should be off
drm/i915: use new fb debug hooks
drm: add KGDB/KDB support
fb: add hooks to handle KDB enter/exit
kgdboc: Add call backs to allow kernel mode switching
vt,console,kdb: automatically set kdb LINES variable
vt,console,kdb: implement atomic console enter/leave functions
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
debug_core,kdb: fix crash when arch does not have single step
kgdb,x86: use macro HBP_NUM to replace magic number 4
kgdb,mips: remove unused kgdb_cpu_doing_single_step operations
mm,kdb,kgdb: Add a debug reference for the kdb kmap usage
KGDB: Remove set but unused newPC
ftrace,kdb: Allow dumping a specific cpu's buffer with ftdump
ftrace,kdb: Extend kdb to be able to dump the ftrace buffer
kgdb,powerpc: Replace hardcoded offset by BREAK_INSTR_SIZE
arm,kgdb: Add ability to trap into debugger on notify_die
gdbstub: do not directly use dbg_reg_def[] in gdb_cmd_reg_set()
gdbstub: Implement gdbserial 'p' and 'P' packets
kgdb,arm: Individual register get/set for arm
kgdb,mips: Individual register get/set for mips
kgdb,x86: Individual register get/set for x86
kgdb,kdb: individual register set and and get API
gdbstub: Optimize kgdb's "thread:" response for the gdb serial protocol
kgdb: remove custom hex_to_bin()implementation
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (79 commits)
powerpc/8xx: Add support for the MPC8xx based boards from TQC
powerpc/85xx: Introduce support for the Freescale P1022DS reference board
powerpc/85xx: Adding DTS for the STx GP3-SSA MPC8555 board
powerpc/85xx: Change deprecated binding for 85xx-based boards
powerpc/tqm85xx: add a quirk for ti1520 PCMCIA bridge
powerpc/tqm85xx: update PCI interrupt-map attribute
powerpc/mpc8308rdb: support for MPC8308RDB board from Freescale
powerpc/fsl_pci: add quirk for mpc8308 pcie bridge
powerpc/85xx: Cleanup QE initialization for MPC85xxMDS boards
powerpc/85xx: Fix booting for P1021MDS boards
powerpc/85xx: Fix SWIOTLB initalization for MPC85xxMDS boards
powerpc/85xx: kexec for SMP 85xx BookE systems
powerpc/5200/i2c: improve i2c bus error recovery
of/xilinxfb: update tft compatible versions
powerpc/fsl-diu-fb: Support setting display mode using EDID
powerpc/5121: doc/dts-bindings: update doc of FSL DIU bindings
powerpc/5121: shared DIU framebuffer support
powerpc/5121: move fsl-diu-fb.h to include/linux
powerpc/5121: fsl-diu-fb: fix issue with re-enabling DIU area descriptor
powerpc/512x: add clock structure for Video-IN (VIU) unit
...
* 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus: (150 commits)
MIPS: PowerTV: Separate PowerTV USB support from non-USB code
MIPS: strip the un-needed sections of vmlinuz
MIPS: Clean up the calculation of VMLINUZ_LOAD_ADDRESS
MIPS: Clean up arch/mips/boot/compressed/decompress.c
MIPS: Clean up arch/mips/boot/compressed/ld.script
MIPS: Unify the suffix of compressed vmlinux.bin
MIPS: PowerTV: Add Gaia platform definitions.
MIPS: BCM47xx: Fix nvram_getenv return value.
MIPS: Octeon: Allow more than 3.75GB of memory with PCIe
MIPS: Clean up notify_die() usage.
MIPS: Remove unused task_struct.trap_no field.
Documentation: Mention that KProbes is supported on MIPS
SAMPLES: kprobe_example: Make it print something on MIPS.
MIPS: kprobe: Add support.
MIPS: Add instrunction format for BREAK and SYSCALL
MIPS: kprobes: Define regs_return_value()
MIPS: Ritually kill stupid printk.
MIPS: Octeon: Disallow MSI-X interrupt and fall back to MSI interrupts.
MIPS: Octeon: Support 256 MSI on PCIe
MIPS: Decode core number for R2 CPUs.
...
The kernel console interface stores the number of lines it is
configured to use. The kdb debugger can greatly benefit by knowing how
many lines there are on the console for the pager functionality
without having the end user compile in the setting or have to
repeatedly change it at run time.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
CC: David Airlie <airlied@linux.ie>
CC: Andrew Morton <akpm@linux-foundation.org>
When an arch such as mips and microblaze does not implement either HW
or software single stepping the debug core should re-enter kdb. The
kdb code will properly ignore the single step operation. Attempting
to single step the kernel without software or hardware support causes
unpredictable kernel crashes.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
In systems with more than one processor it is desirable to look at the
per cpu trace buffers.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
CC: Frederic Weisbecker <fweisbec@gmail.com>
Add in a helper function to allow the kdb shell to dump the ftrace
buffer.
Modify trace.c to expose the capability to iterate over the ftrace
buffer in a read only capacity.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
CC: Frederic Weisbecker <fweisbec@gmail.com>
Presently the usable registers definitions on x86 are not contiguous
for kgdb. The x86 kgdb uses a case statement for the sparse register
accesses. The array which defines the registers (dbg_reg_def) should
not be used directly in order to safely work with sparse register
definitions.
Specifically there was a problem when gdb accesses ORIG_AX, which is
accessed only through the case statement.
This patch encodes register memory using the size information provided
from the debugger which avoids the need to look up the size of the
register. The dbg_set_reg() function always further validates the
inputs from the debugger.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com>
The gdbserial 'p' and 'P' packets allow gdb to individually get and
set registers instead of querying for all the available registers.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
The kdb shell specification includes the ability to get and set
architecture specific registers by name.
For the time being individual register get and set will be implemented
on a per architecture basis. If an architecture defines
DBG_MAX_REG_NUM > 0 then kdb and the gdbstub will use the capability
for individually getting and setting architecture specific registers.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
The gdb debugger understands how to parse short versions of the thread
reference string as long as the bytes are paired in sets of two
characters. The kgdb implementation was always sending 8 leading
zeros which could be omitted, and further optimized in the case of
non-negative thread numbers. The negative numbers are used to
reference a specific cpu in the case of kgdb.
An example of the previous i386 stop packet looks like:
T05thread:00000000000003bb;
New stop packet response:
T05thread:03bb;
The previous ThreadInfo response looks like:
m00000000fffffffe,0000000000000001,0000000000000002,0000000000000003,0000000000000004,0000000000000005,0000000000000006,0000000000000007,000000000000000c,0000000000000088,000000000000008a,000000000000008b,000000000000008c,000000000000008d,000000000000008e,00000000000000d4,00000000000000d5,00000000000000dd
New ThreadInfo response:
mfffffffe,01,02,03,04,05,06,07,0c,88,8a,8b,8c,8d,8e,d4,d5,dd
A few bytes saved means better response time when using kgdb over a
serial line.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
When a secondary CPU is being brought up, it is not uncommon for
printk() to be invoked when cpu_online(smp_processor_id()) == 0. The
case that I witnessed personally was on MIPS:
http://lkml.org/lkml/2010/5/30/4
If (can_use_console() == 0), printk() will spool its output to log_buf
and it will be visible in "dmesg", but that output will NOT be echoed to
the console until somebody calls release_console_sem() from a CPU that
is online. Therefore, the boot time messages from the new CPU can get
stuck in "limbo" for a long time, and might suddenly appear on the
screen when a completely unrelated event (e.g. "eth0: link is down")
occurs.
This patch modifies the console code so that any pending messages are
automatically flushed out to the console whenever a CPU hotplug
operation completes successfully or aborts.
The issue was seen on 2.6.34.
Original patch by Kevin Cernekee with cleanups by akpm and additional fixes
by Santosh Shilimkar. This patch superseeds
https://patchwork.linux-mips.org/patch/1357/.
Signed-off-by: Kevin Cernekee <cernekee@gmail.com>
To: <mingo@elte.hu>
To: <akpm@linux-foundation.org>
To: <simon.kagstrom@netinsight.net>
To: <David.Woodhouse@intel.com>
To: <lethal@linux-sh.org>
Cc: <linux-kernel@vger.kernel.org>
Cc: <linux-mips@linux-mips.org>
Reviewed-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Kevin Cernekee <cernekee@gmail.com>
Patchwork: https://patchwork.linux-mips.org/patch/1534/
LKML-Reference: <ede63b5a20af951c755736f035d1e787772d7c28@localhost>
LKML-Reference: <EAF47CD23C76F840A9E7FCE10091EFAB02C5DB6D1F@dbde02.ent.ti.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
On my (32-bit x86) machine, sys_init_module() uses 124 bytes of stack
once load_module() is inlined.
This effectively reverts ffb4ba76 which inlined it due to stack
pressure.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This simply hoists more code out of load_module; we also put the
identification of the extable and dynamic debug table in with the
others in find_module_sections().
We move the taint check to the actual add/remove of the dynamic debug
info: this is certain (find_module_sections is too early).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Yehuda Sadeh <yehuda@hq.newdream.net>
Instead of copying and allocating the args and storing it in
load_info, we can just allocate them right before we need them.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Pass the struct load_info into all the other functions in module
loading. This neatens things and makes them more consistent.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Restore the stub module_remove_modinfo_attrs, remove the now-unused
!CONFIG_SYSFS module_sysfs_init.
Also, rename mod_kobject_remove() to mod_sysfs_teardown() as
it is the logical counterpart to mod_sysfs_setup now.
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We change the sysfs functions to take struct load_info, and call
them all in mod_sysfs_setup().
We also clean up the #ifdefs a little.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
layout_and_allocate() does everything up to and including the final
struct module placement inside the allocated module memory. We have
to store the symbol layout information in our struct load_info though.
This avoids the nasty code we had before where 'mod' pointed first
to the version inside the temporary allocation containing the entire
file, then later was moved to point to the real struct module: now
the main code only ever sees the final module address.
(Includes fix for the Tony Luck-found Linus-diagnosed failure path
error).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Andrew had the sole pleasure of tickling this bug in linux-next; when we set
up "info->strtab" it's pointing into the temporary copy of the module. For
most uses that is fine, but kallsyms keeps a pointer around during module
load (inside mod->strtab).
If we oops for some reason inside a module's init function, kallsyms will use
the mod->strtab pointer into the now-freed temporary module copy.
(Later oopses work fine: after init we overwrite mod->strtab to point to a
compacted core-only strtab).
Reported-by: Andrew "Grumpy" Morton <akpm@linux-foundation.org>
Signed-off-by: Rusty "Buggy" Russell <rusty@rustcorp.com.au>
Tested-by: Andrew "Happy" Morton <akpm@linux-foundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Put all the "rewrite and check section headers" in one place. This
adds another iteration over the sections, but it's far clearer. We
iterate once for every find_section() so we already iterate over many
times.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Btw, here's a patch that _looks_ large, but it really pretty trivial, and
sets things up so that it would be way easier to split off pieces of the
module loading.
The reason it looks large is that it creates a "module_info" structure
that contains all the module state that we're building up while loading,
instead of having individual variables for all the indices etc.
So the patch ends up being large, because every "symindex" access instead
becomes "info.index.sym" etc. That may be a few characters longer, but it
then means that we can just pass a pointer to that "info" structure
around. and let all the pieces fill it in very naturally.
As an example of that, the patch also moves the initialization of all
those convenience variables into a "setup_module_info()" function. And at
this point it really does become very natural to start to peel off some of
the error labels and move them into the helper functions - now the
"truncated" case is gone, and is handled inside that setup function
instead.
So maybe you don't like this approach, and it does make the variable
accesses a bit longer, but I don't think unreadably so. And the patch
really does look big and scary, but there really should be absolutely no
semantic changes - most of it was a trivial and mindless rename.
In fact, it was so mindless that I on purpose kept the existing helper
functions looking like this:
- err = check_modinfo(mod, sechdrs, infoindex, versindex);
+ err = check_modinfo(mod, info.sechdrs, info.index.info, info.index.vers);
rather than changing them to just take the "info" pointer. IOW, a second
phase (if you think the approach is ok) would change that calling
convention to just do
err = check_modinfo(mod, &info);
(and same for "layout_sections()", "layout_symtabs()" etc.) Similarly,
while right now it makes things _look_ bigger, with things like this:
versindex = find_sec(hdr, sechdrs, secstrings, "__versions");
becoming
info->index.vers = find_sec(info->hdr, info->sechdrs, info->secstrings, "__versions");
in the new "setup_module_info()" function, that's again just a result of
it being a search-and-replace patch. By using the 'info' pointer, we could
just change the 'find_sec()' interface so that it ends up being
info->index.vers = find_sec(info, "__versions");
instead, and then we'd actually have a shorter and more readable line. So
for a lot of those mindless variable name expansions there's would be room
for separate cleanups.
I didn't move quite everything in there - if we do this to layout_symtabs,
for example, we'd want to move the percpu, symoffs, stroffs, *strmap
variables to be fields in that module_info structure too. But that's a
much smaller patch, I moved just the really core stuff that is currently
being set up and used in various parts.
But even in this rough form, it removes close to 70 lines from that
function (but adds 22 lines overall, of course - the structure definition,
the helper function declarations and call-sites etc etc).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
And now that I'm looking at that call-chain (to see if it would make sense
to use some other more specific lock - doesn't look like it: all the
readers are using RCU and this is the only writer), I also give you this
trivial one-liner. It changes each_symbol() to not put that constant array
on the stack, resulting in changing
movq $C.388.31095, %rsi #, tmp85
subq $376, %rsp #,
movq %rdi, %rbx # fn, fn
leaq -208(%rbp), %rdi #, tmp84
movq %rbx, %rdx # fn,
rep movsl
xorl %esi, %esi #
leaq -208(%rbp), %rdi #, tmp87
movq %r12, %rcx # data,
call each_symbol_in_section.clone.0 #
into
xorl %esi, %esi #
subq $216, %rsp #,
movq %rdi, %rbx # fn, fn
movq $arr.31078, %rdi #,
call each_symbol_in_section.clone.0 #
which is not so much about being obviously shorter and simpler because we
don't unnecessarily copy that constant array around onto the stack, but
also about having a much smaller stack footprint (376 vs 216 bytes - see
the update of 'rsp').
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1) Extract out the relocation loop into apply_relocations
2) Extract license and version checks into check_module_license_and_versions
3) Extract icache flushing into flush_module_icache
4) Move __obsparm warning into find_module_sections
5) Move license setting into check_modinfo.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Allocate references inside module_unload_init(), clean up inside
module_unload_free().
This version fixed to do allocation before __this_cpu_write, thanks to
bug reports from linux-next from Dave Young <hidave.darkstar@gmail.com>
and Stephen Rothwell <sfr@canb.auug.org.au>.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Here's a second one. It's slightly less trivial - since we now have error
cases - and equally untested so it may well be totally broken. But it also
cleans up a bit more, and avoids one of the goto targets, because the
"move_module()" helper now does both allocations or none.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I'd start from the trivial stuff. There's a fair amount of straight-line
code that just makes the function hard to read just because you have to
page up and down so far. Some of it is trivial to just create a helper
function for.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
No need to clear mod->refptr in module_unload_init(), since
alloc_percpu() already clears allocated chunks.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (removed unused var)
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (39 commits)
random: Reorder struct entropy_store to remove padding on 64bits
padata: update API documentation
padata: Remove padata_get_cpumask
crypto: pcrypt - Update pcrypt cpumask according to the padata cpumask notifier
crypto: pcrypt - Rename pcrypt_instance
padata: Pass the padata cpumasks to the cpumask_change_notifier chain
padata: Rearrange set_cpumask functions
padata: Rename padata_alloc functions
crypto: pcrypt - Dont calulate a callback cpu on empty callback cpumask
padata: Check for valid cpumasks
padata: Allocate cpumask dependend recources in any case
padata: Fix cpu index counting
crypto: geode_aes - Convert pci_table entries to PCI_VDEVICE (if PCI_ANY_ID is used)
pcrypt: Added sysfs interface to pcrypt
padata: Added sysfs primitives to padata subsystem
padata: Make two separate cpumasks
padata: update documentation
padata: simplify serialization mechanism
padata: make padata_do_parallel to return zero on success
padata: Handle empty padata cpumasks
...
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1443 commits)
phy/marvell: add 88ec048 support
igb: Program MDICNFG register prior to PHY init
e1000e: correct MAC-PHY interconnect register offset for 82579
hso: Add new product ID
can: Add driver for esd CAN-USB/2 device
l2tp: fix export of header file for userspace
can-raw: Fix skb_orphan_try handling
Revert "net: remove zap_completion_queue"
net: cleanup inclusion
phy/marvell: add 88e1121 interface mode support
u32: negative offset fix
net: Fix a typo from "dev" to "ndev"
igb: Use irq_synchronize per vector when using MSI-X
ixgbevf: fix null pointer dereference due to filter being set for VLAN 0
e1000e: Fix irq_synchronize in MSI-X case
e1000e: register pm_qos request on hardware activation
ip_fragment: fix subtracting PPPOE_SES_HLEN from mtu twice
net: Add getsockopt support for TCP thin-streams
cxgb4: update driver version
cxgb4: add new PCI IDs
...
Manually fix up conflicts in:
- drivers/net/e1000e/netdev.c: due to pm_qos registration
infrastructure changes
- drivers/net/phy/marvell.c: conflict between adding 88ec048 support
and cleaning up the IDs
- drivers/net/wireless/ipw2x00/ipw2100.c: trivial ipw2100_pm_qos_req
conflict (registration change vs marking it static)
Commit 8f92054e7c ("CRED: Fix __task_cred()'s lockdep check and banner
comment") fixed the lockdep checks on __task_cred(). This has shown up
a place in the signalling code where a lock should be held - namely that
check_kill_permission() requires its callers to hold the RCU lock.
Fix group_send_sig_info() to get the RCU read lock around its call to
check_kill_permission().
Without this patch, the following warning can occur:
===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
kernel/signal.c:660 invoked rcu_dereference_check() without protection!
...
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
PM / Runtime: Add runtime PM statistics (v3)
PM / Runtime: Make runtime_status attribute not debug-only (v. 2)
PM: Do not use dynamically allocated objects in pm_wakeup_event()
PM / Suspend: Fix ordering of calls in suspend error paths
PM / Hibernate: Fix snapshot error code path
PM / Hibernate: Fix hibernation_platform_enter()
pm_qos: Get rid of the allocation in pm_qos_add_request()
pm_qos: Reimplement using plists
plist: Add plist_last
PM: Make it possible to avoid races between wakeup and system sleep
PNPACPI: Add support for remote wakeup
PM: describe kernel policy regarding wakeup defaults (v. 2)
PM / Hibernate: Fix typos in comments in kernel/power/swap.c
In some cases (for instance with kernel threads) it may be desireable to
use on-stack deferrable timers to get their power saving benefits. Add
interfaces to support this for the IPS driver.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
A function that copies the padata cpumasks to a user buffer
is a bit error prone. The cpumask can change any time so we
can't be sure to have the right cpumask when using this function.
A user who is interested in the padata cpumasks should register
to the padata cpumask notifier chain instead. Users of
padata_get_cpumask are already updated, so we can remove it.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
We pass a pointer to the new padata cpumasks to the cpumask_change_notifier
chain. So users can access the cpumasks without the need of an extra
padata_get_cpumask function.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
padata_set_cpumask needs to be protected by a lock. We make
__padata_set_cpumasks unlocked and static. So this function
can be used by the exported and locked padata_set_cpumask and
padata_set_cpumasks functions.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
We rename padata_alloc to padata_alloc_possible because this
function allocates a padata_instance and uses the cpu_possible
mask for parallel and serial workers. Also we rename __padata_alloc
to padata_alloc to avoid to export underlined functions. Underlined
functions are considered to be private to padata. Users are updated
accordingly.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
It's possible for get_task_cred() as it currently stands to 'corrupt' a set of
credentials by incrementing their usage count after their replacement by the
task being accessed.
What happens is that get_task_cred() can race with commit_creds():
TASK_1 TASK_2 RCU_CLEANER
-->get_task_cred(TASK_2)
rcu_read_lock()
__cred = __task_cred(TASK_2)
-->commit_creds()
old_cred = TASK_2->real_cred
TASK_2->real_cred = ...
put_cred(old_cred)
call_rcu(old_cred)
[__cred->usage == 0]
get_cred(__cred)
[__cred->usage == 1]
rcu_read_unlock()
-->put_cred_rcu()
[__cred->usage == 1]
panic()
However, since a tasks credentials are generally not changed very often, we can
reasonably make use of a loop involving reading the creds pointer and using
atomic_inc_not_zero() to attempt to increment it if it hasn't already hit zero.
If successful, we can safely return the credentials in the knowledge that, even
if the task we're accessing has released them, they haven't gone to the RCU
cleanup code.
We then change task_state() in procfs to use get_task_cred() rather than
calling get_cred() on the result of __task_cred(), as that suffers from the
same problem.
Without this change, a BUG_ON in __put_cred() or in put_cred_rcu() can be
tripped when it is noticed that the usage count is not zero as it ought to be,
for example:
kernel BUG at kernel/cred.c:168!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/kernel/mm/ksm/run
CPU 0
Pid: 2436, comm: master Not tainted 2.6.33.3-85.fc13.x86_64 #1 0HR330/OptiPlex
745
RIP: 0010:[<ffffffff81069881>] [<ffffffff81069881>] __put_cred+0xc/0x45
RSP: 0018:ffff88019e7e9eb8 EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff880161514480 RCX: 00000000ffffffff
RDX: 00000000ffffffff RSI: ffff880140c690c0 RDI: ffff880140c690c0
RBP: ffff88019e7e9eb8 R08: 00000000000000d0 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000040 R12: ffff880140c690c0
R13: ffff88019e77aea0 R14: 00007fff336b0a5c R15: 0000000000000001
FS: 00007f12f50d97c0(0000) GS:ffff880007400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8f461bc000 CR3: 00000001b26ce000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process master (pid: 2436, threadinfo ffff88019e7e8000, task ffff88019e77aea0)
Stack:
ffff88019e7e9ec8 ffffffff810698cd ffff88019e7e9ef8 ffffffff81069b45
<0> ffff880161514180 ffff880161514480 ffff880161514180 0000000000000000
<0> ffff88019e7e9f28 ffffffff8106aace 0000000000000001 0000000000000246
Call Trace:
[<ffffffff810698cd>] put_cred+0x13/0x15
[<ffffffff81069b45>] commit_creds+0x16b/0x175
[<ffffffff8106aace>] set_current_groups+0x47/0x4e
[<ffffffff8106ac89>] sys_setgroups+0xf6/0x105
[<ffffffff81009b02>] system_call_fastpath+0x16/0x1b
Code: 48 8d 71 ff e8 7e 4e 15 00 85 c0 78 0b 8b 75 ec 48 89 df e8 ef 4a 15 00
48 83 c4 18 5b c9 c3 55 8b 07 8b 07 48 89 e5 85 c0 74 04 <0f> 0b eb fe 65 48 8b
04 25 00 cc 00 00 48 3b b8 58 04 00 00 75
RIP [<ffffffff81069881>] __put_cred+0xc/0x45
RSP <ffff88019e7e9eb8>
---[ end trace df391256a100ebdd ]---
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a new kernel API to attach a task to current task's cgroup
in all the active hierarchies.
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Paul Menage <menage@google.com>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
The command
echo "file ec.c +p" >/sys/kernel/debug/dynamic_debug/control
causes an oops.
Move the call to ddebug_remove_module() down into free_module(). In this
way it should be called from all error paths. Currently, we are missing
the remove if the module init routine fails.
Signed-off-by: Jason Baron <jbaron@redhat.com>
Reported-by: Thomas Renninger <trenn@suse.de>
Tested-by: Thomas Renninger <trenn@suse.de>
Cc: <stable@kernel.org> [2.6.32+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>