linux/drivers/char
Tony Camuso cdea46566b ipmi: use rcu lock around call to intf->handlers->sender()
A vendor with a system having more than 128 CPUs occasionally encounters
the following crash during shutdown. This is not an easily reproduceable
event, but the vendor was able to provide the following analysis of the
crash, which exhibits the same footprint each time.

crash> bt
PID: 0      TASK: ffff88017c70ce70  CPU: 5   COMMAND: "swapper/5"
 #0 [ffff88085c143ac8] machine_kexec at ffffffff81059c8b
 #1 [ffff88085c143b28] __crash_kexec at ffffffff811052e2
 #2 [ffff88085c143bf8] crash_kexec at ffffffff811053d0
 #3 [ffff88085c143c10] oops_end at ffffffff8168ef88
 #4 [ffff88085c143c38] no_context at ffffffff8167ebb3
 #5 [ffff88085c143c88] __bad_area_nosemaphore at ffffffff8167ec49
 #6 [ffff88085c143cd0] bad_area_nosemaphore at ffffffff8167edb3
 #7 [ffff88085c143ce0] __do_page_fault at ffffffff81691d1e
 #8 [ffff88085c143d40] do_page_fault at ffffffff81691ec5
 #9 [ffff88085c143d70] page_fault at ffffffff8168e188
    [exception RIP: unknown or invalid address]
    RIP: ffffffffa053c800  RSP: ffff88085c143e28  RFLAGS: 00010206
    RAX: ffff88017c72bfd8  RBX: ffff88017a8dc000  RCX: ffff8810588b5ac8
    RDX: ffff8810588b5a00  RSI: ffffffffa053c800  RDI: ffff8810588b5a00
    RBP: ffff88085c143e58   R8: ffff88017c70d408   R9: ffff88017a8dc000
    R10: 0000000000000002  R11: ffff88085c143da0  R12: ffff8810588b5ac8
    R13: 0000000000000100  R14: ffffffffa053c800  R15: ffff8810588b5a00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    <IRQ stack>
    [exception RIP: cpuidle_enter_state+82]
    RIP: ffffffff81514192  RSP: ffff88017c72be50  RFLAGS: 00000202
    RAX: 0000001e4c3c6f16  RBX: 000000000000f8a0  RCX: 0000000000000018
    RDX: 0000000225c17d03  RSI: ffff88017c72bfd8  RDI: 0000001e4c3c6f16
    RBP: ffff88017c72be78   R8: 000000000000237e   R9: 0000000000000018
    R10: 0000000000002494  R11: 0000000000000001  R12: ffff88017c72be20
    R13: ffff88085c14f8e0  R14: 0000000000000082  R15: 0000001e4c3bb400
    ORIG_RAX: ffffffffffffff10  CS: 0010  SS: 0018

This is the corresponding stack trace

It has crashed because the area pointed with RIP extracted from timer
element is already removed during a shutdown process.

The function is smi_timeout().

And we think ffff8810588b5a00 in RDX is a parameter struct smi_info

crash> rd ffff8810588b5a00 20
ffff8810588b5a00:  ffff8810588b6000 0000000000000000   .`.X............
ffff8810588b5a10:  ffff880853264400 ffffffffa05417e0   .D&S......T.....
ffff8810588b5a20:  24a024a000000000 0000000000000000   .....$.$........
ffff8810588b5a30:  0000000000000000 0000000000000000   ................
ffff8810588b5a30:  0000000000000000 0000000000000000   ................
ffff8810588b5a40:  ffffffffa053a040 ffffffffa053a060   @.S.....`.S.....
ffff8810588b5a50:  0000000000000000 0000000100000001   ................
ffff8810588b5a60:  0000000000000000 0000000000000e00   ................
ffff8810588b5a70:  ffffffffa053a580 ffffffffa053a6e0   ..S.......S.....
ffff8810588b5a80:  ffffffffa053a4a0 ffffffffa053a250   ..S.....P.S.....
ffff8810588b5a90:  0000000500000002 0000000000000000   ................

Unfortunately the top of this area is already detroyed by someone.
But because of two reasonns we think this is struct smi_info
 1) The address included in between  ffff8810588b5a70 and ffff8810588b5a80:
  are inside of ipmi_si_intf.c  see crash> module ffff88085779d2c0

 2) We've found the area which point this.
  It is offset 0x68 of  ffff880859df4000

crash> rd  ffff880859df4000 100
ffff880859df4000:  0000000000000000 0000000000000001   ................
ffff880859df4010:  ffffffffa0535290 dead000000000200   .RS.............
ffff880859df4020:  ffff880859df4020 ffff880859df4020    @.Y.... @.Y....
ffff880859df4030:  0000000000000002 0000000000100010   ................
ffff880859df4040:  ffff880859df4040 ffff880859df4040   @@.Y....@@.Y....
ffff880859df4050:  0000000000000000 0000000000000000   ................
ffff880859df4060:  0000000000000000 ffff8810588b5a00   .........Z.X....
ffff880859df4070:  0000000000000001 ffff880859df4078   ........x@.Y....

 If we regards it as struct ipmi_smi in shutdown process
 it looks consistent.

The remedy for this apparent race is affixed below.

Signed-off-by: Tony Camuso <tcamuso@redhat.com>
Cc: stable@vger.kernel.org # 3.19

This was first introduced in 7ea0ed2b5b ipmi: Make the
message handler easier to use for SMI interfaces
where some code was moved outside of the rcu_read_lock()
and the lock was not added.

Signed-off-by: Corey Minyard <cminyard@mvista.com>
2017-06-19 12:49:34 -05:00
..
agp main drm pull request for 4.12 kernel 2017-05-03 11:44:24 -07:00
hw_random Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2017-05-02 15:53:46 -07:00
ipmi ipmi: use rcu lock around call to intf->handlers->sender() 2017-06-19 12:49:34 -05:00
mwave Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
pcmcia lib/vsprintf.c: remove %Z support 2017-02-27 18:43:47 -08:00
tpm char/misc patches for 4.12-rc1 2017-05-04 19:15:35 -07:00
xilinx_hwicap char: xilinx_hwicap: Remove pointless local variables 2017-01-25 11:41:44 +01:00
xillybus char: xillybus: Fix spelling mistake and comment 2016-08-31 14:47:54 +02:00
apm-emulation.c apm-emulation: move APM_MINOR_DEV to include/linux/miscdevice.h 2017-01-10 21:46:41 +01:00
applicom.c sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h> 2017-03-02 08:42:32 +01:00
applicom.h
bfin-otp.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
bsr.c bsr: avoid format string leaking into device name 2014-07-09 16:59:15 -07:00
ds1302.c ds1302: remove unneeded linux/miscdevice.h include 2017-01-10 21:46:41 +01:00
ds1620.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
dsp56k.c dsp56k: prevent a harmless underflow 2016-07-14 16:21:53 +09:00
dtlk.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
efirtc.c drivers/char: make efirtc.c driver explicitly non-modular 2015-09-20 19:32:35 -07:00
generic_nvram.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
hangcheck-timer.c hangcheck-timer: Fix typo in comment 2017-04-08 18:08:54 +02:00
hpet.c hpet: Make cmd parameter of hpet_ioctl_common() unsigned 2017-03-17 15:10:49 +09:00
Kconfig TTY/Serial driver patches for 4.11-rc1 2017-02-22 12:17:25 -08:00
lp.c sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h> 2017-03-02 08:42:32 +01:00
Makefile RTC for 4.8 2016-08-05 09:48:22 -04:00
mbcs.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
mbcs.h
mem.c mm: Tighten x86 /dev/mem with zeroing reads 2017-04-12 11:40:23 -07:00
misc.c drivers: char: misc: Replace printk with pr_err. 2017-04-08 17:53:06 +02:00
mmtimer.c time: Change k_clock timer_set() and timer_get() to use timespec64 2017-04-14 21:49:56 +02:00
mspec.c drivers, char: convert vma_data.refcnt from atomic_t to refcount_t 2017-03-23 13:57:19 +01:00
nsc_gpio.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
nvram.c char/nvram: set array of const as const 2016-02-08 14:57:30 -08:00
nwbutton.c drivers/char/nwbutton: Fix build breakage caused by include file reshuffling 2017-03-07 08:35:49 +01:00
nwbutton.h
nwflash.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
pc8736x_gpio.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
powernv-op-panel.c powerpc/powernv: Add driver for operator panel on FSP machines 2016-06-29 17:33:46 +10:00
ppdev.c ppdev: fix registering same device name 2017-03-16 17:32:21 +09:00
ps3flash.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
random.c random: move random_min_urandom_seed into CONFIG_SYSCTL ifdef block 2017-02-06 20:46:49 -05:00
raw.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
rtc.c sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h> 2017-03-02 08:42:32 +01:00
scx200_gpio.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
snsc_event.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h> 2017-03-02 08:42:29 +01:00
snsc.c sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h> 2017-03-02 08:42:32 +01:00
snsc.h
sonypi.c scripts/spelling.txt: add "initialiazation" pattern and fix typo instances 2017-02-27 18:43:47 -08:00
tb0219.c mips: separate extable.h, switch module.h to it 2016-10-05 18:36:18 -04:00
tile-srom.c tile-srom: allow the driver to be built as a module 2016-11-10 15:18:56 +01:00
tlclk.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
toshiba.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
ttyprintk.c ttyprintk: Neaten and simplify printing 2016-09-13 17:30:17 +02:00
uv_mmtimer.c
virtio_console.c char/misc patches for 4.12-rc1 2017-05-04 19:15:35 -07:00