A vendor with a system having more than 128 CPUs occasionally encounters
the following crash during shutdown. This is not an easily reproduceable
event, but the vendor was able to provide the following analysis of the
crash, which exhibits the same footprint each time.
crash> bt
PID: 0 TASK: ffff88017c70ce70 CPU: 5 COMMAND: "swapper/5"
#0 [ffff88085c143ac8] machine_kexec at ffffffff81059c8b
#1 [ffff88085c143b28] __crash_kexec at ffffffff811052e2
#2 [ffff88085c143bf8] crash_kexec at ffffffff811053d0
#3 [ffff88085c143c10] oops_end at ffffffff8168ef88
#4 [ffff88085c143c38] no_context at ffffffff8167ebb3
#5 [ffff88085c143c88] __bad_area_nosemaphore at ffffffff8167ec49
#6 [ffff88085c143cd0] bad_area_nosemaphore at ffffffff8167edb3
#7 [ffff88085c143ce0] __do_page_fault at ffffffff81691d1e
#8 [ffff88085c143d40] do_page_fault at ffffffff81691ec5
#9 [ffff88085c143d70] page_fault at ffffffff8168e188
[exception RIP: unknown or invalid address]
RIP: ffffffffa053c800 RSP: ffff88085c143e28 RFLAGS: 00010206
RAX: ffff88017c72bfd8 RBX: ffff88017a8dc000 RCX: ffff8810588b5ac8
RDX: ffff8810588b5a00 RSI: ffffffffa053c800 RDI: ffff8810588b5a00
RBP: ffff88085c143e58 R8: ffff88017c70d408 R9: ffff88017a8dc000
R10: 0000000000000002 R11: ffff88085c143da0 R12: ffff8810588b5ac8
R13: 0000000000000100 R14: ffffffffa053c800 R15: ffff8810588b5a00
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
<IRQ stack>
[exception RIP: cpuidle_enter_state+82]
RIP: ffffffff81514192 RSP: ffff88017c72be50 RFLAGS: 00000202
RAX: 0000001e4c3c6f16 RBX: 000000000000f8a0 RCX: 0000000000000018
RDX: 0000000225c17d03 RSI: ffff88017c72bfd8 RDI: 0000001e4c3c6f16
RBP: ffff88017c72be78 R8: 000000000000237e R9: 0000000000000018
R10: 0000000000002494 R11: 0000000000000001 R12: ffff88017c72be20
R13: ffff88085c14f8e0 R14: 0000000000000082 R15: 0000001e4c3bb400
ORIG_RAX: ffffffffffffff10 CS: 0010 SS: 0018
This is the corresponding stack trace
It has crashed because the area pointed with RIP extracted from timer
element is already removed during a shutdown process.
The function is smi_timeout().
And we think ffff8810588b5a00 in RDX is a parameter struct smi_info
crash> rd ffff8810588b5a00 20
ffff8810588b5a00: ffff8810588b6000 0000000000000000 .`.X............
ffff8810588b5a10: ffff880853264400 ffffffffa05417e0 .D&S......T.....
ffff8810588b5a20: 24a024a000000000 0000000000000000 .....$.$........
ffff8810588b5a30: 0000000000000000 0000000000000000 ................
ffff8810588b5a30: 0000000000000000 0000000000000000 ................
ffff8810588b5a40: ffffffffa053a040 ffffffffa053a060 @.S.....`.S.....
ffff8810588b5a50: 0000000000000000 0000000100000001 ................
ffff8810588b5a60: 0000000000000000 0000000000000e00 ................
ffff8810588b5a70: ffffffffa053a580 ffffffffa053a6e0 ..S.......S.....
ffff8810588b5a80: ffffffffa053a4a0 ffffffffa053a250 ..S.....P.S.....
ffff8810588b5a90: 0000000500000002 0000000000000000 ................
Unfortunately the top of this area is already detroyed by someone.
But because of two reasonns we think this is struct smi_info
1) The address included in between ffff8810588b5a70 and ffff8810588b5a80:
are inside of ipmi_si_intf.c see crash> module ffff88085779d2c0
2) We've found the area which point this.
It is offset 0x68 of ffff880859df4000
crash> rd ffff880859df4000 100
ffff880859df4000: 0000000000000000 0000000000000001 ................
ffff880859df4010: ffffffffa0535290 dead000000000200 .RS.............
ffff880859df4020: ffff880859df4020 ffff880859df4020 @.Y.... @.Y....
ffff880859df4030: 0000000000000002 0000000000100010 ................
ffff880859df4040: ffff880859df4040 ffff880859df4040 @@.Y....@@.Y....
ffff880859df4050: 0000000000000000 0000000000000000 ................
ffff880859df4060: 0000000000000000 ffff8810588b5a00 .........Z.X....
ffff880859df4070: 0000000000000001 ffff880859df4078 ........x@.Y....
If we regards it as struct ipmi_smi in shutdown process
it looks consistent.
The remedy for this apparent race is affixed below.
Signed-off-by: Tony Camuso <tcamuso@redhat.com>
Cc: stable@vger.kernel.org # 3.19
This was first introduced in 7ea0ed2b5b ipmi: Make the
message handler easier to use for SMI interfaces
where some code was moved outside of the rcu_read_lock()
and the lock was not added.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
When a computer has an IPMI system interface, the device interface
is most probably also desired. Autoloading of ipmi_devintf currently
works only if ipmi_si has allocated a platform device. That doesn't
happen if the SI interface was detected e.g. via ACPI. But ACPI
detection is preferred these days, see e.g. kernel.org bug 46741.
This patch introduces a softdep in place of the existing modalias
for ipmi_devintf.
Signed-off-by: Martin Wilck <mwilck@suse.com>
Suggested-by: Takashi Iwai <tiwai@suse.com>
I moved this to ipmi_msghandler.c, so it works for all IPMI
interfaces. Retested by Martin.
Tested-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
The IPMI message handler uses a message id that the lower-layer
preserved to track the sequence number of the message. The macros
that handled these sequence numbers were somewhat broken as they
could result in sequence number truncation and they were not
doing an "and" of the proper number of bits.
I think this actually is not a problem, because the truncation
should be harmless and the improper "and" didn't hurt anything
because sequence number generation used the same improper "and"
and wouldn't generate a sequence number that would get
truncated wrong. However, it should be fixed.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Parameter trydefaults=1 causes the ipmi_init to initialize ipmi through
the legacy port io space that was designated for ipmi. Architectures
that do not map legacy port io can panic when trydefaults=1.
Rather than implement build-time conditional exceptions for each
architecture that does not map legacy port io, we have removed legacy
port io from the driver.
Parameter 'trydefaults' has been removed. Attempts to use it hereafter
will evoke the "Unknown symbol in module, or unknown parameter" message.
The patch was built against a number of architectures and tested for
regressions and functionality on x86_64 and ARM64.
Signed-off-by: Tony Camuso <tcamuso@redhat.com>
Removed the config entry and the address source entry for default,
since neither were used any more.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Commit 7ea0ed2b5b ("ipmi: Make the message handler easier to use for
SMI interfaces") changed handle_new_recv_msgs() to call handle_one_recv_msg()
for a smi_msg while the smi_msg is still connected to waiting_rcv_msgs list.
That could lead to following list corruption problems:
1) low-level function treats smi_msg as not connected to list
handle_one_recv_msg() could end up calling smi_send(), which
assumes the msg is not connected to list.
For example, the following sequence could corrupt list by
doing list_add_tail() for the entry still connected to other list.
handle_new_recv_msgs()
msg = list_entry(waiting_rcv_msgs)
handle_one_recv_msg(msg)
handle_ipmb_get_msg_cmd(msg)
smi_send(msg)
spin_lock(xmit_msgs_lock)
list_add_tail(msg)
spin_unlock(xmit_msgs_lock)
2) race between multiple handle_new_recv_msgs() instances
handle_new_recv_msgs() once releases waiting_rcv_msgs_lock before calling
handle_one_recv_msg() then retakes the lock and list_del() it.
If others call handle_new_recv_msgs() during the window shown below
list_del() will be done twice for the same smi_msg.
handle_new_recv_msgs()
spin_lock(waiting_rcv_msgs_lock)
msg = list_entry(waiting_rcv_msgs)
spin_unlock(waiting_rcv_msgs_lock)
|
| handle_one_recv_msg(msg)
|
spin_lock(waiting_rcv_msgs_lock)
list_del(msg)
spin_unlock(waiting_rcv_msgs_lock)
Fixes: 7ea0ed2b5b ("ipmi: Make the message handler easier to use for SMI interfaces")
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
[Added a comment to describe why this works.]
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Cc: stable@vger.kernel.org # 3.19
Tested-by: Ye Feng <yefeng.yl@alibaba-inc.com>
Lots of char arrays could be set as const since they contain only literal
char arrays.
We could in the same time make const some struct members who are pointer
to those const char arrays.
Signed-off-by: LABBE Corentin <clabbe.montjoie@gmail.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Received handlers defined as ipmi_recv_hndl member of struct
ipmi_user_hndl can take a spinlock. This means that if the kernel
panics while holding the lock, a deadlock may happen on the lock
while flushing queued messages in the panic context.
Calling the receive handler doesn't make much meanings in the panic
context, simply skip it to avoid possible deadlocks.
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
When processing queued messages in the panic context, IPMI driver
tries to do it without any locking to avoid deadlocks. However,
this means we can touch a corrupted list if the kernel panicked
while manipulating the list. Fortunately, current `add-tail and
del-from-head' style implementation won't touch the corrupted part,
but it is inherently risky.
To get rid of the risk, this patch re-initializes the message lists
on panic if the related spinlock has already been acquired. As the
result, we may lose queued messages, but it's not so painful.
Dropping messages on the received message list is also less
problematic because no one can respond the received messages.
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Fixed a comment typo.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
When flushing queued messages in run-to-completion mode,
smi_event_handler() is recursively called.
flush_messages()
smi_event_handler()
handle_transaction_done()
deliver_recv_msg()
ipmi_smi_msg_received()
smi_recv_tasklet()
sender()
flush_messages()
smi_event_handler()
...
The depth of the recursive call depends on the number of queued
messages, so it can cause a stack overflow if many messages have
been queued.
To solve this problem, this patch removes flush_messages()
from sender()@ipmi_si_intf.c. Instead, add flush_messages() to
caller side of sender() if needed. Additionally, to implement this,
add new handler flush_messages to struct ipmi_smi_handlers.
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Fixed up a comment and some spacing issues.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
send_panic_events() calls intf->handlers->set_run_to_completion(),
but it has already been done in the caller function panic_event().
Remove it from send_panic_events().
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
commit d6c5dc18d8 ("ipmi: Remove uses of return value of seq_printf")
incorrectly changed the return value of various proc_show functions
to use seq_has_overflowed().
These functions should return 0 on completion rather than 1/true
on overflow. 1 is the same as #define SEQ_SKIP which would cause
the output to not be emitted (skipped) instead.
This is a logical defect only as the length of these outputs are
all smaller than the initial allocation done by the seq filesystem.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
The seq_printf like functions will soon be changed to return void.
Convert these uses to check seq_has_overflowed instead.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Instead of manual calls of device_create_file() and
device_remove_file(), implement the condition in is_visible callback
for the attribute group and put these entries to the group, too.
This simplifies the code and avoids the possible races.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
A new harmless warning has come up on ARM builds with gcc-4.9:
drivers/char/ipmi/ipmi_msghandler.c: In function 'smi_send.isra.11':
include/linux/spinlock.h:372:95: warning: 'flags' may be used uninitialized in this function [-Wmaybe-uninitialized]
raw_spin_unlock_irqrestore(&lock->rlock, flags);
^
drivers/char/ipmi/ipmi_msghandler.c:1490:16: note: 'flags' was declared here
unsigned long flags;
^
This could be worked around by initializing the 'flags' variable, but it
seems better to rework the code to avoid this.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 7ea0ed2b5b ("ipmi: Make the message handler easier to use for SMI interfaces")
Signed-off-by: Corey Minyard <cminyard@mvista.com>
There can't be more than a few IPMI messages allocated at any one time,
so converting the messages to slabs would be a waste. So just remove
the FIXME.
Suggested-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
The previous cleanup of BMC attributes left a few holes, and if
you run with lockdep debugging with a BMC with the proper attributes,
you could get a warning.
This patch removes all the unused attributes from the BMC structure,
since they are all declared in the .data section now. It makes
the attributes all static. It fixes the referencing of the
attributes in a couple of cases that dynamically added the files
depending on BMC information.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Cc: Huang Ying <ying.huang@intel.com>
Tested-by: Alexei Starovoitov <ast@plumgrid.com>
The message handler expected the SMI interface to keep a queue of
messages, but that was kind of silly, the queue would be easier to
manage in the message handler itself. As part of that, fix the
message cleanup to make sure no messages are outstanding when an
SMI interface is unregistered. This makes it easier for an SMI
interface to unregister, it just has to call ipmi_unregister_smi()
first and all processing from the message handler will be cleaned
up.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Embed the platform device in the bmc device instead of externally
allocating it, use more proper form for creating the device
attributes, and other general cleanups.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
The code to send the channel config errors was missing an error report
in one place and needed some more information in another, and had an
extraneous bit of code. Clean all that up.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The IPMI driver would wake up periodically looking for events and
watchdog pretimeouts. If there is nothing waiting for these events,
it's really kind of pointless to be checking for them. So modify the
driver so the message handler can pass down if it needs the lower layer
to be waiting for these. Modify the system interface lower layer to
turn off all timer and thread activity if the upper layer doesn't need
anything and it is not currently handling messages. And modify the
message handler to not restart the timer if its timer is not needed.
The timers and kthread will still be enabled if:
- the SI interface is handling a message.
- a user has enabled watching for events.
- the IPMI watchdog timer is in use (since it uses pretimeouts).
- the message handler is waiting on a remote response.
- a user has registered to receive commands.
This mostly affects interfaces without interrupts. Interfaces with
interrupts already don't use CPU in the system interface when the
interface is idle.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A couple of variables were getting warnings about being uninitialized.
It was a false warning, but initialize them, anyway.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replaced calls to kmalloc followed by strcpy with a sincle call to
kstrdup. Patch found using coccinelle.
Signed-off-by: Alexandru Gheorghiu <gheorghiuandru@gmail.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The only part of proc_dir_entry the code outside of fs/proc
really cares about is PDE(inode)->data. Provide a helper
for that; static inline for now, eventually will be moved
to fs/proc, along with the knowledge of struct proc_dir_entry
layout.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
"Asynchronous" is misspelled in some comments. No code changes.
Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
There was a spot where the compiler couldn't tell some variables
would be set. So initialize them to make the warning go away.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge third batch of patches from Andrew Morton:
- Some MM stragglers
- core SMP library cleanups (on_each_cpu_mask)
- Some IPI optimisations
- kexec
- kdump
- IPMI
- the radix-tree iterator work
- various other misc bits.
"That'll do for -rc1. I still have ~10 patches for 3.4, will send
those along when they've baked a little more."
* emailed from Andrew Morton <akpm@linux-foundation.org>: (35 commits)
backlight: fix typo in tosa_lcd.c
crc32: add help text for the algorithm select option
mm: move hugepage test examples to tools/testing/selftests/vm
mm: move slabinfo.c to tools/vm
mm: move page-types.c from Documentation to tools/vm
selftests/Makefile: make `run_tests' depend on `all'
selftests: launch individual selftests from the main Makefile
radix-tree: use iterators in find_get_pages* functions
radix-tree: rewrite gang lookup using iterator
radix-tree: introduce bit-optimized iterator
fs/proc/namespaces.c: prevent crash when ns_entries[] is empty
nbd: rename the nbd_device variable from lo to nbd
pidns: add reboot_pid_ns() to handle the reboot syscall
sysctl: use bitmap library functions
ipmi: use locks on watchdog timeout set on reboot
ipmi: simplify locking
ipmi: fix message handling during panics
ipmi: use a tasklet for handling received messages
ipmi: increase KCS timeouts
ipmi: decrease the IPMI message transaction time in interrupt mode
...
The part of the IPMI driver that delivered panic information to the event
log and extended the watchdog timeout during a panic was not properly
handling the messages. It used static messages to avoid allocation, but
wasn't properly waiting for these, or wasn't properly handling the
refcounts.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The IPMI driver would release a lock, deliver a message, then relock.
This is obviously ugly, and this patch converts the message handler
interface to use a tasklet to schedule work. This lets the receive
handler be called from an interrupt handler with interrupts enabled.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove all #inclusions of asm/system.h preparatory to splitting and killing
it. Performed with the following command:
perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *`
Signed-off-by: David Howells <dhowells@redhat.com>
The IPMI smi_watcher will be used to catch the IPMI interface as they
come or go. In order to communicate with the correct IPMI device, it
should be confirmed whether it is what we wanted especially on the
system with multiple IPMI devices. But the new_smi callback function
of smi_watcher provides very limited info(only the interface number
and dev pointer) and there is no detailed info about the low level
interface. For example: which mechansim registers the IPMI
interface(ACPI, PCI, DMI and so on).
This is to add one interface that can get more info of low-level IPMI
device. For example: the ACPI device handle will be returned for the
pnp_acpi IPMI device.
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Len Brown <len.brown@intel.com>
__init and __exit belong after the return type on functions, not
before.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The timeouts in IPMI are in the 1-5 second range in message handling, so a
1 second timeout is a reasonable thing to do. This should help with
reducing power consumption on idle systems.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes a sysfs lockdep warning in the ipmi code.
Thanks to Eric Biederman and Yinghai Lu for the original versions of the
patch, unfortunatly they did not submit them in a form they could be
applied in.
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>