linux/drivers
Sarah Sharp 0d9f78a92e xhci: Fix hang on back-to-back Set TR Deq Ptr commands.
The Microsoft LifeChat 3000 USB headset was causing a very reproducible
hang whenever it was plugged in.  At first, I thought the host
controller was producing bad transfer events, because the log was filled
with errors like:

xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD

However, it turned out to be an xHCI driver bug in the ring expansion
patches.  The bug is triggered When there are two ring segments, and a
TD that ends just before a link TRB, like so:

 ______________                     _____________
|              |              ---> | setup TRB B |
 ______________               |     _____________
|              |              |    |  data TRB B |
 ______________               |     _____________
| setup TRB A  | <-- deq      |    |  data TRB B |
 ______________               |     _____________
| data TRB A   |              |    |             | <-- enq, deq''
 ______________               |     _____________
| status TRB A |              |    |             |
 ______________               |     _____________
|  link TRB    |---------------    |  link TRB   |
 _____________  <--- deq'           _____________

TD A (the first control transfer) stalls on the data phase.  That halts
the ring.  The xHCI driver moves the hardware dequeue pointer to the
first TRB after the stalled transfer, which happens to be the link TRB.

Once the Set TR dequeue pointer command completes, the function
update_ring_for_set_deq_completion runs.  That function is supposed to
update the xHCI driver's dequeue pointer to match the internal hardware
dequeue pointer.  On the first call this would work fine, and the
software dequeue pointer would move to deq'.

However, if the transfer immediately after that stalled (TD B in this
case), another Set TR Dequeue command would be issued.  That would move
the hardware dequeue pointer to deq''.  Once that command completed,
update_ring_for_set_deq_completion would run again.

The original code would unconditionally increment the software dequeue
pointer, which moved the pointer off the ring segment into la-la-land.
The while loop would happy increment the dequeue pointer (possibly
wrapping it) until it matched the hardware pointer value.

The while loop would also access all the memory in between the first
ring segment and the second ring segment to determine if it was a link
TRB.  This could cause general protection faults, although it was
unlikely because the ring segments came from a DMA pool, and would often
have consecutive memory addresses.

If nothing in that space looked like a link TRB, the deq_seg pointer for
the ring would remain on the first segment.  Thus, the deq_seg and the
software dequeue pointer would get out of sync.

When the next transfer event came in after the stalled transfer, the
xHCI driver code would attempt to convert the software dequeue pointer
into a DMA address in order to compare the DMA address for the completed
transfer.  Since the deq_seg and the dequeue pointer were out of sync,
xhci_trb_virt_to_dma would return NULL.

The transfer event would get ignored, the transfer would eventually
timeout, and we would mistakenly convert the finished transfer to no-op
TRBs.  Some kernel driver (maybe xHCI?) would then get stuck in an
infinite loop in interrupt context, and the whole machine would hang.

This patch should be backported to kernels as old as 3.4, that contain
the commit b008df60c6 "xHCI: count free
TRBs on transfer ring"

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Cc: Andiry Xu <andiry.xu@amd.com>
Cc: stable@vger.kernel.org
2012-07-02 12:51:25 -07:00
..
accessibility
acpi Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux 2012-06-30 11:11:58 -07:00
amba arm-soc: driver specific updates 2012-05-26 12:22:27 -07:00
ata Viresh has moved 2012-06-20 14:39:36 -07:00
atm solos-pci: Fix DMA support 2012-05-24 16:22:53 -04:00
auxdisplay
base PM / Sleep: Prevent waiting forever on asynchronous suspend after abort 2012-06-24 23:31:09 +02:00
bcma bcma: fix null pointer in bcma_core_pci_irq_ctl 2012-06-08 13:47:07 -04:00
block mtip32xx: Changes to sysfs entries 2012-05-31 08:46:50 +02:00
bluetooth Bluetooth: btmrvl: Do not send vendor events to bluetooth stack 2012-06-19 00:19:11 -03:00
cdrom
char Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2012-06-18 12:20:36 -07:00
clk clk: mxs: fix clock lookup after freeing init memory 2012-06-25 16:51:48 -07:00
clocksource clocksource: sh_tmu: Use clockevents_config_and_register(). 2012-06-11 17:10:16 +09:00
connector
cpufreq
cpuidle
crypto arm-soc: clock driver changes 2012-05-26 12:42:29 -07:00
dca
devfreq Power management updates for 3.5 2012-05-23 14:07:06 -07:00
dio
dma Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma 2012-06-20 22:12:52 -07:00
edac edac: Do alignment logic properly in edac_align_ptr() 2012-06-11 12:43:16 -03:00
eisa
extcon extcon: max8997: Add missing kfree for info->edev in max8997_muic_remove() 2012-06-18 16:30:42 -07:00
firewire IEEE 1394 (FireWire) subsystem updates post v3.4: 2012-05-24 12:57:47 -07:00
firmware
gpio gpio/samsung: fix the typo 'exynos5_xxx' instead of 'exonys5_xxx' 2012-06-03 21:21:01 -07:00
gpu Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel into drm-fixes 2012-06-27 19:56:20 +01:00
hid Merge branch 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid 2012-06-26 11:23:41 -07:00
hsi
hv Driver core pull for 3.5-rc1 2012-05-22 16:02:13 -07:00
hwmon Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-06-29 10:29:54 -07:00
hwspinlock
i2c i2c: Add generic I2C multiplexer using pinctrl API 2012-06-04 16:49:43 +02:00
ide drivers/ide/ide-cs.c: adjust suspicious bit operation 2012-06-12 15:51:41 -07:00
idle
ieee802154
iio iio: drop wrong reference from Kconfig 2012-06-14 17:28:46 -07:00
infiniband Merge branches 'cma' and 'ocrdma' into for-linus 2012-06-24 04:59:59 -07:00
input i2c: Split I2C_M_NOSTART support out of I2C_FUNC_PROTOCOL_MANGLING 2012-05-30 10:55:34 +02:00
iommu iommu/amd: Fix deadlock in ppr-handling error path 2012-06-04 12:47:44 +02:00
isdn Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-05-24 11:54:29 -07:00
leds leds: Make LEDS_ASIC3 and LEDS_RENESAS_TPU depend on LEDS_CLASS=y 2012-06-12 10:56:25 +08:00
lguest
macintosh
md md: 2 fixes for 3.5-rc 2012-06-06 09:49:28 -07:00
media Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media 2012-06-25 14:53:09 -07:00
memory
memstick
message Merge branch 'akpm' (Andrew's patch-bomb) 2012-05-31 18:10:18 -07:00
mfd Viresh has moved 2012-06-20 14:39:36 -07:00
misc misc: mei: set WDIOF_ALARMONLY on mei watchdog 2012-06-13 15:34:31 -07:00
mmc Revert "mmc: omap_hsmmc: Enable Auto CMD12" 2012-06-26 16:10:30 -04:00
mtd Fix the debugfs regression - we never enable it because incorrect 2012-06-28 11:41:43 -07:00
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-06-28 11:20:31 -07:00
nfc NFC: potential integer overflow problem in check_crc() 2012-05-25 11:16:16 -04:00
nubus
of Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2012-05-29 18:27:19 -07:00
oprofile oprofile: perf: use NR_CPUS instead or nr_cpumask_bits for static array 2012-06-21 16:15:11 +02:00
parisc Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2012-05-22 19:22:50 -07:00
parport Driver core pull for 3.5-rc1 2012-05-22 16:02:13 -07:00
pci USB: add NO_D3_DURING_SLEEP flag and revert 151b612847 2012-06-13 13:11:39 -07:00
pcmcia
pinctrl Merge branch 'akpm' (Andrew's patch-bomb) 2012-06-20 14:41:57 -07:00
platform drivers/platform/x86/acerhdf.c: correct Boris' mail address 2012-06-07 14:43:55 -07:00
pnp
power A bunch of fixes for v3.5, nothing extraordinary. 2012-05-31 12:10:15 -07:00
pps
ps3
ptp Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2012-05-22 19:22:50 -07:00
rapidio rapidio/tsi721: add DMA engine support 2012-05-31 17:49:31 -07:00
regulator regulator: palmas: fix regmap offsets for enable/disable 2012-06-23 11:37:28 +01:00
remoteproc remoteproc/omap: fix dev_err typo 2012-06-17 10:31:03 +03:00
rpmsg
rtc Merge branches 'bugfix-battery', 'bugfix-misc', 'bugfix-rafael', 'bugfix-turbostat', 'bugfix-video' and 'workaround-pss' into release 2012-06-04 00:48:41 -04:00
s390 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux 2012-05-31 10:51:10 -07:00
sbus
scsi SCSI & usb-storage: add try_rc_10_first flag 2012-06-22 22:05:31 -07:00
sfi
sh sh: intc: Kill off special reservation interface. 2012-05-22 19:07:55 +09:00
sn
spi SPI: fix over-eager devm_xxx() conversion 2012-06-18 11:27:04 +01:00
ssb
staging Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media 2012-06-25 14:53:09 -07:00
target target: Return error to initiator if SET TARGET PORT GROUPS emulation fails 2012-06-12 20:12:25 -07:00
tc
thermal
tty Serial driver fixes for 3.5-rc4 2012-06-20 15:13:13 -07:00
uio
usb xhci: Fix hang on back-to-back Set TR Deq Ptr commands. 2012-07-02 12:51:25 -07:00
uwb
vhost vhost: use USER_DS in vhost_worker thread 2012-06-26 21:10:56 -07:00
video fbdev fixes for 3.5 2012-06-16 16:59:05 -07:00
virt
virtio virtio-mmio: Devices parameter parsing 2012-05-22 12:16:15 +09:30
vlynq
vme
w1 arm-soc: clock driver changes 2012-05-26 12:42:29 -07:00
watchdog watchdog: core: fix WDIOC_GETSTATUS return value 2012-06-28 20:40:56 +02:00
xen Five bug-fixes: 2012-06-15 17:17:15 -07:00
zorro
Kconfig Staging tree pull request for 3.5-rc1 2012-05-22 16:34:21 -07:00
Makefile arm-soc: driver specific updates 2012-05-26 12:22:27 -07:00