linux/drivers/net/usb
Douglas Anderson d9962b0d42 r8152: Block future register access if register access fails
Even though the functions to read/write registers can fail, most of
the places in the r8152 driver that read/write register values don't
check error codes. The lack of error code checking is problematic in
at least two ways.

The first problem is that the r8152 driver often uses code patterns
similar to this:
  x = read_register();
  x = x | SOME_BIT;
  write_register(x);

...with the above pattern, if the read_register() fails and returns
garbage then we'll end up trying to write modified garbage back to the
Realtek adapter. If the write_register() succeeds that's bad. Note
that as of commit f53a7ad189 ("r8152: Set memory to all 0xFFs on
failed reg reads") the "garbage" returned by read_register() will at
least be consistent garbage, but it is still garbage.

It turns out that this problem is very serious. Writing garbage to
some of the hardware registers on the Ethernet adapter can put the
adapter in such a bad state that it needs to be power cycled (fully
unplugged and plugged in again) before it can enumerate again.
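
The hazard described above can be reproduced in a small userspace sketch. This is not code from the r8152 driver: struct fake_adapter, the fail_next_read switch, and both set_bit_* helpers are hypothetical names invented for illustration. The only behaviour taken from the text is that a failed read leaves all-0xFF data in the buffer.

```c
#include <stdint.h>

#define SOME_BIT 0x0001u

/* Hypothetical stand-in for the adapter: one register plus a switch
 * that makes the next read fail once. */
struct fake_adapter {
	uint32_t reg;
	int fail_next_read;
};

/* Returns 0 on success, -1 on failure. On failure the output is set to
 * all 0xFF bytes, the "consistent garbage" the commit message mentions. */
int read_register(struct fake_adapter *a, uint32_t *val)
{
	if (a->fail_next_read) {
		a->fail_next_read = 0;
		*val = 0xFFFFFFFFu;
		return -1;
	}
	*val = a->reg;
	return 0;
}

/* In this model writes always succeed. */
int write_register(struct fake_adapter *a, uint32_t val)
{
	a->reg = val;
	return 0;
}

/* The problematic pattern: the read's return value is ignored, so a
 * failed read leads to modified garbage being written back. */
void set_bit_unchecked(struct fake_adapter *a)
{
	uint32_t x;

	read_register(a, &x);
	x |= SOME_BIT;
	write_register(a, x);
}

/* The same operation with the error propagated: nothing is written
 * unless the read succeeded. */
int set_bit_checked(struct fake_adapter *a)
{
	uint32_t x;
	int ret = read_register(a, &x);

	if (ret)
		return ret;
	return write_register(a, x | SOME_BIT);
}
```

With a register that starts at 0x1234 and a single failed read, set_bit_unchecked() leaves the register at 0xFFFFFFFF, while set_bit_checked() returns an error and leaves it untouched.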

The second problem is that the r8152 driver generally has functions
that are long sequences of register writes. Assuming everything will
be OK if a random register write fails in the middle isn't a great
assumption.

One might wonder if the above two problems are real. You could ask if
we would really have a successful write after a failed read. It turns
out that the answer appears to be "yes, this can happen". In fact,
we've seen at least two distinct failure modes where this happens.

On a sc7180-trogdor Chromebook if you drop into kdb for a while and
then resume, you can see:
1. We get a "Tx timeout"
2. The "Tx timeout" queues up a USB reset.
3. In rtl8152_pre_reset() we try to reinit the hardware.
4. The first several (2-9) register accesses fail with a timeout, then
   things recover.

The above test case was actually fixed by the patch ("r8152: Increase
USB control msg timeout to 5000ms as per spec") but at least shows
that we really can see successful calls after failed ones.

On a different (AMD-based) Chromebook with a particular adapter, we
found that during reboot tests we'd also sometimes get a transitory
failure. In this case we saw -EPIPE being returned sometimes. Retrying
worked, but retrying is not always safe for all register accesses
since reading/writing some registers might have side effects (like
registers that clear on read).
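
A minimal sketch of why a blind retry can be unsafe for a clear-on-read register. It assumes one particular failure mode, which the text above does not confirm: the device services the read (and so clears the register) but the reply never reaches the host. struct fake_status_reg and both functions are hypothetical.

```c
#include <stdint.h>

/* Hypothetical clear-on-read status register. */
struct fake_status_reg {
	uint32_t pending;     /* latched event bits, cleared by any read */
	int drop_next_reply;  /* assumption: reply lost after device acted */
};

/* Returns 0 on success, -1 if the reply was lost. The clear-on-read
 * side effect happens on the device in both cases. */
int read_status(struct fake_status_reg *r, uint32_t *val)
{
	uint32_t latched = r->pending;

	r->pending = 0;
	if (r->drop_next_reply) {
		r->drop_next_reply = 0;
		return -1;
	}
	*val = latched;
	return 0;
}

/* Naive retry: the second read succeeds but reports no events, so the
 * bits latched before the first read are silently lost. */
int read_status_with_retry(struct fake_status_reg *r, uint32_t *val)
{
	int ret = read_status(r, val);

	if (ret)
		ret = read_status(r, val);
	return ret;
}
```

In this model the retry reports success with a value of 0, even though an event was pending before the first attempt.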

Let's fully lock out all register access if a register access fails.
When we do this, we'll try to queue up a USB reset and try to unlock
register access after the reset. This is slightly trickier than it
sounds since the r8152 driver has an optimized reset sequence that
only works reliably after probe happens. In order to handle this, we
avoid the optimized reset if probe didn't finish. Instead, we simply
retry the probe routine in this case.

When locking out access, we'll use the existing infrastructure that
the driver was using when it detected we were unplugged. This keeps us
from getting stuck in delay loops in some parts of the driver.
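
The lockout idea can be sketched in userspace as follows. This is an illustration under stated assumptions, not the driver's implementation: struct fake_dev, its fields, and every function name here are hypothetical, and the real driver's flag names and reset path are not shown. The sketch only models the behaviour described above: the first failed access sets a shared "inaccessible" flag and requests a reset, later accesses and polling loops bail out immediately, and access is unlocked once the reset has run.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device model. */
struct fake_dev {
	bool inaccessible;   /* same flag an unplug would set in this model */
	bool reset_queued;   /* a reset has been requested */
	int failures_left;   /* number of upcoming accesses that will fail */
	uint32_t reg;
};

/* Write a register. After the first failure every later call returns
 * -ENODEV without touching the device. */
int dev_write(struct fake_dev *d, uint32_t val)
{
	if (d->inaccessible)
		return -ENODEV;
	if (d->failures_left > 0) {
		d->failures_left--;
		d->inaccessible = true;  /* lock out further access */
		d->reset_queued = true;  /* ask for a reset */
		return -ETIMEDOUT;
	}
	d->reg = val;
	return 0;
}

/* Poll for a bit. Checking the flag on each iteration keeps the loop
 * from spinning until timeout on a device that is locked out. */
int dev_poll_bit(struct fake_dev *d, uint32_t mask, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		if (d->inaccessible)
			return -ENODEV;
		if (d->reg & mask)
			return 0;
	}
	return -ETIMEDOUT;
}

/* Called once the reset has completed: unlock register access. */
void dev_reset_done(struct fake_dev *d)
{
	d->reset_queued = false;
	d->inaccessible = false;
}
```

Reusing one flag for both "unplugged" and "access failed" means every existing early-exit check covers the new case without extra code paths.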

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Grant Grundler <grundler@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-22 11:46:18 +01:00
aqc111.c net: move from strlcpy with unused retval to strscpy 2022-08-31 14:11:07 -07:00
aqc111.h
asix_common.c net: move from strlcpy with unused retval to strscpy 2022-08-31 14:11:07 -07:00
asix_devices.c net: asix: fix modprobe "sysfs: cannot create duplicate filename" 2023-03-22 22:04:04 -07:00
asix.h net: asix: ax88772: migrate to phylink 2022-08-26 10:00:52 +01:00
ax88172a.c
ax88179_178a.c Revert "net: usb: ax88179_178a needs FLAG_SEND_ZLP" 2022-08-10 09:28:56 +01:00
catc.c net: move from strlcpy with unused retval to strscpy 2022-08-31 14:11:07 -07:00
cdc_eem.c
cdc_ether.c USB: zaurus: Add ID for A-300/B-500/C-700 2023-08-01 14:44:27 -07:00
cdc_mbim.c net: usb: cdc_mbim: avoid altsetting toggling for Telit FE990 2023-03-07 15:27:01 +01:00
cdc_ncm.c net: cdc_ncm: Deal with too low values of dwNtbOutMaxSize 2023-05-18 19:56:17 -07:00
cdc_subset.c
cdc-phonet.c
ch9200.c
cx82310_eth.c
dm9601.c net: usb: dm9601: fix uninitialized variable use in dm9601_mdio_read 2023-10-10 20:08:11 -07:00
gl620a.c
hso.c tty: hso: simplify hso_serial_write() 2023-08-11 21:12:47 +02:00
huawei_cdc_ncm.c
int51x1.c
ipheth.c usbnet: ipheth: add CDC NCM support 2023-06-09 10:26:57 +01:00
kalmia.c net/usb: kalmia: Don't pass act_len in usb_bulk_msg error path 2023-02-13 09:41:14 +00:00
kaweth.c
Kconfig usbnet: ipheth: update Kconfig description 2023-06-09 10:26:57 +01:00
lan78xx.c net: usb: lan78xx: reorder cleanup operations to avoid UAF bugs 2023-07-31 09:58:29 +01:00
lan78xx.h
lg-vl600.c
Makefile
mcs7830.c
net1080.c
pegasus.c net: move from strlcpy with unused retval to strscpy 2022-08-31 14:11:07 -07:00
pegasus.h
plusb.c usb: plusb: remove unused pl_clear_QuickLink_features function 2023-03-20 10:16:27 +00:00
qmi_wwan.c net: usb: qmi_wwan: add Quectel EM05GV2 2023-07-31 14:09:38 -07:00
r8152.c r8152: Block future register access if register access fails 2023-10-22 11:46:18 +01:00
r8153_ecm.c
rndis_host.c usb: rndis_host: Secure rndis_query check against int overflow 2023-01-03 09:24:41 +00:00
rtl8150.c net: move from strlcpy with unused retval to strscpy 2022-08-31 14:11:07 -07:00
sierra_net.c treewide: Convert del_timer*() to timer_shutdown*() 2022-12-25 13:38:09 -08:00
smsc75xx.c net: usb: smsc75xx: Fix uninit-value access in __smsc75xx_read_reg 2023-10-03 10:19:29 +02:00
smsc75xx.h
smsc95xx.c net: usb: smsc95xx: Fix uninit-value access in smsc95xx_read_reg 2023-10-22 11:39:26 +01:00
smsc95xx.h
sr9700.c net: usb: sr9700: Handle negative len 2023-01-17 11:50:42 +01:00
sr9700.h
sr9800.c
sr9800.h
usbnet.c net: usbnet: Fix WARNING in usbnet_start_xmit/usb_submit_urb 2023-07-13 20:37:23 -07:00
zaurus.c USB: zaurus: Add ID for A-300/B-500/C-700 2023-08-01 14:44:27 -07:00