linux/drivers/net/ethernet
Ido Schimmel 332fdf951d mlxsw: thermal: Fix out-of-bounds memory accesses
Currently, mlxsw allows cooling states to be set above the maximum
cooling state supported by the driver:

 # cat /sys/class/thermal/thermal_zone2/cdev0/type
 mlxsw_fan
 # cat /sys/class/thermal/thermal_zone2/cdev0/max_state
 10
 # echo 18 > /sys/class/thermal/thermal_zone2/cdev0/cur_state
 # echo $?
 0

This results in out-of-bounds memory accesses when thermal state
transition statistics are enabled (CONFIG_THERMAL_STATISTICS=y), as the
transition table is indexed with a state that is too large [1].

According to the thermal maintainer, it is the responsibility of the
driver to reject such operations [2].

Therefore, return an error when the state to be set exceeds the maximum
cooling state supported by the driver.

To avoid dead code, as suggested by the thermal maintainer [3],
partially revert commit a421ce088a ("mlxsw: core: Extend cooling
device with cooling levels") that tried to interpret these invalid
cooling states (above the maximum) in a special way. The cooling levels
array is not removed in order to prevent the fans from going below 20%
PWM, which would cause them to get stuck at 0% PWM.

[1]
BUG: KASAN: slab-out-of-bounds in thermal_cooling_device_stats_update+0x271/0x290
Read of size 4 at addr ffff8881052f7bf8 by task kworker/0:0/5

CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.15.0-rc3-custom-45935-gce1adf704b14 #122
Hardware name: Mellanox Technologies Ltd. "MSN2410-CB2FO"/"SA000874", BIOS 4.6.5 03/08/2016
Workqueue: events_freezable_power_ thermal_zone_device_check
Call Trace:
 dump_stack_lvl+0x8b/0xb3
 print_address_description.constprop.0+0x1f/0x140
 kasan_report.cold+0x7f/0x11b
 thermal_cooling_device_stats_update+0x271/0x290
 __thermal_cdev_update+0x15e/0x4e0
 thermal_cdev_update+0x9f/0xe0
 step_wise_throttle+0x770/0xee0
 thermal_zone_device_update+0x3f6/0xdf0
 process_one_work+0xa42/0x1770
 worker_thread+0x62f/0x13e0
 kthread+0x3ee/0x4e0
 ret_from_fork+0x1f/0x30

Allocated by task 1:
 kasan_save_stack+0x1b/0x40
 __kasan_kmalloc+0x7c/0x90
 thermal_cooling_device_setup_sysfs+0x153/0x2c0
 __thermal_cooling_device_register.part.0+0x25b/0x9c0
 thermal_cooling_device_register+0xb3/0x100
 mlxsw_thermal_init+0x5c5/0x7e0
 __mlxsw_core_bus_device_register+0xcb3/0x19c0
 mlxsw_core_bus_device_register+0x56/0xb0
 mlxsw_pci_probe+0x54f/0x710
 local_pci_probe+0xc6/0x170
 pci_device_probe+0x2b2/0x4d0
 really_probe+0x293/0xd10
 __driver_probe_device+0x2af/0x440
 driver_probe_device+0x51/0x1e0
 __driver_attach+0x21b/0x530
 bus_for_each_dev+0x14c/0x1d0
 bus_add_driver+0x3ac/0x650
 driver_register+0x241/0x3d0
 mlxsw_sp_module_init+0xa2/0x174
 do_one_initcall+0xee/0x5f0
 kernel_init_freeable+0x45a/0x4de
 kernel_init+0x1f/0x210
 ret_from_fork+0x1f/0x30

The buggy address belongs to the object at ffff8881052f7800
 which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 1016 bytes inside of
 1024-byte region [ffff8881052f7800, ffff8881052f7c00)
The buggy address belongs to the page:
page:0000000052355272 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1052f0
head:0000000052355272 order:3 compound_mapcount:0 compound_pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffffea0005034800 0000000300000003 ffff888100041dc0
raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8881052f7a80: 00 00 00 00 00 00 04 fc fc fc fc fc fc fc fc fc
 ffff8881052f7b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8881052f7b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                                                                ^
 ffff8881052f7c00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff8881052f7c80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

[2] https://lore.kernel.org/linux-pm/9aca37cb-1629-5c67-1895-1fdc45c0244e@linaro.org/
[3] https://lore.kernel.org/linux-pm/af9857f2-578e-de3a-e62b-6baff7e69fd4@linaro.org/

CC: Daniel Lezcano <daniel.lezcano@linaro.org>
Fixes: a50c1e3565 ("mlxsw: core: Implement thermal zone")
Fixes: a421ce088a ("mlxsw: core: Extend cooling device with cooling levels")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Vadim Pasternak <vadimp@nvidia.com>
Link: https://lore.kernel.org/r/20211012174955.472928-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-14 07:13:26 -07:00
3com 3com 3c515: make it compile on 64-bit architectures 2021-09-16 11:14:47 -07:00
8390 ne2000: fix unused function warning 2021-09-08 11:45:06 +01:00
actions net: ethernet: actions: Add helper dependency on COMPILE_TEST 2021-08-25 12:06:53 +01:00
adaptec
aeroflex
agere
alacritech
allwinner
alteon
altera
amazon ethtool: extend coalesce setting uAPI with CQE mode 2021-08-24 07:38:29 -07:00
amd net: ni65: Avoid typecast of pointer to u32 2021-09-09 11:21:19 +01:00
apm xgene-v2: Fix a resource leak in the error handling path of 'xge_probe()' 2021-08-23 11:23:48 +01:00
apple
aquantia atlantic: Fix issue in the pm resume flow. 2021-09-23 13:24:14 +01:00
arc net: arc: select CRC32 2021-10-13 09:00:10 -07:00
atheros
broadcom net: bgmac-platform: handle mac-address deferral 2021-09-27 12:28:15 +01:00
brocade ethtool: extend coalesce setting uAPI with CQE mode 2021-08-24 07:38:29 -07:00
cadence net: macb: fix use after free on rmmod 2021-09-09 10:55:44 +01:00
calxeda
cavium pci-v5.15-changes 2021-09-07 19:13:42 -07:00
chelsio pci-v5.15-changes 2021-09-07 19:13:42 -07:00
cirrus net: cs89x0: disable compile testing on powerpc 2021-09-03 13:42:27 +01:00
cisco ethtool: extend coalesce setting uAPI with CQE mode 2021-08-24 07:38:29 -07:00
cortina ethtool: extend coalesce setting uAPI with CQE mode 2021-08-24 07:38:29 -07:00
davicom
dec tulip: Remove deadcode on startup true condition 2021-08-07 09:39:54 +01:00
dlink
emulex ethtool: extend coalesce setting uAPI with CQE mode 2021-08-24 07:38:29 -07:00
ezchip
faraday
freescale net: enetc: fix the incorrect clearing of IF_MODE bits 2021-09-24 14:03:04 +01:00
fujitsu
google gve: report 64bit tx_bytes counter from gve_handle_report_stats() 2021-10-06 15:11:51 +01:00
hisilicon net: hns3: disable firmware compatible features when uninstall PF 2021-09-29 11:03:54 +01:00
huawei ethtool: extend coalesce setting uAPI with CQE mode 2021-08-24 07:38:29 -07:00
i825xx net: i825xx: Use absolute_pointer for memcpy from fixed memory location 2021-09-15 12:04:28 -07:00
ibm Revert "ibmvnic: check failover_pending in login response" 2021-09-27 13:21:53 +01:00
intel ice: fix locking for Tx timestamp tracking flush 2021-10-12 12:10:39 +01:00
litex net: Add depends on OF_NET for LiteX's LiteETH 2021-08-31 08:36:38 -07:00
marvell octeontx2-af: Fix some memory leaks in the error handling path of 'cgx_lmac_init()' 2021-09-04 13:07:00 +01:00
mediatek net: ethernet: mtk_eth_soc: avoid creating duplicate offload entries 2021-09-23 13:14:19 +01:00
mellanox mlxsw: thermal: Fix out-of-bounds memory accesses 2021-10-14 07:13:26 -07:00
micrel net: ks8851: fix link error 2021-09-28 13:11:20 +01:00
microchip net: encx24j600: check error in devm_regmap_init_encx24j600 2021-10-13 15:25:25 -07:00
microsoft net: mana: Fix error handling in mana_create_rxq() 2021-10-08 17:00:04 -07:00
moxa
mscc net: dsa: tag_ocelot: break circular dependency with ocelot switch lib driver 2021-10-12 17:35:18 -07:00
myricom ethtool: extend coalesce setting uAPI with CQE mode 2021-08-24 07:38:29 -07:00
natsemi Driver core update for 5.15-rc1 2021-09-01 08:44:42 -07:00
neterion ethernet: s2io: fix setting mac address during resume 2021-10-14 07:12:33 -07:00
netronome nfp: flow_offload: move flow_indr_dev_register from app init to app start 2021-10-12 16:07:52 -07:00
ni net/mlxbf_gige: Make use of devm_platform_ioremap_resourcexxx() 2021-08-31 12:08:05 +01:00
nvidia forcedeth: switch from 'pci_' to 'dma_' API 2021-08-23 11:56:57 +01:00
nxp
oki-semi net: pch_gbe: remove mii_ethtool_gset() error handling 2021-08-19 13:06:53 +01:00
packetengines
pasemi net: pasemi: Remove usage of the deprecated "pci-dma-compat.h" API 2021-08-30 20:30:51 -07:00
pensando ionic: don't remove netdev->dev_addr when syncing uc list 2021-10-09 11:56:59 +01:00
qlogic qed: Fix missing error code in qed_slowpath_start() 2021-10-09 13:46:41 +01:00
qualcomm net: qcom/emac: Replace strlcpy with strscpy 2021-09-06 16:43:17 +01:00
rdc r6040: Restore MDIO clock frequency after MAC reset 2021-09-10 10:00:08 +01:00
realtek r8169: add rtl_enable_exit_l1 2021-08-26 12:05:43 +01:00
renesas net: renesas: sh_eth: Fix freeing wrong tx descriptor 2021-09-07 14:02:02 +01:00
rocker Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-08-13 06:41:22 -07:00
samsung ethtool: extend coalesce setting uAPI with CQE mode 2021-08-24 07:38:29 -07:00
seeq
sfc Networking fixes for 5.15-rc2, including fixes from bpf. 2021-09-16 13:05:42 -07:00
sgi
silan
sis
smsc drivers: net: smc911x: clean up inconsistent indenting 2021-09-03 11:51:26 +01:00
socionext ethtool: extend coalesce setting uAPI with CQE mode 2021-08-24 07:38:29 -07:00
stmicro net: stmmac: add support for dwmac 3.40a 2021-10-08 16:22:39 +01:00
sun net: sun: SUNVNET_COMMON should depend on INET 2021-09-28 13:20:21 +01:00
synopsys ethtool: extend coalesce setting uAPI with CQE mode 2021-08-24 07:38:29 -07:00
tehuti ethtool: extend coalesce setting uAPI with CQE mode 2021-08-24 07:38:29 -07:00
ti ethtool: extend coalesce setting uAPI with CQE mode 2021-08-24 07:38:29 -07:00
toshiba net: spider_net: switch from 'pci_' to 'dma_' API 2021-08-29 10:50:24 +01:00
tundra
via ethtool: extend coalesce setting uAPI with CQE mode 2021-08-24 07:38:29 -07:00
wiznet net: w5100: check return value after calling platform_get_resource() 2021-08-31 12:08:42 +01:00
xilinx ethtool: extend coalesce setting uAPI with CQE mode 2021-08-24 07:38:29 -07:00
xircom
xscale net: ixp46x: Remove duplicate include of module.h 2021-09-01 11:40:22 +01:00
dnet.c
dnet.h
ec_bhf.c net: ec_bhf: switch from 'pci_' to 'dma_' API 2021-08-23 11:56:57 +01:00
ethoc.c
fealnx.c
jme.c ethtool: extend coalesce setting uAPI with CQE mode 2021-08-24 07:38:29 -07:00
jme.h
Kconfig net: korina: select CRC32 2021-10-13 13:28:35 -07:00
korina.c
lantiq_etop.c
lantiq_xrx200.c
Makefile net: Add driver for LiteX's LiteETH network interface 2021-08-26 12:13:52 +01:00