linux

Author	SHA1	Message	Date
Mike Marciniszyn	5de61a47eb	IB/hfi1: Fix probe time panic when AIP is enabled with a buggy BIOS A panic can result when AIP is enabled: BUG: unable to handle kernel NULL pointer dereference at 000000000000000 PGD 0 P4D 0 Oops: 0000 1 SMP PTI CPU: 70 PID: 981 Comm: systemd-udevd Tainted: G OE --------- - - 4.18.0-240.el8.x86_64 #1 Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.01.01.0005.101720141054 10/17/2014 RIP: 0010:__bitmap_and+0x1b/0x70 RSP: 0018:ffff99aa0845f9f0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8d5a6fc18000 RCX: 0000000000000048 RDX: 0000000000000000 RSI: ffffffffc06336f0 RDI: ffff8d5a8fa67750 RBP: 0000000000000079 R08: 0000000fffffffff R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: ffffffffc06336f0 R13: 00000000000000a0 R14: ffff8d5a6fc18000 R15: 0000000000000003 FS: 00007fec137a5980(0000) GS:ffff8d5a9fa80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000a04b48002 CR4: 00000000001606e0 Call Trace: hfi1_num_netdev_contexts+0x7c/0x110 [hfi1] hfi1_init_dd+0xd7f/0x1a90 [hfi1] ? pci_bus_read_config_dword+0x49/0x70 ? pci_mmcfg_read+0x3e/0xe0 do_init_one.isra.18+0x336/0x640 [hfi1] local_pci_probe+0x41/0x90 pci_device_probe+0x105/0x1c0 really_probe+0x212/0x440 driver_probe_device+0x49/0xc0 device_driver_attach+0x50/0x60 __driver_attach+0x61/0x130 ? device_driver_attach+0x60/0x60 bus_for_each_dev+0x77/0xc0 ? klist_add_tail+0x3b/0x70 bus_add_driver+0x14d/0x1e0 ? dev_init+0x10b/0x10b [hfi1] driver_register+0x6b/0xb0 ? dev_init+0x10b/0x10b [hfi1] hfi1_mod_init+0x1e6/0x20a [hfi1] do_one_initcall+0x46/0x1c3 ? free_unref_page_commit+0x91/0x100 ? _cond_resched+0x15/0x30 ? kmem_cache_alloc_trace+0x140/0x1c0 do_init_module+0x5a/0x220 load_module+0x14b4/0x17e0 ? __do_sys_finit_module+0xa8/0x110 __do_sys_finit_module+0xa8/0x110 do_syscall_64+0x5b/0x1a0 The issue happens when pcibus_to_node() returns NO_NUMA_NODE. 
Fix this issue by moving the initialization of dd->node to hfi1_devdata allocation and remove the other pcibus_to_node() calls in the probe path and use dd->node instead. Affinity logic is adjusted to use a new field dd->affinity_entry as a guard instead of dd->node. Fixes: `4730f4a6c6` ("IB/hfi1: Activate the dummy netdev") Link: https://lore.kernel.org/r/1617025700-31865-4-git-send-email-dennis.dalessandro@cornelisnetworks.com Cc: stable@vger.kernel.org Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-04-07 15:31:59 -03:00
Potnuri Bharat Teja	603c4690b0	RDMA/cxgb4: check for ipv6 address properly while destroying listener ipv6 bit is wrongly set by the below which causes fatal adapter lookup engine errors for ipv4 connections while destroying a listener. Fix it to properly check the local address for ipv6. Fixes: `3408be145a` ("RDMA/cxgb4: Fix adapter LE hash errors while destroying ipv6 listening server") Link: https://lore.kernel.org/r/20210331135715.30072-1-bharat@chelsio.com Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-04-07 15:31:45 -03:00
Frank Rowand	649cab56de	of: properly check for error returned by fdt_get_name() fdt_get_name() returns error values via a parameter pointer instead of in function return. Fix check for this error value in populate_node() and callers of populate_node(). Chasing up the caller tree showed callers of various functions failing to initialize the value of pointer parameters that can return error values. Initialize those values to NULL. The bug was introduced by commit `e6a6928c3e` ("of/fdt: Convert FDT functions to use libfdt") but this patch can not be backported directly to that commit because the relevant code has further been restructured by commit `dfbd4c6eff` ("drivers/of: Split unflatten_dt_node()") The bug became visible by triggering a crash on openrisc with: commit `79edff1206` ("scripts/dtc: Update to upstream version v1.6.0-51-g183df9e9c2b9") as reported in: https://lore.kernel.org/lkml/20210327224116.69309-1-linux@roeck-us.net/ Fixes: `79edff1206` ("scripts/dtc: Update to upstream version v1.6.0-51-g183df9e9c2b9") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Frank Rowand <frank.rowand@sony.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20210405032845.1942533-1-frowand.list@gmail.com Signed-off-by: Rob Herring <robh@kernel.org>	2021-04-07 13:07:30 -05:00
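The error-reporting convention described in the commit above (a NULL return with the error code delivered through a pointer parameter) can be sketched in plain C. This is a minimal illustration, not the libfdt or drivers/of code: `get_name()`, `populate()` and `lookup_or_null()` are hypothetical stand-ins, and the error value `-1` is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an fdt_get_name()-style lookup: returns the name,
 * or NULL with a negative error code stored in *lenp. */
static const char *get_name(const char *const *names, int count, int idx, int *lenp)
{
    if (idx < 0 || idx >= count) {
        *lenp = -1;                 /* the error travels through the out-parameter */
        return NULL;
    }
    *lenp = (int)strlen(names[idx]);
    return names[idx];
}

/* Hypothetical populate step: checks the returned pointer and propagates the
 * error code; *out is left untouched on failure. */
static int populate(const char *const *names, int count, int idx, const char **out)
{
    int len;
    const char *name = get_name(names, count, idx, &len);

    if (!name)
        return len;                 /* negative error code from the out-parameter */
    *out = name;
    return 0;
}

/* Caller: initializes the pointer to NULL so the failure path never reads an
 * indeterminate value. */
static const char *lookup_or_null(const char *const *names, int count, int idx)
{
    const char *result = NULL;

    if (populate(names, count, idx, &result) < 0)
        return NULL;
    return result;
}
```

With a two-entry name table, `lookup_or_null(names, 2, 1)` yields the second name, while an out-of-range index yields NULL instead of an uninitialized pointer.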
Vitaly Kuznetsov	fa26d0c778	ACPI: processor: Fix build when CONFIG_ACPI_PROCESSOR=m Commit `8cdddd182b` ("ACPI: processor: Fix CPU0 wakeup in acpi_idle_play_dead()") tried to fix CPU0 hotplug breakage by copying wakeup_cpu0() + start_cpu0() logic from hlt_play_dead()/mwait_play_dead() into acpi_idle_play_dead(). The problem is that these functions are not exported to modules, so the build fails when CONFIG_ACPI_PROCESSOR=m. The issue could've been fixed by exporting both wakeup_cpu0()/start_cpu0() (the latter from assembly) but it seems putting the whole pattern into a new function and exporting it instead is better. Reported-by: kernel test robot <lkp@intel.com> Fixes: `8cdddd182b` ("ACPI: processor: Fix CPU0 wakeup in acpi_idle_play_dead()") Cc: <stable@vger.kernel.org> # 5.10+ Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2021-04-07 19:02:43 +02:00
Linus Torvalds	3a22981230	ARM SoC fixes for v5.12, part 2 Most of the changes again are devicetree fixes, but there are also five trivial build fixes for issues I found when test building with gcc-11 or when running 'make W=1', and some OMAP platform specific code fixups. Broadcom: - One revert for a Raspberry pi interrupt controller change that caused a regression. TI OMAP: - Remove unused duplicate sha2md5_fck clock node that can race with the OMAP4_SHA2MD5_CLKCTRL clock node for disable for unused clocks - Add aliases for omap4/5 mmc to put the slots back into the right order again - Fix typo for bionic voltage controllers that accidentally use mpu for all instances instead of mpu, core and iva - Fix random hangs for droid4 caused by missing fix from TI Android kernel tree to do a dummy smc call on cpuidle wakeup path NXP i.MX: - Fix a system failure on imx6qdl-phytec-pfla02 board when booting from SD, by adding missing vmmc supply for SD interfaces. - Fix address typo in i.MX8MM/Q IOMUXC_SD1_DATA0_GPIO2_IO2 definition. Marvell mvebu: - Fix storm interrupt on Turris Omnia - Enable hardware buffer management as it should be Build fixes for PXA, Freescale, Marvell, OMAP1 and Keystone. 
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Merge tag 'arm-fixes-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Arnd Bergmann: "Most of the changes again are devicetree fixes, but there are also five trivial build fixes for issues I found when test building with gcc-11 or when running 'make W=1', and some OMAP platform specific code fixups. Broadcom: - One revert for a Raspberry pi interrupt controller change that caused a regression. TI OMAP: - Remove unused duplicate sha2md5_fck clock node that can race with the OMAP4_SHA2MD5_CLKCTRL clock node for disable for unused clocks - Add aliases for omap4/5 mmc to put the slots back into the right order again - Fix typo for bionic voltage controllers that accidentally use mpu for all instances instead of mpu, core and iva - Fix random hangs for droid4 caused by missing fix from TI Android kernel tree to do a dummy smc call on cpuidle wakeup path NXP i.MX: - Fix a system failure on imx6qdl-phytec-pfla02 board when booting from SD, by adding missing vmmc supply for SD interfaces. - Fix address typo in i.MX8MM/Q IOMUXC_SD1_DATA0_GPIO2_IO2 definition. 
Marvell mvebu: - Fix storm interrupt on Turris Omnia - Enable hardware buffer management as it should be ... and build fixes for PXA, Freescale, Marvell, OMAP1 and Keystone" * tag 'arm-fixes-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: ARM: dts: turris-omnia: configure LED[2]/INTn pin as interrupt pin ARM: dts: turris-omnia: fix hardware buffer management Revert "arm64: dts: marvell: armada-cp110: Switch to per-port SATA interrupts" ARM: mvebu: avoid clang -Wtautological-constant warning ARM: pxa: mainstone: avoid -Woverride-init warning ARM: omap1: fix building with clang IAS soc/fsl: qbman: fix conflicting alignment attributes ARM: keystone: fix integer overflow warning ARM: dts: imx6: pbab01: Set vmmc supply for both SD interfaces arm64: dts: imx8mm/q: Fix pad control of SD1_DATA0 ARM: OMAP4: PM: update ROM return address for OSWR and OFF ARM: OMAP4: Fix PMIC voltage domains for bionic ARM: dts: Fix moving mmc devices with aliases for omap4 & 5 ARM: dts: Drop duplicate sha2md5_fck to fix clk_disable race Revert "ARM: dts: bcm2711: Add the BSC interrupt controller"	2021-04-07 09:26:50 -07:00
Linus Torvalds	dbaa5d1c25	Merge branch 'parisc-5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc fixes from Helge Deller: "One link error fix found by the kernel test robot, one sparse warning fix, remove a duplicate declaration and some spelling fixes" * 'parisc-5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: math-emu: Few spelling fixes in the file fpu.h parisc: avoid a warning on u8 cast for cmpxchg on u8 pointers parisc: parisc-agp requires SBA IOMMU driver parisc: Remove duplicate struct task_struct declaration	2021-04-07 09:20:07 -07:00
Linus Torvalds	5ba091db93	platform-drivers-x86 for v5.12-3 A single bugfix (on top of platform-drivers-x86-v5.12-2) to fix spurious wakeups from suspend caused by recent intel-hid driver changes. The following is an automated git shortlog grouped by driver: intel-hid: - Fix spurious wakeups caused by tablet-mode events during suspend Merge tag 'platform-drivers-x86-v5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fix from Hans de Goede: "A single bugfix to fix spurious wakeups from suspend caused by recent intel-hid driver changes" * tag 'platform-drivers-x86-v5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86: intel-hid: Fix spurious wakeups caused by tablet-mode events during suspend	2021-04-07 09:14:04 -07:00
Linus Torvalds	e3bb2f4f96	regulator: bd9571mwv fixes for v5.12 A set of driver specific fixes here, the main one is a fix to not try to set unsupported voltages on this device. The other two patches clean up the error handling and eliminate the possibility that we could overflow the page when writing sysfs output (which AFAICT wasn't an issue but better to be sure). Merge tag 'regulator-fix-v5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "bd9571mwv regulator fixes for v5.12. A set of driver specific fixes here, the main one is a fix to not try to set unsupported voltages on this device. The other two patches clean up the error handling and eliminate the possibility that we could overflow the page when writing sysfs output (which AFAICT wasn't an issue but better to be sure)" * tag 'regulator-fix-v5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: bd9571mwv: Convert device attribute to sysfs_emit() regulator: bd9571mwv: Fix regulator name printed on registration failure regulator: bd9571mwv: Fix AVS and DVFS voltage range	2021-04-07 09:08:36 -07:00
Luca Fancellu	d120198bd5	xen/evtchn: Change irq_info lock to raw_spinlock_t Unmask operation must be called with interrupt disabled, on preempt_rt spin_lock_irqsave/spin_unlock_irqrestore don't disable/enable interrupts, so use raw_* implementation and change lock variable in struct irq_info from spinlock_t to raw_spinlock_t Cc: stable@vger.kernel.org Fixes: `25da4618af` ("xen/events: don't unmask an event channel when an eoi is pending") Signed-off-by: Luca Fancellu <luca.fancellu@arm.com> Reviewed-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20210406105105.10141-1-luca.fancellu@arm.com Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>	2021-04-07 08:33:28 -05:00
Takashi Iwai	9c3195778c	ASoC: Fixes for v5.12 A fairly small batch of driver specific fixes, mainly for various x86 systems with the biggest set being fixes to power down DSPs properly on x86 SOF systems. Merge tag 'asoc-fix-v5.12-rc6' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v5.12 A fairly small batch of driver specific fixes, mainly for various x86 systems with the biggest set being fixes to power down DSPs properly on x86 SOF systems.	2021-04-07 15:00:33 +02:00
Heiko Carstens	ad31a8c051	s390/setup: use memblock_free_late() to free old stack Use memblock_free_late() to free the old machine check stack to the buddy allocator instead of leaking it. Fixes: `b61b159512` ("s390: add stack for machine check handler") Cc: Vasily Gorbik <gor@linux.ibm.com> Acked-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2021-04-07 14:37:28 +02:00
Jonas Holmberg	168632a495	ALSA: aloop: Fix initialization of controls Add a control to the card before copying the id so that the numid field is initialized in the copy. Otherwise the numid field of active_id, format_id, rate_id and channels_id will be the same (0) and snd_ctl_notify() will not queue the events properly. Signed-off-by: Jonas Holmberg <jonashg@axis.com> Reviewed-by: Jaroslav Kysela <perex@perex.cz> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20210407075428.2666787-1-jonashg@axis.com Signed-off-by: Takashi Iwai <tiwai@suse.de>	2021-04-07 10:21:25 +02:00
Marc Kleine-Budde	c7eb923c3c	can: mcp251xfd: mcp251xfd_regmap_crc_read(): work around broken CRC on TBC register MCP251XFD_REG_TBC is the time base counter register. It increments once per SYS clock tick, which is 20 or 40 MHz. Observation shows that if the lowest byte (which is transferred first on the SPI bus) of that register is 0x00 or 0x80 the calculated CRC doesn't always match the transferred one. To reproduce this problem let the driver read the TBC register at a high frequency. This can be done by attaching only the mcp251xfd CAN controller to a properly terminated CAN bus and sending a single CAN frame. As there are no other CAN controllers on the bus, the sent CAN frame is not ACKed and the mcp251xfd repeats it. If user space enables the bus error reporting, each of the NACK errors is reported with a time stamp (which is read from the TBC register) to user space. $ ip link set can0 down $ ip link set can0 up type can bitrate 500000 berr-reporting on $ cansend can0 4FF#ff.01.00.00.00.00.00.00 This leads to several error messages per second: \| mcp251xfd spi0.0 can0: CRC read error at address 0x0010 (length=4, data=00 3a 86 da, CRC=0x7753) retrying. \| mcp251xfd spi0.0 can0: CRC read error at address 0x0010 (length=4, data=80 01 b4 da, CRC=0x5830) retrying. \| mcp251xfd spi0.0 can0: CRC read error at address 0x0010 (length=4, data=00 e9 23 db, CRC=0xa723) retrying. \| mcp251xfd spi0.0 can0: CRC read error at address 0x0010 (length=4, data=00 8a 30 db, CRC=0x4a9c) retrying. \| mcp251xfd spi0.0 can0: CRC read error at address 0x0010 (length=4, data=80 f3 43 db, CRC=0x66d2) retrying. If the highest bit in the lowest byte is flipped the transferred CRC matches the calculated one. We assume for now the CRC calculation in the chip works on wrong data and the transferred data is correct. 
This patch implements the following workaround: - If a CRC read error on the TBC register is detected and the lowest byte is 0x00 or 0x80, the highest bit of the lowest byte is flipped and the CRC is calculated again. - If the CRC now matches, the _original_ data is passed to the reader. For now we assume transferred data was OK. Link: https://lore.kernel.org/r/20210406110617.1865592-5-mkl@pengutronix.de Cc: Manivannan Sadhasivam <mani@kernel.org> Cc: Thomas Kopp <thomas.kopp@microchip.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2021-04-07 09:31:28 +02:00
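The workaround described above can be sketched in a few lines of C. This is a hedged illustration, not the driver code: `crc16_sketch()` is a placeholder checksum (a bitwise CRC-16 with polynomial 0x8005 and initial value 0xffff, chosen as an assumption for the example), and `tbc_crc_ok()` is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder checksum for illustration only (assumed parameters). */
static uint16_t crc16_sketch(const uint8_t *buf, size_t len)
{
    uint16_t crc = 0xffff;

    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)buf[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8005)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Hypothetical check for a 4-byte TBC read: accept the data if the CRC
 * matches, or if it matches after flipping the highest bit of the lowest
 * byte when that byte is 0x00 or 0x80. The caller's data is never modified,
 * so the original bytes are what reach the reader. */
static bool tbc_crc_ok(const uint8_t data[4], uint16_t crc_received)
{
    uint8_t tmp[4];

    if (crc16_sketch(data, 4) == crc_received)
        return true;
    if (data[0] != 0x00 && data[0] != 0x80)
        return false;               /* workaround only applies to these values */

    memcpy(tmp, data, sizeof(tmp));
    tmp[0] ^= 0x80;                 /* flip the highest bit of the lowest byte */
    return crc16_sketch(tmp, 4) == crc_received;
}
```

The retry on the flipped copy keeps the decision local: a read whose lowest byte is anything other than 0x00 or 0x80 still fails on a CRC mismatch.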
Marc Kleine-Budde	ef7a8c3e75	can: mcp251xfd: mcp251xfd_regmap_crc_read_one(): Factor out crc check into separate function This patch factors out the crc check into a separate function. This is preparation for the next patch. Link: https://lore.kernel.org/r/20210406110617.1865592-4-mkl@pengutronix.de Cc: Manivannan Sadhasivam <mani@kernel.org> Cc: Thomas Kopp <thomas.kopp@microchip.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2021-04-07 09:31:28 +02:00
Marc Kleine-Budde	0084e298ac	can: mcp251xfd: add BQL support This patch re-adds BQL support to the driver. Support for netdev_xmit_more() will be added in a separate patch series. Link: https://lore.kernel.org/r/20210406110617.1865592-3-mkl@pengutronix.de Cc: Manivannan Sadhasivam <mani@kernel.org> Cc: Thomas Kopp <thomas.kopp@microchip.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2021-04-07 09:31:28 +02:00
Marc Kleine-Budde	8dc987519a	can: c_can: remove unused enum BOSCH_C_CAN_PLATFORM This patch removes the unused enum BOSCH_C_CAN_PLATFORM. Link: https://lore.kernel.org/r/20210406110617.1865592-2-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2021-04-07 09:31:28 +02:00
Marc Kleine-Budde	644022b1de	can: m_can: m_can_receive_skb(): add missing error handling to can_rx_offload_queue_sorted() call In commit `1be37d3b04` ("can: m_can: fix periph RX path: use rx-offload to ensure skbs are sent from softirq context") the RX path for peripherals (i.e. SPI based m_can controllers) was converted to the rx-offload infrastructure. However, the error handling for can_rx_offload_queue_sorted() was forgotten. can_rx_offload_queue_sorted() will return with an error if the internal queue is full. This patch adds the missing error handling, by increasing the rx_fifo_errors. Fixes: `1be37d3b04` ("can: m_can: fix periph RX path: use rx-offload to ensure skbs are sent from softirq context") Link: https://lore.kernel.org/r/20210401084515.1455013-1-mkl@pengutronix.de Reported-by: coverity-bot <keescook+coverity-bot@chromium.org> Addresses-Coverity-ID: 1503583 ("Error handling issues") Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Torin Cooper-Bennun <torin@maxiluxsystems.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2021-04-07 09:31:28 +02:00
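The missing error handling described in the commit above follows a common shape: when a bounded queue rejects an item, the drop is counted rather than silently ignored. The sketch below is a userspace analogy under stated assumptions; `rx_queue`, `queue_push()` and `receive_frame()` are hypothetical names, and only the `rx_fifo_errors` counter name is taken from the commit text.

```c
#include <stddef.h>

#define QUEUE_MAX 2                 /* deliberately tiny so the full case is easy to hit */

struct rx_queue {
    int items[QUEUE_MAX];
    size_t len;
};

struct net_stats {
    unsigned long rx_fifo_errors;
};

/* Hypothetical bounded queue: returns a negative value when it is full. */
static int queue_push(struct rx_queue *q, int item)
{
    if (q->len >= QUEUE_MAX)
        return -1;
    q->items[q->len++] = item;
    return 0;
}

/* Receive path: check the queueing result and account for the dropped frame. */
static void receive_frame(struct rx_queue *q, struct net_stats *stats, int frame)
{
    if (queue_push(q, frame) < 0)
        stats->rx_fifo_errors++;
}
```

Pushing three frames into this two-slot queue leaves two frames queued and one counted in `rx_fifo_errors`.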
Marc Kleine-Budde	c812948744	can: skb: alloc_can{,fd}_skb(): set "cf" to NULL if skb allocation fails The handling of CAN bus errors typically consists of allocating a CAN error SKB using alloc_can_err_skb() followed by stats handling and filling the error details in the newly allocated CAN error SKB. Even if the allocation of the SKB fails the stats handling should not be skipped. The common pattern in CAN drivers is to allocate the skb and work on the struct can_frame pointer "cf", if it has been assigned by alloc_can_err_skb(). \| skb = alloc_can_err_skb(priv->ndev, &cf); \| \| /* RX errors */ \| if (bdiag1 & (MCP251XFD_REG_BDIAG1_DCRCERR \| \| MCP251XFD_REG_BDIAG1_NCRCERR)) { \| netdev_dbg(priv->ndev, "CRC error\n"); \| \| stats->rx_errors++; \| if (cf) \| cf->data[3] \|= CAN_ERR_PROT_LOC_CRC_SEQ; \| } In case of an OOM alloc_can_err_skb() returns NULL, but doesn't set "cf" to NULL as well. For the above pattern to work the "cf" has to be initialized to NULL, which is easily forgotten. To solve this kind of problem, set "cf" to NULL if alloc_can_err_skb() returns NULL. Link: https://lore.kernel.org/r/20210402102245.1512583-1-mkl@pengutronix.de Suggested-by: Vincent MAILHOL <mailhol.vincent@wanadoo.fr> Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2021-04-07 09:31:19 +02:00
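The effect of the change described above can be shown with a small userspace sketch. It assumes hypothetical stand-ins throughout: `alloc_err_frame()` plays the role of the allocator, `struct frame` the role of the CAN frame, and `ERR_LOC_CRC` is a placeholder flag value, none of which are the kernel definitions.

```c
#include <stddef.h>
#include <stdlib.h>

#define ERR_LOC_CRC 0x08            /* placeholder flag value for the sketch */

struct frame {
    unsigned char data[8];
};

struct stats {
    unsigned long rx_errors;
};

/* Hypothetical allocator mirroring the fixed behaviour: on failure it returns
 * NULL and also sets *cf to NULL, so callers need not pre-initialize cf. */
static void *alloc_err_frame(int simulate_oom, struct frame **cf)
{
    struct frame *f = simulate_oom ? NULL : calloc(1, sizeof(*f));

    *cf = f;                        /* NULL on failure, valid frame otherwise */
    return f;
}

/* Driver-style error path: stats are always updated, frame details are filled
 * in only when a frame exists. Returns 1 if the frame was annotated. */
static int handle_crc_error(struct stats *stats, int simulate_oom)
{
    struct frame *cf;               /* left uninitialized; the allocator sets it */
    void *skb = alloc_err_frame(simulate_oom, &cf);
    int annotated = 0;

    stats->rx_errors++;             /* must not be skipped when allocation fails */
    if (cf) {
        cf->data[3] |= ERR_LOC_CRC;
        annotated = 1;
    }
    free(skb);                      /* free(NULL) is a no-op */
    return annotated;
}
```

Because the allocator writes `*cf` on every path, the `if (cf)` test is well-defined even when the caller forgets to initialize the pointer.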
Chris Mi	f94d6389f6	net/mlx5e: TC, Add support to offload sample action The following diagram illustrates the hardware model for tc sample action: +---------------------+ + original flow table + +---------------------+ + original match + +---------------------+ \| v +------------------------------------------------+ + Flow Sampler Object + +------------------------------------------------+ + sample ratio + +------------------------------------------------+ + sample table id \| default table id + +------------------------------------------------+ \| \| v v +-----------------------------+ +----------------------------------------+ + sample table + + default table per <vport, chain, prio> + +-----------------------------+ +----------------------------------------+ + forward to management vport + + original match + +-----------------------------+ +----------------------------------------+ + other actions + +----------------------------------------+ The sample action is translated to a goto flow table object destination which samples packets according to the provided sample ratio. Sampled packets are duplicated. One copy is processed by a termination table, named the sample table, which sends the packet to the eswitch manager port (that will be processed by software). The second copy is processed by the default table which executes the subsequent actions. The default table is created per <vport, chain, prio> tuple as rules with different prios and chains may overlap. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-04-06 21:36:05 -07:00
Chris Mi	be9dc00474	net/mlx5e: TC, Handle sampled packets Mark the sampled packets with a sample restore object. Send sampled packets using the psample api. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-04-06 21:36:04 -07:00
Chris Mi	7319a1cc3c	net/mlx5e: TC, Refactor tc update skb function Refactor the tc update skb function as a pre-step to processing sampled packets in it. Signed-off-by: Chris Mi <cmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-04-06 21:36:04 -07:00
Chris Mi	36a3196256	net/mlx5e: TC, Add sampler restore handle API Use common object pool to create an object ID to map sample parameters. Allocate a modify header action to write the object ID to reg_c0 lower 16 bits. Create a restore rule to pass the object ID to software. So software can identify sampled packets via the object ID and send it to userspace. Aggregate the modify header action, restore rule and object ID to a sample restore handle. Re-use identical sample restore handle for the same object ID. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-04-06 21:36:04 -07:00
Chris Mi	11ecd6c60b	net/mlx5e: TC, Add sampler object API In order to offload sample action, HW introduces sampler object. The sampler object samples packets according to the provided sample ratio. Sampled packets are duplicated. One copy is processed by a termination table, named the sample table, which sends the packet up to software. The second copy is processed by the default table. Instantiate sampler object. Re-use identical sampler object for the same sample ratio, sample table and default table as a prestep for offloading tc sample actions. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-04-06 21:36:03 -07:00
Chris Mi	2a9ab10a56	net/mlx5e: TC, Add sampler termination table API Sampled packets are sent to software using termination tables. There is only one rule in that table that is to forward sampled packets to the e-switch management vport. Create a sampler termination table and rule for each eswitch. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-04-06 21:36:03 -07:00
Chris Mi	41c2fd9498	net/mlx5e: TC, Parse sample action Parse TC sample action and save sample parameters in flow attribute data structure. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-04-06 21:36:03 -07:00
Chris Mi	c935568271	net/mlx5: Instantiate separate mapping objects for FDB and NIC tables Currently, the u32 chain id is mapped to u16 value which is stored on the lower 16 bits of reg_c0 for FDB and reg_b for NIC tables. The mapping is internally maintained by the chains object. However, with the introduction of reg_c0 objects the fdb may store more than just the chain id on reg_c0. This is not relevant for NIC tables. Separate the chains mapping instantiation for FDB and NIC tables. Remove the mapping from the chains object. For FDB tables, create the mapping per eswitch. For NIC tables, create the mapping per tc table. Pass the corresponding mapping pointer when creating the chains object. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-04-06 21:36:02 -07:00
Chris Mi	a91d98a0a2	net/mlx5: Map register values to restore objects Currently reg_c0 lower 16 bits and reg_b are used to store the chain id that missed in FDB and NIC tables accordingly. However, the registers' values may index a restore object, rather than a single u32 value. Different object types can be used to restore mutually exclusive contexts such as chain id and sample group id. Use the mapping object to associate an index with a restore object as a prestep for supporting additional restore types. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-04-06 21:36:02 -07:00
Chris Mi	c1904360dd	net/mlx5: E-switch, Set per vport table default group number A different per vport table is created using a different per vport table namespace. Because a variable can't be used to set the namespace member value, use the eswitch default max group number if the max group number in the namespace is 0. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-04-06 21:36:02 -07:00
Chris Mi	c796bb7cd2	net/mlx5: E-switch, Generalize per vport table API Currently, per vport table was used only for port mirroring actions. However, sample action will also require a per vport table instance. Generalize the vport table API to work with multiple namespaces where each namespace manages its own vport table instance. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-04-06 21:36:01 -07:00
Chris Mi	0a9e230787	net/mlx5: E-switch, Rename functions to follow naming convention. Public API names start with the mlx5 prefix; remove the mlx5 prefix from non-public API names. Signed-off-by: Chris Mi <cmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-04-06 21:36:01 -07:00
Chris Mi	4c7f40287a	net/mlx5: E-switch, Move vport table functions to a new file Currently, the vport table functions are in common eswitch offload file. This file is too big. Move the vport table create, delete and lookup functions to a separate file. Put the file in esw directory. Pre-step for generalizing its functionality for serving both the mirroring and the sample features. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-04-06 21:36:01 -07:00
Xiaoming Ni	d5f9b005c3	net/mlx5: fix kfree mismatch in indir_table.c Memory allocated by kvzalloc() should be freed by kvfree(). Fixes: `34ca65352d` ("net/mlx5: E-Switch, Indirect table infrastructur") Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-04-06 21:04:36 -07:00
Aya Levin	534b1204ca	net/mlx5: Fix PBMC register mapping Add reserved mapping to cover all the register in order to avoid setting arbitrary values to newer FW which implements the reserved fields. Fixes: `50b4a3c236` ("net/mlx5: PPTB and PBMC register firmware command support") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-04-06 21:04:36 -07:00
Aya Levin	ce28f0fd67	net/mlx5: Fix PPLM register mapping Add reserved mapping to cover all the register in order to avoid setting arbitrary values to newer FW which implements the reserved fields. Fixes: `a58837f52d` ("net/mlx5e: Expose FEC feilds and related capability bit") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-04-06 21:04:35 -07:00
Raed Salem	a14587dfc5	net/mlx5: Fix placement of log_max_flow_counter The cited commit wrongly placed log_max_flow_counter field of mlx5_ifc_flow_table_prop_layout_bits, align it to the HW spec intended placement. Fixes: `16f1c5bb3e` ("net/mlx5: Check device capability for maximum flow counters") Signed-off-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-04-06 21:04:35 -07:00
Eli Cohen	1a73704c82	net/mlx5: Fix HW spec violation configuring uplink Make sure to modify uplink port to follow only if the uplink_follow capability is set as required by the HW spec. Failure to do so causes traffic to the uplink representor net device to cease after switching to switchdev mode. Fixes: `7d0314b11c` ("net/mlx5e: Modify uplink state on interface up/down") Signed-off-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-04-06 21:04:35 -07:00
Al Viro	4f0ed93fb9	LOOKUP_MOUNTPOINT: we are cleaning "jumped" flag too late That (and traversals in case of umount .) should be done before complete_walk(). Either a braino or mismerge damage on queue reorders - either way, I should've spotted that much earlier. Fucked-up-by: Al Viro <viro@zeniv.linux.org.uk> X-Paperbag: Brown Fixes: `161aff1d93` "LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat()" Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2021-04-06 20:33:00 -04:00
Jakub Kicinski	0b35e0deb5	docs: ethtool: correct quotes Quotes to backticks. All commands use backticks since the names are constants. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-06 16:56:58 -07:00
Jakub Kicinski	5219d6012d	docs: ethtool: fix some copy-paste errors Fix incorrect documentation. Mostly referring to other objects, likely because the text was copied and not adjusted. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-06 16:55:41 -07:00
Phillip Potter	cca8ea3b05	net: tun: set tun->dev->addr_len during TUNSETLINK processing When changing type with TUNSETLINK ioctl command, set tun->dev->addr_len to match the appropriate type, using new tun_get_addr_len utility function which returns appropriate address length for given type. Fixes a KMSAN-found uninit-value bug reported by syzbot at: https://syzkaller.appspot.com/bug?id=0766d38c656abeace60621896d705743aeefed51 Reported-by: syzbot+001516d86dbe88862cec@syzkaller.appspotmail.com Diagnosed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Phillip Potter <phil@philpotter.co.uk> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-06 16:52:21 -07:00
Peng Zhang	631a44ed25	nfp: flower: add support for packet-per-second policing Allow hardware offload of a policer action attached to a matchall filter which enforces a packets-per-second rate-limit. e.g. tc filter add dev tap1 parent ffff: u32 match \ u32 0 0 police pkts_rate 3000 pkts_burst 1000 Signed-off-by: Peng Zhang <peng.zhang@corigine.com> Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Louis Peens <louis.peens@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-06 16:47:46 -07:00
Wong Vee Khee	63cf323899	ethtool: fix incorrect datatype in set_eee ops The member 'tx_lpi_timer' is defined with __u32 datatype in the ethtool header file. Hence, we should use ethnl_update_u32() in set_eee ops. Fixes: `fd77be7bd4` ("ethtool: set EEE settings with EEE_SET request") Cc: <stable@vger.kernel.org> # 5.10.x Cc: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-06 16:42:25 -07:00
Guangbin Huang	ed7bedd2c3	net: hns3: clear VF down state bit before request link status Currently, the VF down state bit is cleared after the VF sends the link status request command. The problem is that when the VF gets the link status reply from the PF, the down state bit may still be set to 1. In this case, the link status reply from the PF is ignored and the VF link status is always set to down. To fix this problem, clear the VF down state bit before the VF requests link status. Fixes: `e2cb1dec97` ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support") Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-06 16:40:06 -07:00
David S. Miller	5106efe6ed	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following batch contains Netfilter/IPVS updates for your net-next tree: 1) Simplify log infrastructure modularity: Merge ipv4, ipv6, bridge, netdev and ARP families to nf_log_syslog.c. Add module softdeps. This fixes a rare deadlock condition that might occur when log module autoload is required. From Florian Westphal. 2) Moves part of netfilter related pernet data from struct net to net_generic() infrastructure. All of these users can be modules, so if they are not loaded there is no need to waste space. Size reduction is 7 cachelines on x86_64, also from Florian. 2) Update nftables audit support to report events once per table, to get it aligned with iptables. From Richard Guy Briggs. 3) Check for stale routes from the flowtable garbage collector path. This is fixing IPv6 which breaks due missing check for the dst_cookie. 4) Add a nfnl_fill_hdr() function to simplify netlink + nfnetlink headers setup. 5) Remove documentation on several statified functions. 6) Remove printk on netns creation for the FTP IPVS tracker, from Florian Westphal. 7) Remove unnecessary nf_tables_destroy_list_lock spinlock initialization, from Yang Yingliang. 7) Remove a duplicated forward declaration in ipset, from Wan Jiabing. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-06 16:36:41 -07:00
David S. Miller	f57796a4b8	linux-can-fixes-for-5.12-20210406 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEK3kIWJt9yTYMP3ehqclaivrt76kFAmBsOJcTHG1rbEBwZW5n dXRyb25peC5kZQAKCRCpyVqK+u3vqWR9B/4zU7gK2vJ1HfN8zk4pHrCde5KGuqyJ 8Neh73b6p6u/4fssUtKigmMmdcEDbjAM7fWxrIA13RVbUXgpye98ikZAQPQRlwLY WWAEY9RZ97uEWUKAx5J4TKxSbq81c649t7NHvnhJqLWdHZ9WPhTIqo3AuLhrSqP8 ezszCZbiiu+AJIJzpcMeWFWowYo7/uFbrc/F5nz+2Xb9jp83n2yUYcp6S6XYIsZw I1lZqlQX6PQ6cRJKugxX5DkXy8VO2Xi+sm+YwE5Bj3NLTC70DgEIEpNDPch3g6mu uKChSuyjCj2emCCBbnkGK3azdnW4wbtbhF913puw/crb38DNlCi/XlTi =JN6W -----END PGP SIGNATURE----- Merge tag 'linux-can-fixes-for-5.12-20210406' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2021-04-06 this is a pull request of 1 patch for net/master. The patch is by me and fixes the SPI half duplex support in the mcp251x CAN driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-06 16:34:11 -07:00
Andy Shevchenko	a460513ed4	time64.h: Consolidated PSEC_PER_SEC definition We have currently three users of the PSEC_PER_SEC each of them defining it individually. Instead, move it to time64.h to be available for everyone. There is a new user coming with the same constant in use. It will also make its life easier. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-06 16:32:17 -07:00
Andy Shevchenko	3036ec035c	stmmac: intel: Drop duplicate ID in the list of PCI device IDs The PCI device IDs are defined with a prefix PCI_DEVICE_ID. There is no need to repeat the ID part at the end of each definition. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Wong Vee Khee <vee.khee.wong@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-06 16:31:18 -07:00
Guenter Roeck	66c3f05ddc	pcnet32: Use pci_resource_len to validate PCI resource pci_resource_start() is not a good indicator to determine if a PCI resource exists or not, since the resource may start at address 0. This is seen when trying to instantiate the driver in qemu for riscv32 or riscv64. pci 0000:00:01.0: reg 0x10: [io 0x0000-0x001f] pci 0000:00:01.0: reg 0x14: [mem 0x00000000-0x0000001f] ... pcnet32: card has no PCI IO resources, aborting Use pci_resource_len() instead. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-06 16:30:17 -07:00
John Fastabend	144748eb0c	bpf, sockmap: Fix incorrect fwd_alloc accounting Incorrect accounting fwd_alloc can result in a warning when the socket is torn down, [18455.319240] WARNING: CPU: 0 PID: 24075 at net/core/stream.c:208 sk_stream_kill_queues+0x21f/0x230 [...] [18455.319543] Call Trace: [18455.319556] inet_csk_destroy_sock+0xba/0x1f0 [18455.319577] tcp_rcv_state_process+0x1b4e/0x2380 [18455.319593] ? lock_downgrade+0x3a0/0x3a0 [18455.319617] ? tcp_finish_connect+0x1e0/0x1e0 [18455.319631] ? sk_reset_timer+0x15/0x70 [18455.319646] ? tcp_schedule_loss_probe+0x1b2/0x240 [18455.319663] ? lock_release+0xb2/0x3f0 [18455.319676] ? __release_sock+0x8a/0x1b0 [18455.319690] ? lock_downgrade+0x3a0/0x3a0 [18455.319704] ? lock_release+0x3f0/0x3f0 [18455.319717] ? __tcp_close+0x2c6/0x790 [18455.319736] ? tcp_v4_do_rcv+0x168/0x370 [18455.319750] tcp_v4_do_rcv+0x168/0x370 [18455.319767] __release_sock+0xbc/0x1b0 [18455.319785] __tcp_close+0x2ee/0x790 [18455.319805] tcp_close+0x20/0x80 This currently happens because on redirect case we do skb_set_owner_r() with the original sock. This increments the fwd_alloc memory accounting on the original sock. Then on redirect we may push this into the queue of the psock we are redirecting to. When the skb is flushed from the queue we give the memory back to the original sock. The problem is if the original sock is destroyed/closed with skbs on another psocks queue then the original sock will not have a way to reclaim the memory before being destroyed. Then above warning will be thrown sockA sockB sk_psock_strp_read() sk_psock_verdict_apply() -- SK_REDIRECT -- sk_psock_skb_redirect() skb_queue_tail(psock_other->ingress_skb..) sk_close() sock_map_unref() sk_psock_put() sk_psock_drop() sk_psock_zap_ingress() At this point we have torn down our own psock, but have the outstanding skb in psock_other. Note that SK_PASS doesn't have this problem because the sk_psock_drop() logic releases the skb, its still associated with our psock. 
To resolve this, let's only account for sockets on the ingress queue that are still associated with the current socket. In the redirect case we will check memory limits per `6fa9201a89`, but will omit fwd_alloc accounting until the skb is actually enqueued. When the skb is sent via skb_send_sock_locked or received with sk_psock_skb_ingress, memory will be claimed on psock_other. Fixes: `6fa9201a89` ("bpf, sockmap: Avoid returning unneeded EAGAIN when redirecting to self") Reported-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/161731444013.68884.4021114312848535993.stgit@john-XPS-13-9370	2021-04-07 01:29:06 +02:00
John Fastabend	1c84b33101	bpf, sockmap: Fix sk->prot unhash op reset In '4da6a196f93b1' we fixed a potential unhash loop caused when a TLS socket in a sockmap was removed from the sockmap. This happened because the unhash operation on the TLS ctx continued to point at the sockmap implementation of unhash even though the psock has already been removed. The sockmap unhash handler when a psock is removed does the following, void sock_map_unhash(struct sock *sk) { void (*saved_unhash)(struct sock *sk); struct sk_psock *psock; rcu_read_lock(); psock = sk_psock(sk); if (unlikely(!psock)) { rcu_read_unlock(); if (sk->sk_prot->unhash) sk->sk_prot->unhash(sk); return; } [...] } The unlikely() case is there to handle the case where psock is detached but the proto ops have not been updated yet. But, in the above case with TLS and removed psock we never fixed sk_prot->unhash() and unhash() points back to sock_map_unhash resulting in a loop. To fix this we added this bit of code, static inline void sk_psock_restore_proto(struct sock *sk, struct sk_psock *psock) { sk->sk_prot->unhash = psock->saved_unhash; This will set the sk_prot->unhash back to its saved value. This is the correct callback for a TLS socket that has been removed from the sock_map. Unfortunately, this also overwrites the unhash pointer for all psocks. We effectively break sockmap unhash handling for any future socks. Omitting the unhash operation will leave stale entries in the map if a socket transitions through unhash, but does not do close() op. To fix this, set unhash correctly before calling into tls_update. This way the TLS enabled socket will point to the saved unhash() handler. 
Fixes: `4da6a196f9` ("bpf: Sockmap/tls, during free we may call tcp_bpf_unhash() in loop") Reported-by: Cong Wang <xiyou.wangcong@gmail.com> Reported-by: Lorenz Bauer <lmb@cloudflare.com> Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/161731441904.68884.15593917809745631972.stgit@john-XPS-13-9370	2021-04-07 01:29:06 +02:00
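
The unhash loop described in the commit message for 1c84b33101 can be reproduced in a small userspace model. This is only a sketch: the structs below mimic the shape of the kernel's `struct proto`, `struct sock` and `struct sk_psock` but not their real layout, and `model_sock_map_unhash()`, `base_unhash()`, `unhash_loops_after_detach()` and the `depth`/`looped` guard fields are hypothetical names introduced for illustration. It shows that when the psock is detached while `sk_prot->unhash` still points at the sockmap handler, the handler re-enters itself, and that restoring the saved pointer first avoids this.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only; not the kernel's real data structures. */
struct sock;

struct proto {
    void (*unhash)(struct sock *sk);
};

struct sk_psock {
    void (*saved_unhash)(struct sock *sk);
};

struct sock {
    struct proto *sk_prot;
    struct sk_psock *psock;   /* NULL once the psock is detached */
    int base_unhash_calls;    /* how often the original handler ran */
    int depth;                /* demo-only recursion guard */
    int looped;               /* set if the handler re-entered itself */
};

/* Stands in for the protocol's original unhash handler. */
static void base_unhash(struct sock *sk)
{
    sk->base_unhash_calls++;
}

/* Simplified model of the sockmap unhash handler. */
static void model_sock_map_unhash(struct sock *sk)
{
    if (sk->depth++ > 0) {
        /* Re-entered: in the kernel this would recurse without bound. */
        sk->looped = 1;
        sk->depth--;
        return;
    }
    if (!sk->psock) {
        /* psock detached: defer to whatever sk_prot->unhash points at. */
        if (sk->sk_prot->unhash)
            sk->sk_prot->unhash(sk);
        sk->depth--;
        return;
    }
    sk->psock->saved_unhash(sk);
    sk->depth--;
}

/* Returns 1 if calling unhash after detaching the psock loops. */
static int unhash_loops_after_detach(int restore_proto)
{
    struct proto prot = { .unhash = model_sock_map_unhash };
    struct sk_psock psock = { .saved_unhash = base_unhash };
    struct sock sk = { .sk_prot = &prot, .psock = &psock };

    if (restore_proto)
        prot.unhash = psock.saved_unhash;  /* restore before detaching */
    sk.psock = NULL;                       /* detach the psock */

    prot.unhash(&sk);
    assert(sk.looped || sk.base_unhash_calls == 1);
    return sk.looped;
}
```

Calling `unhash_loops_after_detach(0)` models the unfixed path and reports the loop, while `unhash_loops_after_detach(1)` reaches the original handler exactly once. The model gives each socket its own `struct proto`; in the kernel `sk_prot` is shared, which is why, as the commit message notes, writing through it also changes the handler for every other psock.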


999697 Commits