linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-25 20:32:22 +00:00

Author	SHA1	Message	Date
Ian Abbott	647a6002cb	staging: comedi: cb_pcidas: Allow 2-channel commands for AO subdevice The "cb_pcidas" driver supports asynchronous commands on the analog output (AO) subdevice for those boards that have an AO FIFO. The code (in `cb_pcidas_ao_check_chanlist()` and `cb_pcidas_ao_cmd()`) to validate and set up the command supports output to a single channel or to two channels simultaneously (the boards have two AO channels). However, the code in `cb_pcidas_auto_attach()` that initializes the subdevices neglects to initialize the AO subdevice's `len_chanlist` member, leaving it set to 0, but the Comedi core will "correct" it to 1 if the driver neglected to set it. This limits commands to use a single channel (either channel 0 or 1), but the limit should be two channels. Set the AO subdevice's `len_chanlist` member to be the same value as the `n_chan` member, which will be 2. Cc: <stable@vger.kernel.org> Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Link: https://lore.kernel.org/r/20201021122142.81628-1-abbotti@mev.co.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-10-27 13:19:01 +01:00
Alexander Sverdlin	49d28ebdf1	staging: octeon: Drop on uncorrectable alignment or FCS error Currently in case of alignment or FCS error if the packet cannot be corrected it's still not dropped. Report the error properly and drop the packet while making the code around a little bit more readable. Fixes: `80ff0fd3ab` ("Staging: Add octeon-ethernet driver files.") Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20201016145630.41852-1-alexander.sverdlin@nokia.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-10-27 13:18:50 +01:00
Alexander Sverdlin	179f5dc36b	staging: octeon: repair "fixed-link" support The PHYs must be registered once in device probe function, not in device open callback because it's only possible to register them once. Fixes: `a25e278020` ("staging: octeon: support fixed-link phys") Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20201016101858.11374-1-alexander.sverdlin@nokia.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-10-27 13:18:34 +01:00
Mauro Carvalho Chehab	b52817e9de	drm: drm_print.h: fix kernel-doc markups A kernel-doc markup should start with the identifier on its first line. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/5b76c5625709aaaa3abee98faa620b9f3d27ff85.1603791716.git.mchehab+huawei@kernel.org	2020-10-27 11:21:39 +01:00
Mauro Carvalho Chehab	38a8b32f46	drm: kernel-doc: drm_dp_helper.h: fix a typo Right now, kernel-doc generates a warning: ./include/drm/drm_dp_helper.h:1786: warning: Function parameter or member 'hbr2_reset' not described in 'drm_dp_phy_test_params' This is due to a typo: @hb2_reset -> @hbr2_reset Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/2a615cb38e951215bb1bddc2481ad323c9cf3fc9.1603791716.git.mchehab+huawei@kernel.org	2020-10-27 11:21:27 +01:00
Mauro Carvalho Chehab	7811a339da	drm: kernel-doc: add description for a new function parameter As reported by "make htmldocs": ./drivers/gpu/drm/drm_prime.c:808: warning: Function parameter or member 'dev' not described in 'drm_prime_pages_to_sg' Add a description for the new parameter. Fixes: `707d561f77` ("drm: allow limiting the scatter list size.") Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/9366f48e6e9c3ec2f31a3e68452a2b23a1089fce.1603791716.git.mchehab+huawei@kernel.org	2020-10-27 11:21:04 +01:00
Mauro Carvalho Chehab	08989335e2	drm: drm_edid: remove a duplicated kernel-doc declaration It is not possible to create cross-references for duplicated symbols. While Sphinx always detected it, on Sphinx 3 it generates warnings like this: .../Documentation/gpu/drm-kms-helpers:326: ../drivers/gpu/drm/drm_edid.c:1626: WARNING: Duplicate C declaration, also defined in 'gpu/drm-kms-helpers'. Declaration is 'bool drm_edid_are_equal (const struct edid edid1, const struct edid edid2)'. So, get rid of the duplicated kernel-doc markup. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/9310f4074fa9d29cd3ad60684d86d0ace8dab7ae.1603791716.git.mchehab+huawei@kernel.org	2020-10-27 11:20:55 +01:00
Mauro Carvalho Chehab	8d7d8c0afb	drm/dp: fix a kernel-doc issue at drm_edid.c The name of the argument is different, causing those warnings: ./drivers/gpu/drm/drm_edid.c:3754: warning: Function parameter or member 'video_code' not described in 'drm_display_mode_from_cea_vic' ./drivers/gpu/drm/drm_edid.c:3754: warning: Excess function parameter 'vic' description in 'drm_display_mode_from_cea_vic' Fixes: `7af655bce2` ("drm/dp: Add drm_dp_downstream_mode()") Reviewed-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/7f4d6c3ff6df63ebd006eb90a5108006c23e2168.1603791716.git.mchehab+huawei@kernel.org	2020-10-27 11:20:42 +01:00
Mauro Carvalho Chehab	21a53bbd46	drm/dp: fix kernel-doc warnings at drm_dp_helper.c As warned by kernel-doc: ./drivers/gpu/drm/drm_dp_helper.c:385: warning: Function parameter or member 'type' not described in 'drm_dp_downstream_is_type' ./drivers/gpu/drm/drm_dp_helper.c:886: warning: Function parameter or member 'dev' not described in 'drm_dp_downstream_mode' Some function parameters weren't documented. Fixes: `38784f6f88` ("drm/dp: Add helpers to identify downstream facing port types") Reviewed-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/03c9c8ba3f492aca76e2b4836803219cd9c971cf.1603791716.git.mchehab+huawei@kernel.org	2020-10-27 11:20:36 +01:00
Mauro Carvalho Chehab	444d03badc	drm: kernel-doc: document drm_dp_set_subconnector_property() params Changeset `e5b9277328` ("drm: report dp downstream port type as a subconnector property") added a new function to the kAPI, but didn't add any documentation for the parameters for drm_dp_set_subconnector_property(). Fixes: `e5b9277328` ("drm: report dp downstream port type as a subconnector property") Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/0870be85a77bea4ba5cf1715010834289a4e10b1.1603791716.git.mchehab+huawei@kernel.org	2020-10-27 11:20:21 +01:00
Arnd Bergmann	e5a3297904	i.MX fixes for 5.10, 2nd round: A couple of more defconfig fixes to enable CONFIG_GPIO_MXC option for i.MX ARMv4/v5 devices. -----BEGIN PGP SIGNATURE----- iQFIBAABCgAyFiEEFmJXigPl4LoGSz08UFdYWoewfM4FAl+XYHEUHHNoYXduZ3Vv QGtlcm5lbC5vcmcACgkQUFdYWoewfM5jPwf+KkUIwf0+SPGhJGA98i9tuMFAK8ts Xv5ml0sogJ710wjY8KqLOXBWZGbxB7hHNvlYdq2IHoDRGA1tN+qKidyR0eV+THy/ D8VOnefX6kKC3ONCP/9b2s18PZemcTcsCyyaGAT4Vi7hIiI1ALIGbx/bGVLrbLiD JYSUM2YbsEfGTy/9fWhUaVlK+IbB6zXpVSvrY0/3gmbzJ/67vthTkuHdhlEqGmqe Z2heEToUNOLKCB47m9OwKlXcHp6F2OxdT/i4pRTWYasccETIN3F7/VIfOud+Csyu D6G2OfT6Zh9GS0WE6IoULMS/4QmWVkSymtTjCv+zKkJKdO/x71jy++n93w== =Qcq+ -----END PGP SIGNATURE----- Merge tag 'imx-fixes-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes i.MX fixes for 5.10, 2nd round: A couple of more defconfig fixes to enable CONFIG_GPIO_MXC option for i.MX ARMv4/v5 devices. * tag 'imx-fixes-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: ARM: multi_v5_defconfig: Select CONFIG_GPIO_MXC ARM: imx_v4_v5_defconfig: Select CONFIG_GPIO_MXC Link: https://lore.kernel.org/r/20201026235230.GL9880@dragon Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2020-10-27 10:45:45 +01:00
Arnd Bergmann	91caef27a1	Amlogic fixes for v5.10-rc1 - misc DT only fixes -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEe4dGDhaSf6n1v/EMWTcYmtP7xmUFAl+XNX4ACgkQWTcYmtP7 xmVoHw/9FkmZXDbAfr8vZzD+4OGAmMABwzTDzbn2/1eBBXi4yKJLYnX7IwDYpdiG qL7sHBjauCCJXAuDvnmPyTaXBVmHlyrwnsjs3zDGCJ93GzlzewCmJCsam3KCw9E7 CFKJEYZmijcBLHbm98Ia98dUj2NeYnmO7kZP402C/gKHZusC2A6q9PPLVsTJ3asS hRZqN2DfSGytkWHBIlwaEg8j67nWKwY2dqXVF3nx5+PXakS4KbNWeVXPy2/2xE1Q W18AsHvDp/3S1/PcxCPiIZeDkOgOdS7eBQruFFm4oZtAMGRrIhg6YH6qIuHkuRqY qXc7ztoZaLmulsCgdSUqPX0RDztQLrXUuvP8xnbY/NmMtAhxpmAOpgEvqEEiBOq8 Z1t1RQetrIO3ktohqq+UrSRgyzcoGQgy+ghFH6g+Wd6gqixc9/tdFSvKtGNf9CsB YajmrsBWHmZKeuEBASgNUgBVIrMM2uAtcOwGbzfuGLDqkA077MJW/iLfNTt7t6Lx zE5hmp06NV/P5Uiv9BU/sbSfEwUVPou+XFZ/0IZ4G6pLgB8wkL53gZm3QoghQBcZ 9tT0I/RA1s2hOVYevXbSzAKOquD9FWTSHXGiy58sL7zP9murxbUkeA/2nOmR0ZM5 iD+G9+rXJKdNl+k4Il51Xbm1UliuCln/VT9BZj6A3cv6Wk71dUo= =Z+jV -----END PGP SIGNATURE----- Merge tag 'amlogic-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-amlogic into arm/fixes Amlogic fixes for v5.10-rc1 - misc DT only fixes * tag 'amlogic-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-amlogic: arm64: dts: meson: odroid-n2 plus: fix vddcpu_a pwm ARM: dts: meson8: remove two invalid interrupt lines from the GPU node arm64: dts: amlogic: add missing ethernet reset ID arm64: dts: amlogic: meson-g12: use the G12A specific dwmac compatible arm64: dts: meson: add missing g12 rng clock arm64: dts: meson-axg-s400: enable USB OTG arm64: dts: meson-axg: add USB nodes Link: https://lore.kernel.org/r/7hlffshfu3.fsf@baylibre.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2020-10-27 10:44:11 +01:00
Chaitanya Kulkarni	3c3751f2da	nvmet: fix a NULL pointer dereference when tracing the flush command When target side trace in turned on and flush command is issued from the host it results in the following Oops. [ 856.789724] BUG: kernel NULL pointer dereference, address: 0000000000000068 [ 856.790686] #PF: supervisor read access in kernel mode [ 856.791262] #PF: error_code(0x0000) - not-present page [ 856.791863] PGD 6d7110067 P4D 6d7110067 PUD 66f0ad067 PMD 0 [ 856.792527] Oops: 0000 [#1] SMP NOPTI [ 856.792950] CPU: 15 PID: 7034 Comm: nvme Tainted: G OE 5.9.0nvme-5.9+ #71 [ 856.793790] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e3214 [ 856.794956] RIP: 0010:trace_event_raw_event_nvmet_req_init+0x13e/0x170 [nvmet] [ 856.795734] Code: 41 5c 41 5d c3 31 d2 31 f6 e8 4e 9b b8 e0 e9 0e ff ff ff 49 8b 55 00 48 8b 38 8b 0 [ 856.797740] RSP: 0018:ffffc90001be3a60 EFLAGS: 00010246 [ 856.798375] RAX: 0000000000000000 RBX: ffff8887e7d2c01c RCX: 0000000000000000 [ 856.799234] RDX: 0000000000000020 RSI: 0000000057e70ea2 RDI: ffff8887e7d2c034 [ 856.800088] RBP: ffff88869f710578 R08: ffff888807500d40 R09: 00000000fffffffe [ 856.800951] R10: 0000000064c66670 R11: 00000000ef955201 R12: ffff8887e7d2c034 [ 856.801807] R13: ffff88869f7105c8 R14: 0000000000000040 R15: ffff88869f710440 [ 856.802667] FS: 00007f6a22bd8780(0000) GS:ffff888813a00000(0000) knlGS:0000000000000000 [ 856.803635] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 856.804367] CR2: 0000000000000068 CR3: 00000006d73e0000 CR4: 00000000003506e0 [ 856.805283] Call Trace: [ 856.805613] nvmet_req_init+0x27c/0x480 [nvmet] [ 856.806200] nvme_loop_queue_rq+0xcb/0x1d0 [nvme_loop] [ 856.806862] blk_mq_dispatch_rq_list+0x123/0x7b0 [ 856.807459] ? kvm_sched_clock_read+0x14/0x30 [ 856.808025] __blk_mq_sched_dispatch_requests+0xc7/0x170 [ 856.808708] blk_mq_sched_dispatch_requests+0x30/0x60 [ 856.809372] __blk_mq_run_hw_queue+0x70/0x100 [ 856.809935] __blk_mq_delay_run_hw_queue+0x156/0x170 [ 856.810574] blk_mq_run_hw_queue+0x86/0xe0 [ 856.811104] blk_mq_sched_insert_request+0xef/0x160 [ 856.811733] blk_execute_rq+0x69/0xc0 [ 856.812212] ? blk_mq_rq_ctx_init+0xd0/0x230 [ 856.812784] nvme_execute_passthru_rq+0x57/0x130 [nvme_core] [ 856.813461] nvme_submit_user_cmd+0xeb/0x300 [nvme_core] [ 856.814099] nvme_user_cmd.isra.82+0x11e/0x1a0 [nvme_core] [ 856.814752] blkdev_ioctl+0x1dc/0x2c0 [ 856.815197] block_ioctl+0x3f/0x50 [ 856.815606] __x64_sys_ioctl+0x84/0xc0 [ 856.816074] do_syscall_64+0x33/0x40 [ 856.816533] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 856.817168] RIP: 0033:0x7f6a222ed107 [ 856.817617] Code: 44 00 00 48 8b 05 81 cd 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 8 [ 856.819901] RSP: 002b:00007ffca848f058 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 [ 856.820846] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f6a222ed107 [ 856.821726] RDX: 00007ffca848f060 RSI: 00000000c0484e43 RDI: 0000000000000003 [ 856.822603] RBP: 0000000000000003 R08: 000000000000003f R09: 0000000000000005 [ 856.823478] R10: 00007ffca848ece0 R11: 0000000000000202 R12: 00007ffca84912d3 [ 856.824359] R13: 00007ffca848f4d0 R14: 0000000000000002 R15: 000000000067e900 [ 856.825236] Modules linked in: nvme_loop(OE) nvmet(OE) nvme_fabrics(OE) null_blk nvme(OE) nvme_corel Move the nvmet_req_init() tracepoint after we parse the command in nvmet_req_init() so that we can get rid of the duplicate nvmet_find_namespace() call. Rename __assign_disk_name() -> __assign_req_name(). Now that we call tracepoint after parsing the command simplify the newly added __assign_req_name() which fixes this bug. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2020-10-27 10:02:50 +01:00
James Smart	ac9b820e71	nvme-fc: remove nvme_fc_terminate_io() __nvme_fc_terminate_io() is now called by only 1 place, in reset_work. Consoldate and move the functionality of terminate_io into reset_work. In reset_work, rather than calling the create_association directly, schedule the connect work element to do its thing. After scheduling, flush the connect work element to continue with semantic of not returning until connect has been attempted at least once. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2020-10-27 10:02:29 +01:00
James Smart	95ced8a2c7	nvme-fc: eliminate terminate_io use by nvme_fc_error_recovery nvme_fc_error_recovery() special cases handling when in CONNECTING state and calls __nvme_fc_terminate_io(). __nvme_fc_terminate_io() itself special cases CONNECTING state and calls the routine to abort outstanding ios. Simplify the sequence by putting the call to abort outstanding I/Os directly in nvme_fc_error_recovery. Move the location of __nvme_fc_abort_outstanding_ios(), and nvme_fc_terminate_exchange() which is called by it, to avoid adding function prototypes for nvme_fc_error_recovery(). Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2020-10-27 10:02:08 +01:00
James Smart	9c2bb2577d	nvme-fc: remove err_work work item err_work was created to handle errors (mainly I/O timeouts) while in CONNECTING state. The flag for err_work_active is also unneeded. Remove err_work_active and err_work. The actions to abort I/Os are moved inline to nvme_error_recovery(). Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2020-10-27 10:01:39 +01:00
James Smart	caf1cbe367	nvme-fc: track error_recovery while connecting Whenever there are errors during CONNECTING, the driver recovers by aborting all outstanding ios and counts on the io completion to fail them and thus the connection/association they are on. However, the connection failure depends on a failure state from the core routines. Not all commands that are issued by the core routine are guaranteed to cause a failure of the core routine. They may be treated as a failure status and the status is then ignored. As such, whenever the transport enters error_recovery while CONNECTING, it will set a new flag indicating an association failed. The create_association routine which creates and initializes the controller, will monitor the state of the flag as well as the core routine error status and ensure the association fails if there was an error. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2020-10-27 10:01:30 +01:00
zhenwei pi	25c1ca6eca	nvme-rdma: handle unexpected nvme completion data length Receiving a zero length message leads to the following warnings because the CQE is processed twice: refcount_t: underflow; use-after-free. WARNING: CPU: 0 PID: 0 at lib/refcount.c:28 RIP: 0010:refcount_warn_saturate+0xd9/0xe0 Call Trace: <IRQ> nvme_rdma_recv_done+0xf3/0x280 [nvme_rdma] __ib_process_cq+0x76/0x150 [ib_core] ... Sanity check the received data length, to avoids this. Thanks to Chao Leng & Sagi for suggestions. Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2020-10-27 10:00:05 +01:00
Keith Busch	8685699c28	nvme: ignore zone validate errors on subsequent scans Revalidating nvme zoned namespaces requires IO commands, and there are controller states that prevent IO. For example, a sanitize in progress is required to fail all IO, but we don't want to remove a namespace we've previously added just because the controller is in such a state. Suppress the error in this case. Reported-by: Michael Nguyen <michael.nguyen@wdc.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2020-10-27 09:58:42 +01:00
Zenghui Yu	e3364c5ff3	net: hns3: Clear the CMDQ registers before unmapping BAR region When unbinding the hns3 driver with the HNS3 VF, I got the following kernel panic: [ 265.709989] Unable to handle kernel paging request at virtual address ffff800054627000 [ 265.717928] Mem abort info: [ 265.720740] ESR = 0x96000047 [ 265.723810] EC = 0x25: DABT (current EL), IL = 32 bits [ 265.729126] SET = 0, FnV = 0 [ 265.732195] EA = 0, S1PTW = 0 [ 265.735351] Data abort info: [ 265.738227] ISV = 0, ISS = 0x00000047 [ 265.742071] CM = 0, WnR = 1 [ 265.745055] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000009b54000 [ 265.751753] [ffff800054627000] pgd=0000202ffffff003, p4d=0000202ffffff003, pud=00002020020eb003, pmd=00000020a0dfc003, pte=0000000000000000 [ 265.764314] Internal error: Oops: 96000047 [#1] SMP [ 265.830357] CPU: 61 PID: 20319 Comm: bash Not tainted 5.9.0+ #206 [ 265.836423] Hardware name: Huawei TaiShan 2280 V2/BC82AMDDA, BIOS 1.05 09/18/2019 [ 265.843873] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO BTYPE=--) [ 265.843890] pc : hclgevf_cmd_uninit+0xbc/0x300 [ 265.861988] lr : hclgevf_cmd_uninit+0xb0/0x300 [ 265.861992] sp : ffff80004c983b50 [ 265.881411] pmr_save: 000000e0 [ 265.884453] x29: ffff80004c983b50 x28: ffff20280bbce500 [ 265.889744] x27: 0000000000000000 x26: 0000000000000000 [ 265.895034] x25: ffff800011a1f000 x24: ffff800011a1fe90 [ 265.900325] x23: ffff0020ce9b00d8 x22: ffff0020ce9b0150 [ 265.905616] x21: ffff800010d70e90 x20: ffff800010d70e90 [ 265.910906] x19: ffff0020ce9b0080 x18: 0000000000000004 [ 265.916198] x17: 0000000000000000 x16: ffff800011ae32e8 [ 265.916201] x15: 0000000000000028 x14: 0000000000000002 [ 265.916204] x13: ffff800011ae32e8 x12: 0000000000012ad8 [ 265.946619] x11: ffff80004c983b50 x10: 0000000000000000 [ 265.951911] x9 : ffff8000115d0888 x8 : 0000000000000000 [ 265.951914] x7 : ffff800011890b20 x6 : c0000000ffff7fff [ 265.951917] x5 : ffff80004c983930 x4 : 0000000000000001 [ 265.951919] x3 : ffffa027eec1b000 x2 : 2b78ccbbff369100 [ 265.964487] x1 : 0000000000000000 x0 : ffff800054627000 [ 265.964491] Call trace: [ 265.964494] hclgevf_cmd_uninit+0xbc/0x300 [ 265.964496] hclgevf_uninit_ae_dev+0x9c/0xe8 [ 265.964501] hnae3_unregister_ae_dev+0xb0/0x130 [ 265.964516] hns3_remove+0x34/0x88 [hns3] [ 266.009683] pci_device_remove+0x48/0xf0 [ 266.009692] device_release_driver_internal+0x114/0x1e8 [ 266.030058] device_driver_detach+0x28/0x38 [ 266.034224] unbind_store+0xd4/0x108 [ 266.037784] drv_attr_store+0x40/0x58 [ 266.041435] sysfs_kf_write+0x54/0x80 [ 266.045081] kernfs_fop_write+0x12c/0x250 [ 266.049076] vfs_write+0xc4/0x248 [ 266.052378] ksys_write+0x74/0xf8 [ 266.055677] __arm64_sys_write+0x24/0x30 [ 266.059584] el0_svc_common.constprop.3+0x84/0x270 [ 266.064354] do_el0_svc+0x34/0xa0 [ 266.067658] el0_svc+0x38/0x40 [ 266.070700] el0_sync_handler+0x8c/0xb0 [ 266.074519] el0_sync+0x140/0x180 It looks like the BAR memory region had already been unmapped before we start clearing CMDQ registers in it, which is pretty bad and the kernel happily kills itself because of a Current EL Data Abort (on arm64). Moving the CMDQ uninitialization a bit early fixes the issue for me. Fixes: `862d969a3a` ("net: hns3: do VF's pci re-initialization while PF doing FLR") Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Link: https://lore.kernel.org/r/20201023051550.793-1-yuzenghui@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-26 20:25:04 -07:00
Jakub Kicinski	10067b5019	Merge branch 'bnxt_en-bug-fixes' Michael Chan says: ==================== bnxt_en: Bug fixes. These 5 bug fixes are all related to the firmware reset or AER recovery. 2 patches fix the cleanup logic for the workqueue used to handle firmware reset and recovery. 1 patch ensures that the chip will have the proper BAR addresses latched after fatal AER recovery. 1 patch fixes the open path to check for firmware reset abort error. The last one sends the fw reset command unconditionally to fix the AER reset logic. ==================== Link: https://lore.kernel.org/r/1603685901-17917-1-git-send-email-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-26 18:26:38 -07:00
Vasundhara Volam	825741b071	bnxt_en: Send HWRM_FUNC_RESET fw command unconditionally. In the AER or firmware reset flow, if we are in fatal error state or if pci_channel_offline() is true, we don't send any commands to the firmware because the commands will likely not reach the firmware and most commands don't matter much because the firmware is likely to be reset imminently. However, the HWRM_FUNC_RESET command is different and we should always attempt to send it. In the AER flow for example, the .slot_reset() call will trigger this fw command and we need to try to send it to effect the proper reset. Fixes: `b340dc680e` ("bnxt_en: Avoid sending firmware messages when AER error is detected.") Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-26 18:26:35 -07:00
Michael Chan	a1301f08c5	bnxt_en: Check abort error state in bnxt_open_nic(). bnxt_open_nic() is called during configuration changes that require the NIC to be closed and then opened. This call is protected by rtnl_lock. Firmware reset can be happening at the same time. Only critical portions of the entire firmware reset sequence are protected by the rtnl_lock. It is possible that bnxt_open_nic() can be called when the firmware reset sequence is aborting. In that case, bnxt_open_nic() needs to check if the ABORT_ERR flag is set and abort if it is. The configuration change that resulted in the bnxt_open_nic() call will fail but the NIC will be brought to a consistent IF_DOWN state. Without this patch, if bnxt_open_nic() were to continue in this error state, it may crash like this: [ 1648.659736] BUG: unable to handle kernel NULL pointer dereference at (null) [ 1648.659768] IP: [<ffffffffc01e9b3a>] bnxt_alloc_mem+0x50a/0x1140 [bnxt_en] [ 1648.659796] PGD 101e1b3067 PUD 101e1b2067 PMD 0 [ 1648.659813] Oops: 0000 [#1] SMP [ 1648.659825] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc dell_smbios dell_wmi_descriptor dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper vfat cryptd fat pcspkr ipmi_ssif sg k10temp i2c_piix4 wmi ipmi_si ipmi_devintf ipmi_msghandler tpm_crb acpi_power_meter sch_fq_codel ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci drm libahci megaraid_sas crct10dif_pclmul crct10dif_common [ 1648.660063] tg3 libata crc32c_intel bnxt_en(OE) drm_panel_orientation_quirks devlink ptp pps_core dm_mirror dm_region_hash dm_log dm_mod fuse [ 1648.660105] CPU: 13 PID: 3867 Comm: ethtool Kdump: loaded Tainted: G OE ------------ 3.10.0-1152.el7.x86_64 #1 [ 1648.660911] Hardware name: Dell Inc. PowerEdge R7515/0R4CNN, BIOS 1.2.14 01/28/2020 [ 1648.661662] task: ffff94e64cbc9080 ti: ffff94f55df1c000 task.ti: ffff94f55df1c000 [ 1648.662409] RIP: 0010:[<ffffffffc01e9b3a>] [<ffffffffc01e9b3a>] bnxt_alloc_mem+0x50a/0x1140 [bnxt_en] [ 1648.663171] RSP: 0018:ffff94f55df1fba8 EFLAGS: 00010202 [ 1648.663927] RAX: 0000000000000000 RBX: ffff94e6827e0000 RCX: 0000000000000000 [ 1648.664684] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff94e6827e08c0 [ 1648.665433] RBP: ffff94f55df1fc20 R08: 00000000000001ff R09: 0000000000000008 [ 1648.666184] R10: 0000000000000d53 R11: ffff94f55df1f7ce R12: ffff94e6827e08c0 [ 1648.666940] R13: ffff94e6827e08c0 R14: ffff94e6827e08c0 R15: ffffffffb9115e40 [ 1648.667695] FS: 00007f8aadba5740(0000) GS:ffff94f57eb40000(0000) knlGS:0000000000000000 [ 1648.668447] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1648.669202] CR2: 0000000000000000 CR3: 0000001022772000 CR4: 0000000000340fe0 [ 1648.669966] Call Trace: [ 1648.670730] [<ffffffffc01f1d5d>] ? bnxt_need_reserve_rings+0x9d/0x170 [bnxt_en] [ 1648.671496] [<ffffffffc01fa7ea>] __bnxt_open_nic+0x8a/0x9a0 [bnxt_en] [ 1648.672263] [<ffffffffc01f7479>] ? bnxt_close_nic+0x59/0x1b0 [bnxt_en] [ 1648.673031] [<ffffffffc01fb11b>] bnxt_open_nic+0x1b/0x50 [bnxt_en] [ 1648.673793] [<ffffffffc020037c>] bnxt_set_ringparam+0x6c/0xa0 [bnxt_en] [ 1648.674550] [<ffffffffb8a5f564>] dev_ethtool+0x1334/0x21a0 [ 1648.675306] [<ffffffffb8a719ff>] dev_ioctl+0x1ef/0x5f0 [ 1648.676061] [<ffffffffb8a324bd>] sock_do_ioctl+0x4d/0x60 [ 1648.676810] [<ffffffffb8a326bb>] sock_ioctl+0x1eb/0x2d0 [ 1648.677548] [<ffffffffb8663230>] do_vfs_ioctl+0x3a0/0x5b0 [ 1648.678282] [<ffffffffb8b8e678>] ? __do_page_fault+0x238/0x500 [ 1648.679016] [<ffffffffb86634e1>] SyS_ioctl+0xa1/0xc0 [ 1648.679745] [<ffffffffb8b93f92>] system_call_fastpath+0x25/0x2a [ 1648.680461] Code: 9e 60 01 00 00 0f 1f 40 00 45 8b 8e 48 01 00 00 31 c9 45 85 c9 0f 8e 73 01 00 00 66 0f 1f 44 00 00 49 8b 86 a8 00 00 00 48 63 d1 <48> 8b 14 d0 48 85 d2 0f 84 46 01 00 00 41 8b 86 44 01 00 00 c7 [ 1648.681986] RIP [<ffffffffc01e9b3a>] bnxt_alloc_mem+0x50a/0x1140 [bnxt_en] [ 1648.682724] RSP <ffff94f55df1fba8> [ 1648.683451] CR2: 0000000000000000 Fixes: `ec5d31e3c1` ("bnxt_en: Handle firmware reset status during IF_UP.") Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-26 18:26:35 -07:00
Vasundhara Volam	f75d9a0aa9	bnxt_en: Re-write PCI BARs after PCI fatal error. When a PCIe fatal error occurs, the internal latched BAR addresses in the chip get reset even though the BAR register values in config space are retained. pci_restore_state() will not rewrite the BAR addresses if the BAR address values are valid, causing the chip's internal BAR addresses to stay invalid. So we need to zero the BAR registers during PCIe fatal error to force pci_restore_state() to restore the BAR addresses. These write cycles to the BAR registers will cause the proper BAR addresses to latch internally. Fixes: `6316ea6db9` ("bnxt_en: Enable AER support.") Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-26 18:26:35 -07:00
Vasundhara Volam	631ce27a30	bnxt_en: Invoke cancel_delayed_work_sync() for PFs also. As part of the commit `b148bb238c` ("bnxt_en: Fix possible crash in bnxt_fw_reset_task()."), cancel_delayed_work_sync() is called only for VFs to fix a possible crash by cancelling any pending delayed work items. It was assumed by mistake that the flush_workqueue() call on the PF would flush delayed work items as well. As flush_workqueue() does not cancel the delayed workqueue, extend the fix for PFs. This fix will avoid the system crash, if there are any pending delayed work items in fw_reset_task() during driver's .remove() call. Unify the workqueue cleanup logic for both PF and VF by calling cancel_work_sync() and cancel_delayed_work_sync() directly in bnxt_remove_one(). Fixes: `b148bb238c` ("bnxt_en: Fix possible crash in bnxt_fw_reset_task().") Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-26 18:26:35 -07:00
Vasundhara Volam	21d6a11e2c	bnxt_en: Fix regression in workqueue cleanup logic in bnxt_remove_one(). A recent patch has moved the workqueue cleanup logic before calling unregister_netdev() in bnxt_remove_one(). This caused a regression because the workqueue can be restarted if the device is still open. Workqueue cleanup must be done after unregister_netdev(). The workqueue will not restart itself after the device is closed. Call bnxt_cancel_sp_work() after unregister_netdev() and call bnxt_dl_fw_reporters_destroy() after that. This fixes the regession and the original NULL ptr dereference issue. Fixes: `b16939b59c` ("bnxt_en: Fix NULL ptr dereference crash in bnxt_fw_reset_task()") Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-26 18:26:35 -07:00
Jakub Kicinski	19c176eb07	Merge branch 'mlxsw-various-fixes' Ido Schimmel says: ==================== mlxsw: Various fixes This patch set contains various fixes for mlxsw. Patch #1 ensures that only link modes that are supported by both the device and the driver are advertised. When a link mode that is not supported by the driver is negotiated by the device, it will be presented as an unknown speed by ethtool, causing the bond driver to wrongly assume that the link is down. Patch #2 fixes a trivial memory leak upon module removal. Patch #3 fixes a use-after-free that syzkaller was able to trigger once on a slow emulator after a few months of fuzzing. ==================== Link: https://lore.kernel.org/r/20201024133733.2107509-1-idosch@idosch.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-26 16:45:53 -07:00
Amit Cohen	0daf2bf5a2	mlxsw: core: Fix use-after-free in mlxsw_emad_trans_finish() Each EMAD transaction stores the skb used to issue the EMAD request ('trans->tx_skb') so that the request could be retried in case of a timeout. The skb can be freed when a corresponding response is received or as part of the retry logic (e.g., failed retransmit, exceeded maximum number of retries). The two tasks (i.e., response processing and retransmits) are synchronized by the atomic 'trans->active' field which ensures that responses to inactive transactions are ignored. In case of a failed retransmit the transaction is finished and all of its resources are freed. However, the current code does not mark it as inactive. Syzkaller was able to hit a race condition in which a concurrent response is processed while the transaction's resources are being freed, resulting in a use-after-free [1]. Fix the issue by making sure to mark the transaction as inactive after a failed retransmit and free its resources only if a concurrent task did not already do that. [1] BUG: KASAN: use-after-free in consume_skb+0x30/0x370 net/core/skbuff.c:833 Read of size 4 at addr ffff88804f570494 by task syz-executor.0/1004 CPU: 0 PID: 1004 Comm: syz-executor.0 Not tainted 5.8.0-rc7+ #68 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xf6/0x16e lib/dump_stack.c:118 print_address_description.constprop.0+0x1c/0x250 mm/kasan/report.c:383 __kasan_report mm/kasan/report.c:513 [inline] kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530 check_memory_region_inline mm/kasan/generic.c:186 [inline] check_memory_region+0x14e/0x1b0 mm/kasan/generic.c:192 instrument_atomic_read include/linux/instrumented.h:56 [inline] atomic_read include/asm-generic/atomic-instrumented.h:27 [inline] refcount_read include/linux/refcount.h:147 [inline] skb_unref include/linux/skbuff.h:1044 [inline] consume_skb+0x30/0x370 net/core/skbuff.c:833 mlxsw_emad_trans_finish+0x64/0x1c0 drivers/net/ethernet/mellanox/mlxsw/core.c:592 mlxsw_emad_process_response drivers/net/ethernet/mellanox/mlxsw/core.c:651 [inline] mlxsw_emad_rx_listener_func+0x5c9/0xac0 drivers/net/ethernet/mellanox/mlxsw/core.c:672 mlxsw_core_skb_receive+0x4df/0x770 drivers/net/ethernet/mellanox/mlxsw/core.c:2063 mlxsw_pci_cqe_rdq_handle drivers/net/ethernet/mellanox/mlxsw/pci.c:595 [inline] mlxsw_pci_cq_tasklet+0x12a6/0x2520 drivers/net/ethernet/mellanox/mlxsw/pci.c:651 tasklet_action_common.isra.0+0x13f/0x3e0 kernel/softirq.c:550 __do_softirq+0x223/0x964 kernel/softirq.c:292 asm_call_on_stack+0x12/0x20 arch/x86/entry/entry_64.S:711 Allocated by task 1006: save_stack+0x1b/0x40 mm/kasan/common.c:48 set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc mm/kasan/common.c:494 [inline] __kasan_kmalloc.constprop.0+0xc2/0xd0 mm/kasan/common.c:467 slab_post_alloc_hook mm/slab.h:586 [inline] slab_alloc_node mm/slub.c:2824 [inline] slab_alloc mm/slub.c:2832 [inline] kmem_cache_alloc+0xcd/0x2e0 mm/slub.c:2837 __build_skb+0x21/0x60 net/core/skbuff.c:311 __netdev_alloc_skb+0x1e2/0x360 net/core/skbuff.c:464 netdev_alloc_skb include/linux/skbuff.h:2810 [inline] mlxsw_emad_alloc drivers/net/ethernet/mellanox/mlxsw/core.c:756 [inline] mlxsw_emad_reg_access drivers/net/ethernet/mellanox/mlxsw/core.c:787 [inline] mlxsw_core_reg_access_emad+0x1ab/0x1420 drivers/net/ethernet/mellanox/mlxsw/core.c:1817 mlxsw_reg_trans_query+0x39/0x50 drivers/net/ethernet/mellanox/mlxsw/core.c:1831 mlxsw_sp_sb_pm_occ_clear drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c:260 [inline] mlxsw_sp_sb_occ_max_clear+0xbff/0x10a0 drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c:1365 mlxsw_devlink_sb_occ_max_clear+0x76/0xb0 drivers/net/ethernet/mellanox/mlxsw/core.c:1037 devlink_nl_cmd_sb_occ_max_clear_doit+0x1ec/0x280 net/core/devlink.c:1765 genl_family_rcv_msg_doit net/netlink/genetlink.c:669 [inline] genl_family_rcv_msg net/netlink/genetlink.c:714 [inline] genl_rcv_msg+0x617/0x980 net/netlink/genetlink.c:731 netlink_rcv_skb+0x152/0x440 net/netlink/af_netlink.c:2470 genl_rcv+0x24/0x40 net/netlink/genetlink.c:742 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline] netlink_unicast+0x53a/0x750 net/netlink/af_netlink.c:1330 netlink_sendmsg+0x850/0xd90 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg+0x150/0x190 net/socket.c:671 ____sys_sendmsg+0x6d8/0x840 net/socket.c:2359 ___sys_sendmsg+0xff/0x170 net/socket.c:2413 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2446 do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:384 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Freed by task 73: save_stack+0x1b/0x40 mm/kasan/common.c:48 set_track mm/kasan/common.c:56 [inline] kasan_set_free_info mm/kasan/common.c:316 [inline] __kasan_slab_free+0x12c/0x170 mm/kasan/common.c:455 slab_free_hook mm/slub.c:1474 [inline] slab_free_freelist_hook mm/slub.c:1507 [inline] slab_free mm/slub.c:3072 [inline] kmem_cache_free+0xbe/0x380 mm/slub.c:3088 kfree_skbmem net/core/skbuff.c:622 [inline] kfree_skbmem+0xef/0x1b0 net/core/skbuff.c:616 __kfree_skb net/core/skbuff.c:679 [inline] consume_skb net/core/skbuff.c:837 [inline] consume_skb+0xe1/0x370 net/core/skbuff.c:831 mlxsw_emad_trans_finish+0x64/0x1c0 drivers/net/ethernet/mellanox/mlxsw/core.c:592 mlxsw_emad_transmit_retry.isra.0+0x9d/0xc0 drivers/net/ethernet/mellanox/mlxsw/core.c:613 mlxsw_emad_trans_timeout_work+0x43/0x50 drivers/net/ethernet/mellanox/mlxsw/core.c:625 process_one_work+0xa3e/0x17a0 kernel/workqueue.c:2269 worker_thread+0x9e/0x1050 kernel/workqueue.c:2415 kthread+0x355/0x470 kernel/kthread.c:291 ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:293 The buggy address belongs to the object at ffff88804f5703c0 which belongs to the cache skbuff_head_cache of size 224 The buggy address is located 212 bytes inside of 224-byte region [ffff88804f5703c0, ffff88804f5704a0) The buggy address belongs to the page: page:ffffea00013d5c00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 flags: 0x100000000000200(slab) raw: 0100000000000200 dead000000000100 dead000000000122 ffff88806c625400 raw: 0000000000000000 00000000000c000c 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88804f570380: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb ffff88804f570400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff88804f570480: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc ^ ffff88804f570500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff88804f570580: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc Fixes: `caf7297e7a` ("mlxsw: core: Introduce support for asynchronous EMAD register access") Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-26 16:45:50 -07:00
Ido Schimmel	adc80b6cfe	mlxsw: core: Fix memory leak on module removal Free the devlink instance during the teardown sequence in the non-reload case to avoid the following memory leak. unreferenced object 0xffff888232895000 (size 2048): comm "modprobe", pid 1073, jiffies 4295568857 (age 164.871s) hex dump (first 32 bytes): 00 01 00 00 00 00 ad de 22 01 00 00 00 00 ad de ........"....... 10 50 89 32 82 88 ff ff 10 50 89 32 82 88 ff ff .P.2.....P.2.... backtrace: [<00000000c704e9a6>] __kmalloc+0x13a/0x2a0 [<00000000ee30129d>] devlink_alloc+0xff/0x760 [<0000000092ab3e5d>] 0xffffffffa042e5b0 [<000000004f3f8a31>] 0xffffffffa042f6ad [<0000000092800b4b>] 0xffffffffa0491df3 [<00000000c4843903>] local_pci_probe+0xcb/0x170 [<000000006993ded7>] pci_device_probe+0x2c2/0x4e0 [<00000000a8e0de75>] really_probe+0x2c5/0xf90 [<00000000d42ba75d>] driver_probe_device+0x1eb/0x340 [<00000000bcc95e05>] device_driver_attach+0x294/0x300 [<000000000e2bc177>] __driver_attach+0x167/0x2f0 [<000000007d44cd6e>] bus_for_each_dev+0x148/0x1f0 [<000000003cd5a91e>] driver_attach+0x45/0x60 [<000000000041ce51>] bus_add_driver+0x3b8/0x720 [<00000000f5215476>] driver_register+0x230/0x4e0 [<00000000d79356f5>] __pci_register_driver+0x190/0x200 Fixes: `a22712a962` ("mlxsw: core: Fix devlink unregister flow") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reported-by: Vadim Pasternak <vadimp@nvidia.com> Tested-by: Oleksandr Shamray <oleksandrs@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-26 16:45:50 -07:00
Amit Cohen	1601559be3	mlxsw: Only advertise link modes supported by both driver and device During port creation the driver instructs the device to advertise all the supported link modes queried from the device. Since cited commit not all the link modes supported by the device are supported by the driver. This can result in the device negotiating a link mode that is not recognized by the driver causing ethtool to show an unsupported speed: $ ethtool swp1 ... Speed: Unknown! This is especially problematic when the netdev is enslaved to a bond, as the bond driver uses unknown speed as an indication that the link is down: [13048.900895] net_ratelimit: 86 callbacks suppressed [13048.900902] t_bond0: (slave swp52): failed to get link speed/duplex [13048.912160] t_bond0: (slave swp49): failed to get link speed/duplex Fix this by making sure that only link modes that are supported by both the device and the driver are advertised. Fixes: `b97cd89126` ("mlxsw: Remove 56G speed support") Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-26 16:45:50 -07:00
Fabio Estevam	ccee91b568	ARM: multi_v5_defconfig: Select CONFIG_GPIO_MXC Since commit `12d16b397c` ("gpio: mxc: Support module build") the CONFIG_GPIO_MXC option needs to be explicitly selected. Select it to avoid boot issues on imx25/imx27 due to the lack of the GPIO driver. Signed-off-by: Fabio Estevam <festevam@gmail.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org>	2020-10-27 07:45:43 +08:00
Fabio Estevam	24cb909646	ARM: imx_v4_v5_defconfig: Select CONFIG_GPIO_MXC Since commit `12d16b397c` ("gpio: mxc: Support module build") the CONFIG_GPIO_MXC option needs to be explicitly selected. Select it to avoid boot issues on imx25/imx27 due to the lack of the GPIO driver. Signed-off-by: Fabio Estevam <festevam@gmail.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org>	2020-10-27 07:45:43 +08:00
Jakub Kicinski	522ee51e67	Merge branch 'net-smc-fixes-2020-10-23' Karsten Graul says: ==================== net/smc: fixes 2020-10-23 Patch 1 fixes a potential null pointer dereference. Patch 2 takes care of a suppressed return code and patch 3 corrects the system EID in the ISM driver. ==================== Link: https://lore.kernel.org/r/20201023184830.59548-1-kgraul@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-26 16:29:38 -07:00
Karsten Graul	1dc0d1cf6f	s390/ism: fix incorrect system EID The system EID that is defined by the ISM driver is not correct. Using an incorrect system EID allows to communicate with remote Linux systems that use the same incorrect system EID, but when it comes to interoperability with other operating systems then the system EIDs do never match which prevents SMC-Dv2 communication. Using the correct system EID fixes this problem. Fixes: `201091ebb2` ("net/smc: introduce System Enterprise ID (SEID)") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-26 16:29:14 -07:00
Karsten Graul	96d6fded95	net/smc: fix suppressed return code The patch that repaired the invalid return code in smcd_new_buf_create() missed to take care of errno ENOSPC which has a special meaning that no more DMBEs can be registered on the device. Fix that by keeping this errno value during the translation of the return code. Fixes: `6b1bbf94ab` ("net/smc: fix invalid return code in smcd_new_buf_create()") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-26 16:29:14 -07:00
Karsten Graul	4a9baf45fd	net/smc: fix null pointer dereference in smc_listen_decline() smc_listen_work() calls smc_listen_decline() on label out_decl, providing the ini pointer variable. But this pointer can still be null when the label out_decl is reached. Fix this by checking the ini variable in smc_listen_work() and call smc_listen_decline() with the result directly. Fixes: `a7c9c5f4af` ("net/smc: CLC accept / confirm V2") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-26 16:29:14 -07:00
Jeff Vander Stoep	af545bb5ee	vsock: use ns_capable_noaudit() on socket create During __vsock_create() CAP_NET_ADMIN is used to determine if the vsock_sock->trusted should be set to true. This value is used later for determing if a remote connection should be allowed to connect to a restricted VM. Unfortunately, if the caller doesn't have CAP_NET_ADMIN, an audit message such as an selinux denial is generated even if the caller does not want a trusted socket. Logging errors on success is confusing. To avoid this, switch the capable(CAP_NET_ADMIN) check to the noaudit version. Reported-by: Roman Kiryanov <rkir@google.com> https://android-review.googlesource.com/c/device/generic/goldfish/+/1468545/ Signed-off-by: Jeff Vander Stoep <jeffv@google.com> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Link: https://lore.kernel.org/r/20201023143757.377574-1-jeffv@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-26 16:22:42 -07:00
Raju Rangoju	937d842058	cxgb4: set up filter action after rewrites The current code sets up the filter action field before rewrites are set up. When the action 'switch' is used with rewrites, this may result in initial few packets that get switched out don't have rewrites applied on them. So, make sure filter action is set up along with rewrites or only after everything else is set up for rewrites. Fixes: `12b276fbf6` ("cxgb4: add support to create hash filters") Signed-off-by: Raju Rangoju <rajur@chelsio.com> Link: https://lore.kernel.org/r/20201023115852.18262-1-rajur@chelsio.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-26 16:18:18 -07:00
Dan Carpenter	ee7a376421	net: hns3: clean up a return in hclge_tm_bp_setup() Smatch complains that "ret" might be uninitialized if we don't enter the loop. We do always enter the loop so it's a false positive, but it's cleaner to just return a literal zero and that silences the warning as well. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20201023112212.GA282278@mwanda Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-26 16:16:23 -07:00
Linus Torvalds	4525c8781e	scsi: qla2xxx: remove incorrect sparse #ifdef The code to try to shut up sparse warnings about questionable locking didn't shut up sparse: it made the result not parse as valid C at all, since the end result now has a label with no statement. The proper fix is to just always lock the hardware, the same way Bart did in commit `8ae178760b` ("scsi: qla2xxx: Simplify the functions for dumping firmware"). That avoids the whole problem with having locking that is not statically obvious. But in the meantime, just remove the incorrect attempt at trying to avoid a sparse warning that just made things worse. This was exposed by commit `3e6efab865` ("scsi: qla2xxx: Fix reset of MPI firmware"), very similarly to how commit `cbb01c2f2f` ("scsi: qla2xxx: Fix MPI failure AEN (8200) handling") exposed the same problem in another place, and caused that commit `8ae178760b`. Please don't add code to just shut up sparse without actually fixing what sparse complains about. Reported-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Arun Easi <aeasi@marvell.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-10-26 15:45:22 -07:00
Linus Torvalds	bf9a76a470	arch/um: partially revert the conversion to __section() macro A couple of um files ended up not including the header file that defines the __section() macro, and the simplest fix is to just revert the change for those files. Fixes: `33def8498f` treewide: Convert macro and uses of __section(foo) to __section("foo") Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net> Cc: Joe Perches <joe@perches.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-10-26 15:39:37 -07:00
Gal Pressman	7d66a71488	RDMA/uverbs: Fix false error in query gid IOCTL Some drivers (such as EFA) have a GID table, but aren't IB/RoCE devices. Remove the unnecessary rdma_ib_or_roce() check. This fixes rdma-core failures for EFA when it uses the new ioctl interface for querying the GID table. Fixes: `9f85cbe50a` ("RDMA/uverbs: Expose the new GID query API to user space") Link: https://lore.kernel.org/r/20201026082621.32463-1-galpress@amazon.com Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-10-26 19:29:26 -03:00
Parav Pandit	fbdd0049d9	RDMA/mlx5: Fix devlink deadlock on net namespace deletion When a mlx5 core devlink instance is reloaded in different net namespace, its associated IB device is deleted and recreated. Example sequence is: $ ip netns add foo $ devlink dev reload pci/0000:00:08.0 netns foo $ ip netns del foo mlx5 IB device needs to attach and detach the netdevice to it through the netdev notifier chain during load and unload sequence. A below call graph of the unload flow. cleanup_net() down_read(&pernet_ops_rwsem); <- first sem acquired ops_pre_exit_list() pre_exit() devlink_pernet_pre_exit() devlink_reload() mlx5_devlink_reload_down() mlx5_unload_one() [...] mlx5_ib_remove() mlx5_ib_unbind_slave_port() mlx5_remove_netdev_notifier() unregister_netdevice_notifier() down_write(&pernet_ops_rwsem);<- recurrsive lock Hence, when net namespace is deleted, mlx5 reload results in deadlock. When deadlock occurs, devlink mutex is also held. This not only deadlocks the mlx5 device under reload, but all the processes which attempt to access unrelated devlink devices are deadlocked. Hence, fix this by mlx5 ib driver to register for per net netdev notifier instead of global one, which operats on the net namespace without holding the pernet_ops_rwsem. Fixes: `4383cfcc65` ("net/mlx5: Add devlink reload") Link: https://lore.kernel.org/r/20201026134359.23150-1-parav@nvidia.com Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-10-26 19:18:19 -03:00
Bob Pearson	edebc8407b	RDMA/rxe: Fix small problem in network_type patch The patch referenced below has a typo that results in using the wrong L2 header size for outbound traffic. (V4 <-> V6). It also breaks kernel-side RC traffic because they use AVs that use RDMA_NETWORK_XXX enums instead of RXE_NETWORK_TYPE_XXX enums. Fix this by transcoding between these enum types. Fixes: `e0d696d201` ("RDMA/rxe: Move the definitions for rxe_av.network_type to uAPI") Link: https://lore.kernel.org/r/20201016211343.22906-1-rpearson@hpe.com Signed-off-by: Bob Pearson <rpearson@hpe.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-10-26 19:12:17 -03:00
John Garry	fab09aaee8	scsi: hisi_sas: Stop using queue #0 always for v2 hw In commit `8d98416a55` ("scsi: hisi_sas: Switch v3 hw to MQ"), the dispatch function was changed to choose the delivery queue based on the request tag HW queue index. This heavily degrades performance for v2 hw, since the HW queues are not exposed there, and, as such, HW queue #0 is used for every command. Revert to previous behaviour for when nr_hw_queues is not set, that being to choose the HW queue based on target device index. Link: https://lore.kernel.org/r/1602750425-240341-1-git-send-email-john.garry@huawei.com Fixes: `8d98416a55` ("scsi: hisi_sas: Switch v3 hw to MQ") Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-10-26 17:34:27 -04:00
Takashi Iwai	bcc3775dcf	drm/amd/display: Clean up debug macros This patch simplifies the ASSERT*() and BREAK_TO_DEBUGGER() macros: - Move the dependency check of CONFIG_KGDB into Kconfig - Unify the kgdb_breakpoint() call - Drop the non-existing CONFIG_HAVE_KGDB Also align the behavior of ASSERT() macro in both cases with and without CONFIG_DEBUG_KERNEL_DC. Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-10-26 17:19:30 -04:00
Takashi Iwai	8b7dc1fe1a	drm/amd/display: Don't invoke kgdb_breakpoint() unconditionally ASSERT_CRITICAL() invokes kgdb_breakpoint() whenever either CONFIG_KGDB or CONFIG_HAVE_KGDB is set. This, however, may lead to a kernel panic when no kdb stuff is attached, since the kgdb_breakpoint() call issues INT3. It's nothing but a surprise for normal end-users. For avoiding the pitfall, make the kgdb_breakpoint() call only when CONFIG_DEBUG_KERNEL_DC is set. https://bugzilla.opensuse.org/show_bug.cgi?id=1177973 Cc: <stable@vger.kernel.org> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-10-26 17:19:15 -04:00
Takashi Iwai	920bb38c51	drm/amd/display: Fix kernel panic by dal_gpio_open() error Currently both error code paths handled in dal_gpio_open_ex() issues ASSERT_CRITICAL(), and this leads to a kernel panic unnecessarily if CONFIG_KGDB is enabled. Since basically both are non-critical errors and can be recovered, drop those assert calls and use a safer one, BREAK_TO_DEBUGGER(), for allowing the debugging, instead. BugLink: https://bugzilla.opensuse.org/show_bug.cgi?id=1177973 Cc: <stable@vger.kernel.org> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-10-26 17:19:02 -04:00
Alex Deucher	0689dcf3e4	drm/amdgpu/display: use kvzalloc again in dc_create_state It looks this was accidently lost in a follow up patch. dc context is large and we don't need contiguous pages. Fixes: `e4863f118a` ("drm/amd/display: Multi display cause system lag on mode change") Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Aric Cyr <aric.cyr@amd.com> Cc: Alex Xu <alex_y_xu@yahoo.ca> Reported-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca> Tested-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca> Cc: stable@vger.kernel.org	2020-10-26 17:17:22 -04:00
Martin Leung	a1d2afc5dd	drm/amd/display: adding ddc_gpio_vga_reg_list to ddc reg def'ns why: oem-related ddc read/write fails without these regs how: copy from hw_factory_dcn20.c Signed-off-by: Martin Leung <martin.leung@amd.com> Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-10-26 17:16:18 -04:00

... 5 6 7 8 9 ...

966503 Commits