linux

Author	SHA1	Message	Date
Chris Mi	88bbd7b236	net/mlx5e: TC, Fix error handling memory leak Free the offload sample action on error. Fixes: `f94d6389f6` ("net/mlx5e: TC, Add support to offload sample action") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-08-09 20:56:58 -07:00
Shay Drory	ba317e832d	net/mlx5: Destroy pool->mutex Destroy pool->mutex when we destroy the pool. Fixes: `c36326d38d` ("net/mlx5: Round-Robin EQs over IRQs") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-08-09 20:56:55 -07:00
Shay Drory	5957cc557d	net/mlx5: Set all field of mlx5_irq before inserting it to the xarray Currently irq->pool is set after the irq is insert to the xarray. Set irq->pool before the irq is inserted to the xarray. Fixes: `71e084e264` ("net/mlx5: Allocating a pool of MSI-X vectors for SFs") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-08-09 20:56:52 -07:00
Shay Drory	3c8946e0e2	net/mlx5: Fix order of functions in mlx5_irq_detach_nb() Change order of functions in mlx5_irq_detach_nb() so it will be a mirror of mlx5_irq_attach_nb. Fixes: `71e084e264` ("net/mlx5: Allocating a pool of MSI-X vectors for SFs") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-08-09 20:56:49 -07:00
Aya Levin	c85a6b8feb	net/mlx5: Block switchdev mode while devlink traps are active Since switchdev mode can't support devlink traps, verify there are no active devlink traps before moving eswitch to switchdev mode. If there are active traps, prevent the switchdev mode configuration. Fixes: `eb3862a052` ("net/mlx5e: Enable traps according to link state") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-08-09 20:56:46 -07:00
Maxim Mikityanskiy	8ba3e4c858	net/mlx5e: Destroy page pool after XDP SQ to fix use-after-free mlx5e_close_xdpsq does the cleanup: it calls mlx5e_free_xdpsq_descs to free the outstanding descriptors, which relies on mlx5e_page_release_dynamic and page_pool_release_page. However, page_pool_destroy is already called by this point, because mlx5e_close_rq runs before mlx5e_close_xdpsq. This commit fixes the use-after-free by swapping mlx5e_close_xdpsq and mlx5e_close_rq. The commit cited below started calling page_pool_destroy directly from the driver. Previously, the page pool was destroyed under a call_rcu from xdp_rxq_info_unreg_mem_model, which would defer the deallocation until after the XDPSQ is cleaned up. Fixes: `1da4bbeffe` ("net: core: page_pool: add user refcnt and reintroduce page_pool_destroy") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-08-09 20:56:43 -07:00
Vlad Buslov	6d8680da2e	net/mlx5: Bridge, fix ageing time Ageing time is not converted from clock_t to jiffies which results incorrect ageing timeout calculation in workqueue update task. Fix it by applying clock_t_to_jiffies() to provided value. Fixes: `c636a0f0f3` ("net/mlx5: Bridge, dynamic entry ageing") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-08-09 20:56:40 -07:00
Roi Dayan	c623c95afa	net/mlx5e: Avoid creating tunnel headers for local route It could be local and remote are on the same machine and the route result will be a local route which will result in creating encap id with src/dst mac address of 0. Fixes: `a54e20b4fc` ("net/mlx5e: Add basic TC tunnel set action for SRIOV offloads") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-08-09 20:56:38 -07:00
Alex Vesker	d3875924da	net/mlx5: DR, Add fail on error check on decap While processing encapsulated packet on RX, one of the fields that is checked is the inner packet length. If the length as specified in the header doesn't match the actual inner packet length, the packet is invalid and should be dropped. However, such packet caused the NIC to hang. This patch turns on a 'fail_on_error' HW bit which allows HW to drop such an invalid packet while processing RX packet and trying to decap it. Fixes: `ad17dc8cf9` ("net/mlx5: DR, Move STEv0 action apply logic") Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-08-09 20:56:34 -07:00
Leon Romanovsky	c633e79964	net/mlx5: Don't skip subfunction cleanup in case of error in module init Clean SF resources if mlx5 eth failed to initialize. Fixes: `1958fc2f07` ("net/mlx5: SF, Add auxiliary device driver") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-08-09 20:56:31 -07:00
Colin Ian King	83da6ad6f9	scsi: pm8001: Remove redundant initialization of variable 'rv' The variable 'rv' is being initialized with a value that is never read, it is being updated later on. The assignment is redundant and can be removed. Link: https://lore.kernel.org/r/20210804143319.115340-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Addresses-Coverity: ("Unused value")	2021-08-09 23:47:28 -04:00
Colin Ian King	40d3272793	scsi: mpt3sas: Fix incorrectly assigned error return and check Currently the call to _base_static_config_pages() is assigning the error return to variable 'rc' but checking the error return in error 'r'. Fix this by assigning the error return to variable 'r' instead of 'rc'. Link: https://lore.kernel.org/r/20210804134940.114011-1-colin.king@canonical.com Fixes: `19a622c39a` ("scsi: mpt3sas: Handle firmware faults during first half of IOC init") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Addresses-Coverity: ("Unused value")	2021-08-09 23:46:19 -04:00
Colin Ian King	102851fc9a	scsi: ufs: ufshpb: Remove redundant initialization of variable 'lba' The variable 'lba' is being initialized with a value that is never read, it is being updated later on. The assignment is redundant and can be removed. Link: https://lore.kernel.org/r/20210804133241.113509-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Addresses-Coverity: ("Unused value")	2021-08-09 23:43:05 -04:00
Colin Ian King	e71dd41ea0	scsi: elx: efct: Remove redundant initialization of variable 'ret' The variable 'ret' is being initialized with a value that is never read, it is being updated later on. The assignment is redundant and can be removed. Link: https://lore.kernel.org/r/20210804132451.113086-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Addresses-Coverity: ("Unused value")	2021-08-09 23:41:19 -04:00
Wei Li	632c4ae6da	scsi: fdomain: Fix error return code in fdomain_probe() If request_region() fails the return value is not set. Return -EBUSY on error. Link: https://lore.kernel.org/r/20210715032625.1395495-1-liwei391@huawei.com Fixes: `8674a8aa2c` ("scsi: fdomain: Add PCMCIA support") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Li <liwei391@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-08-09 23:30:25 -04:00
Colin Ian King	e9b1adb7c5	scsi: snic: Remove redundant assignment to variable ret The variable ret is being initialized with a value that is never read, the assignment is redundant and can be removed. Link: https://lore.kernel.org/r/20210806112313.12434-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Addresses-Coverity: ("Unused value")	2021-08-09 23:24:32 -04:00
Adrian Hunter	bf25967ac5	scsi: ufshcd: Fix device links when BOOT WLUN fails to probe Managed device links are deleted by device_del(). However it is possible to add a device link to a consumer before device_add(), and then discovering an error prevents the device from being used. In that case normally references to the device would be dropped and the device would be deleted. However the device link holds a reference to the device, so the device link and device remain indefinitely (unless the supplier is deleted). For UFSHCD, if a LUN fails to probe (e.g. absent BOOT WLUN), the device will not have been registered but can still have a device link holding a reference to the device. The unwanted device link will prevent runtime suspend indefinitely. Amend device link removal to accept removal of a link with an unregistered consumer device (suggested by Rafael), and fix UFSHCD by explicitly deleting the device link when SCSI destroys the SCSI device. Link: https://lore.kernel.org/r/a1c9bac8-b560-b662-f0aa-58c7e000cbbd@intel.com Fixes: `b294ff3e34` ("scsi: ufs: core: Enable power management for wlun") Reviewed-by: Rafael J. Wysocki <rafael@kernel.org> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-08-09 23:19:26 -04:00
Michael Kelley	dbe7633c39	scsi: storvsc: Log TEST_UNIT_READY errors as warnings Commit `08f76547f0` ("scsi: storvsc: Update error logging") added more robust logging of errors, particularly those reported as Hyper-V errors. But this change produces extra logging noise in that TEST_UNIT_READY may report errors during the normal course of detecting device adds and removes. Fix this by logging TEST_UNIT_READY errors as warnings, so that log lines are produced only if the storvsc log level is changed to WARN level on the kernel boot line. Link: https://lore.kernel.org/r/1628269970-87876-1-git-send-email-mikelley@microsoft.com Fixes: `08f76547f0` ("scsi: storvsc: Update error logging") Signed-off-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-08-09 23:17:10 -04:00
Colin Ian King	a5402cdcc2	scsi: ufs: Fix unsigned int compared with less than zero Variable 'tag' is currently an unsigned int and is being compared to less than zero, this check is always false. Fix this by making 'tag' an int. Link: https://lore.kernel.org/r/20210806144301.19864-1-colin.king@canonical.com Fixes: `4728ab4a8e` ("scsi: ufs: Remove ufshcd_valid_tag()") Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Addresses-Coverity: ("Macro compares unsigned to 0")	2021-08-09 23:14:52 -04:00
Damien Le Moal	4758fd91d5	scsi: mpt3sas: Introduce sas_ncq_prio_supported sysfs sttribute Similarly to AHCI, introduce the device sysfs attribute sas_ncq_prio_supported to advertise if a SATA device supports the NCQ priority feature. Without this new attribute, the user can only discover if a SATA device supports NCQ priority by trying to enable the feature use with the sas_ncq_prio_enable sysfs device attribute, which fails when the device does not support high prioity commands. Link: https://lore.kernel.org/r/20210807041859.579409-11-damien.lemoal@wdc.com Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-08-09 23:10:58 -04:00
Suganath Prabu S	cdc1767698	scsi: mpt3sas: Update driver version to 39.100.00.00 Update driver version to 39.100.00.00. Link: https://lore.kernel.org/r/20210809072639.21228-3-suganath-prabu.subramani@broadcom.com Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-08-09 23:07:04 -04:00
Suganath Prabu S	787f2448c2	scsi: mpt3sas: Use firmware recommended queue depth Currently, the mpt3sas driver sets the default queue depth based on the physical interface of the attached device: - SAS : 254 - SATA: 32 - NVMe: 128 The IOC firmware provides a recommended queue depth for each device through SAS IO Unit Page1 for SAS/SATA and PCIe IO Unit Page 1 for NVMe devices. If the host sets the queue depth greater than the firmware recommended value, then the IOC places the I/Os above the recommended queue depth in an internal pending queue. This consumes outstanding host-credit/resources, thereby leading to potential starvation of other devices. To avoid this, use the device depth recommended by the IOC firmware. Link: https://lore.kernel.org/r/20210809072639.21228-2-suganath-prabu.subramani@broadcom.com Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-08-09 23:07:04 -04:00
Sreekanth Reddy	44f88ef3c9	scsi: mpt3sas: Bump driver version to 38.100.00.00 Bump driver version to 38.100.00.00. Link: https://lore.kernel.org/r/20210803065134.19090-1-sreekanth.reddy@broadcom.com Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-08-09 22:58:08 -04:00
Sreekanth Reddy	432bc7caef	scsi: mpt3sas: Add io_uring iopoll support Enable the driver to work in non-IRQ mode, i.e. there will not be any MSI-X vectors associated with queues dedicated to polling. The IOC hardware is single submission queue and multiple reply queue. However, using the shared host tagset support it is possible to simulate multiple hardware queues. When poll_queues are enabled through the module parameter, the driver will allocate extra reply queues without an MSI-X association. All I/O completion on these queues will be done through the iopoll interface. Link: https://lore.kernel.org/r/20210727081212.2742-1-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-08-09 22:55:50 -04:00
Ewan D. Milne	9977d880f7	scsi: lpfc: Move initialization of phba->poll_list earlier to avoid crash The phba->poll_list is traversed in case of an error in lpfc_sli4_hba_setup(), so it must be initialized earlier in case the error path is taken. [ 490.030738] lpfc 0000:65:00.0: 0:1413 Failed to init iocb list. [ 490.036661] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 [ 490.044485] PGD 0 P4D 0 [ 490.047027] Oops: 0000 [#1] SMP PTI [ 490.050518] CPU: 0 PID: 7 Comm: kworker/0:1 Kdump: loaded Tainted: G I --------- - - 4.18. [ 490.060511] Hardware name: Dell Inc. PowerEdge R440/0WKGTH, BIOS 1.4.8 05/22/2018 [ 490.067994] Workqueue: events work_for_cpu_fn [ 490.072371] RIP: 0010:lpfc_sli4_cleanup_poll_list+0x20/0xb0 [lpfc] [ 490.078546] Code: cf e9 04 f7 fe ff 0f 1f 40 00 0f 1f 44 00 00 41 57 49 89 ff 41 56 41 55 41 54 4d 8d a79 [ 490.097291] RSP: 0018:ffffbd1a463dbcc8 EFLAGS: 00010246 [ 490.102518] RAX: 0000000000008200 RBX: ffff945cdb8c0000 RCX: 0000000000000000 [ 490.109649] RDX: 0000000000018200 RSI: ffff9468d0e16818 RDI: 0000000000000000 [ 490.116783] RBP: ffff945cdb8c1740 R08: 00000000000015c5 R09: 0000000000000042 [ 490.123915] R10: 0000000000000000 R11: ffffbd1a463dbab0 R12: ffff945cdb8c25c0 [ 490.131049] R13: 00000000fffffff4 R14: 0000000000001800 R15: ffff945cdb8c0000 [ 490.138182] FS: 0000000000000000(0000) GS:ffff9468d0e00000(0000) knlGS:0000000000000000 [ 490.146267] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 490.152013] CR2: 0000000000000000 CR3: 000000042ca10002 CR4: 00000000007706f0 [ 490.159146] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 490.166277] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 490.173409] PKRU: 55555554 [ 490.176123] Call Trace: [ 490.178598] lpfc_sli4_queue_destroy+0x7f/0x3c0 [lpfc] [ 490.183745] lpfc_sli4_hba_setup+0x1bc7/0x23e0 [lpfc] [ 490.188797] ? kernfs_activate+0x63/0x80 [ 490.192721] ? kernfs_add_one+0xe7/0x130 [ 490.196647] ? __kernfs_create_file+0x80/0xb0 [ 490.201020] ? lpfc_pci_probe_one_s4.isra.48+0x46f/0x9e0 [lpfc] [ 490.206944] lpfc_pci_probe_one_s4.isra.48+0x46f/0x9e0 [lpfc] [ 490.212697] lpfc_pci_probe_one+0x179/0xb70 [lpfc] [ 490.217492] local_pci_probe+0x41/0x90 [ 490.221246] work_for_cpu_fn+0x16/0x20 [ 490.224994] process_one_work+0x1a7/0x360 [ 490.229009] ? create_worker+0x1a0/0x1a0 [ 490.232933] worker_thread+0x1cf/0x390 [ 490.236687] ? create_worker+0x1a0/0x1a0 [ 490.240612] kthread+0x116/0x130 [ 490.243846] ? kthread_flush_work_fn+0x10/0x10 [ 490.248293] ret_from_fork+0x35/0x40 [ 490.251869] Modules linked in: lpfc(+) xt_CHECKSUM ipt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4i [ 490.332609] CR2: 0000000000000000 Link: https://lore.kernel.org/r/20210809150947.18104-1-emilne@redhat.com Fixes: `93a4d6f401` ("scsi: lpfc: Add registration for CPU Offline/Online events") Cc: stable@vger.kernel.org Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-08-09 22:45:51 -04:00
Colin Ian King	da20b58d5b	xen-blkfront: Remove redundant assignment to variable err The variable err is being assigned a value that is never read, the assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20210806110601.11386-1-colin.king@canonical.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-09 20:04:46 -06:00
Ming Lei	11431e26c9	blk-iocost: fix lockdep warning on blkcg->lock blkcg->lock depends on q->queue_lock which may depend on another driver lock required in irq context, one example is dm-thin: Chain exists of: &pool->lock#3 --> &q->queue_lock --> &blkcg->lock Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&blkcg->lock); local_irq_disable(); lock(&pool->lock#3); lock(&q->queue_lock); <Interrupt> lock(&pool->lock#3); Fix the issue by using spin_lock_irq(&blkcg->lock) in ioc_weight_write(). Cc: Tejun Heo <tj@kernel.org> Reported-by: Bruno Goncalves <bgoncalv@redhat.com> Link: https://lore.kernel.org/linux-block/CA+QYu4rzz6079ighEanS3Qq_Dmnczcf45ZoJoHKVLVATTo1e4Q@mail.gmail.com/T/#u Signed-off-by: Ming Lei <ming.lei@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20210803070608.1766400-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-09 20:00:26 -06:00
Pavel Begunkov	43597aac1f	io_uring: fix ctx-exit io_rsrc_put_work() deadlock __io_rsrc_put_work() might need ->uring_lock, so nobody should wait for rsrc nodes holding the mutex. However, that's exactly what io_ring_ctx_free() does with io_wait_rsrc_data(). Split it into rsrc wait + dealloc, and move the first one out of the lock. Cc: stable@vger.kernel.org Fixes: `b60c8dce33` ("io_uring: preparation for rsrc tagging") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0130c5c2693468173ec1afab714e0885d2c9c363.1628559783.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-09 19:59:28 -06:00
Jens Axboe	c018db4a57	io_uring: drop ctx->uring_lock before flushing work item Ammar reports that he's seeing a lockdep splat on running test/rsrc_tags from the regression suite: ====================================================== WARNING: possible circular locking dependency detected 5.14.0-rc3-bluetea-test-00249-gc7d102232649 #5 Tainted: G OE ------------------------------------------------------ kworker/2:4/2684 is trying to acquire lock: ffff88814bb1c0a8 (&ctx->uring_lock){+.+.}-{3:3}, at: io_rsrc_put_work+0x13d/0x1a0 but task is already holding lock: ffffc90001c6be70 ((work_completion)(&(&ctx->rsrc_put_work)->work)){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 ((work_completion)(&(&ctx->rsrc_put_work)->work)){+.+.}-{0:0}: __flush_work+0x31b/0x490 io_rsrc_ref_quiesce.part.0.constprop.0+0x35/0xb0 __do_sys_io_uring_register+0x45b/0x1060 do_syscall_64+0x35/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #0 (&ctx->uring_lock){+.+.}-{3:3}: __lock_acquire+0x119a/0x1e10 lock_acquire+0xc8/0x2f0 __mutex_lock+0x86/0x740 io_rsrc_put_work+0x13d/0x1a0 process_one_work+0x236/0x530 worker_thread+0x52/0x3b0 kthread+0x135/0x160 ret_from_fork+0x1f/0x30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((work_completion)(&(&ctx->rsrc_put_work)->work)); lock(&ctx->uring_lock); lock((work_completion)(&(&ctx->rsrc_put_work)->work)); lock(&ctx->uring_lock); * DEADLOCK * 2 locks held by kworker/2:4/2684: #0: ffff88810004d938 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530 #1: ffffc90001c6be70 ((work_completion)(&(&ctx->rsrc_put_work)->work)){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530 stack backtrace: CPU: 2 PID: 2684 Comm: kworker/2:4 Tainted: G OE 5.14.0-rc3-bluetea-test-00249-gc7d102232649 #5 Hardware name: Acer Aspire ES1-421/OLVIA_BE, BIOS V1.05 07/02/2015 Workqueue: events io_rsrc_put_work Call Trace: dump_stack_lvl+0x6a/0x9a check_noncircular+0xfe/0x110 __lock_acquire+0x119a/0x1e10 lock_acquire+0xc8/0x2f0 ? io_rsrc_put_work+0x13d/0x1a0 __mutex_lock+0x86/0x740 ? io_rsrc_put_work+0x13d/0x1a0 ? io_rsrc_put_work+0x13d/0x1a0 ? io_rsrc_put_work+0x13d/0x1a0 ? process_one_work+0x1ce/0x530 io_rsrc_put_work+0x13d/0x1a0 process_one_work+0x236/0x530 worker_thread+0x52/0x3b0 ? process_one_work+0x530/0x530 kthread+0x135/0x160 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x1f/0x30 which is due to holding the ctx->uring_lock when flushing existing pending work, while the pending work flushing may need to grab the uring lock if we're using IOPOLL. Fix this by dropping the uring_lock a bit earlier as part of the flush. Cc: stable@vger.kernel.org Link: https://github.com/axboe/liburing/issues/404 Tested-by: Ammar Faizi <ammarfaizi2@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-09 19:59:06 -06:00
Hao Xu	47cae0c71f	io-wq: fix IO_WORKER_F_FIXED issue in create_io_worker() There may be cases like: A B spin_lock(wqe->lock) nr_workers is 0 nr_workers++ spin_unlock(wqe->lock) spin_lock(wqe->lock) nr_wokers is 1 nr_workers++ spin_unlock(wqe->lock) create_io_worker() acct->worker is 1 create_io_worker() acct->worker is 1 There should be one worker marked IO_WORKER_F_FIXED, but no one is. Fix this by introduce a new agrument for create_io_worker() to indicate if it is the first worker. Fixes: `3d4e4face9` ("io-wq: fix no lock protection of acct->nr_worker") Signed-off-by: Hao Xu <haoxu@linux.alibaba.com> Link: https://lore.kernel.org/r/20210808135434.68667-3-haoxu@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-09 19:59:06 -06:00
Hao Xu	49e7f0c789	io-wq: fix bug of creating io-wokers unconditionally The former patch to add check between nr_workers and max_workers has a bug, which will cause unconditionally creating io-workers. That's because the result of the check doesn't affect the call of create_io_worker(), fix it by bringing in a boolean value for it. Fixes: `21698274da` ("io-wq: fix lack of acct->nr_workers < acct->max_workers judgement") Signed-off-by: Hao Xu <haoxu@linux.alibaba.com> Link: https://lore.kernel.org/r/20210808135434.68667-2-haoxu@linux.alibaba.com [axboe: drop hunk that isn't strictly needed] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-09 19:59:06 -06:00
Jens Axboe	4956b9eaad	io_uring: rsrc ref lock needs to be IRQ safe Nadav reports running into the below splat on re-enabling softirqs: WARNING: CPU: 2 PID: 1777 at kernel/softirq.c:364 __local_bh_enable_ip+0xaa/0xe0 Modules linked in: CPU: 2 PID: 1777 Comm: umem Not tainted 5.13.1+ #161 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/22/2020 RIP: 0010:__local_bh_enable_ip+0xaa/0xe0 Code: a9 00 ff ff 00 74 38 65 ff 0d a2 21 8c 7a e8 ed 1a 20 00 fb 66 0f 1f 44 00 00 5b 41 5c 5d c3 65 8b 05 e6 2d 8c 7a 85 c0 75 9a <0f> 0b eb 96 e8 2d 1f 20 00 eb a5 4c 89 e7 e8 73 4f 0c 00 eb ae 65 RSP: 0018:ffff88812e58fcc8 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000201 RCX: dffffc0000000000 RDX: 0000000000000007 RSI: 0000000000000201 RDI: ffffffff8898c5ac RBP: ffff88812e58fcd8 R08: ffffffff8575dbbf R09: ffffed1028ef14f9 R10: ffff88814778a7c3 R11: ffffed1028ef14f8 R12: ffffffff85c9e9ae R13: ffff88814778a000 R14: ffff88814778a7b0 R15: ffff8881086db890 FS: 00007fbcfee17700(0000) GS:ffff8881e0300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000c0402a5008 CR3: 000000011c1ac003 CR4: 00000000003706e0 Call Trace: _raw_spin_unlock_bh+0x31/0x40 io_rsrc_node_ref_zero+0x13e/0x190 io_dismantle_req+0x215/0x220 io_req_complete_post+0x1b8/0x720 __io_complete_rw.isra.0+0x16b/0x1f0 io_complete_rw+0x10/0x20 where it's clear we end up calling the percpu count release directly from the completion path, as it's in atomic mode and we drop the last ref. For file/block IO, this can be from IRQ context already, and the softirq locking for rsrc isn't enough. Just make the lock fully IRQ safe, and ensure we correctly safe state from the release path as we don't know the full context there. Reported-by: Nadav Amit <nadav.amit@gmail.com> Tested-by: Nadav Amit <nadav.amit@gmail.com> Link: https://lore.kernel.org/io-uring/C187C836-E78B-4A31-B24C-D16919ACA093@gmail.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-09 19:58:59 -06:00
Nick Desaulniers	f12b034afe	scripts/Makefile.clang: default to LLVM_IAS=1 LLVM_IAS=1 controls enabling clang's integrated assembler via -integrated-as. This was an explicit opt in until we could enable assembler support in Clang for more architecures. Now we have support and CI coverage of LLVM_IAS=1 for all architecures except a few more bugs affecting s390 and powerpc. This commit flips the default from opt in via LLVM_IAS=1 to opt out via LLVM_IAS=0. CI systems or developers that were previously doing builds with CC=clang or LLVM=1 without explicitly setting LLVM_IAS must now explicitly opt out via LLVM_IAS=0, otherwise they will be implicitly opted-in. This finally shortens the command line invocation when cross compiling with LLVM to simply: $ make ARCH=arm64 LLVM=1 Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>	2021-08-10 09:13:25 +09:00
Masahiro Yamada	52cc02b910	kbuild: check CONFIG_AS_IS_LLVM instead of LLVM_IAS LLVM_IAS is the user interface to set the -(no-)integrated-as flag, and it should be used only for that purpose. LLVM_IAS is checked in some places to determine the assembler type, but it is not precise. For example, $ make CC=gcc LLVM_IAS=1 ... will use the GNU assembler (i.e. binutils) since LLVM_IAS=1 is effective only when $(CC) is clang. Of course, 'CC=gcc LLVM_IAS=1' is an odd combination, but the build system can be more robust against such insane input. Commit `ba64beb174` ("kbuild: check the minimum assembler version in Kconfig") introduced CONFIG_AS_IS_GNU/LLVM, which is more precise because Kconfig checks the version string from the assembler in use. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org>	2021-08-10 09:13:25 +09:00
Nick Desaulniers	e08831baa0	Documentation/llvm: update CROSS_COMPILE inferencing As noted by Masahiro, document how we can generally infer CROSS_COMPILE (and the more specific details about --target and --prefix) based on ARCH. Change use of env vars to command line parameters. Suggested-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Fangrui Song <maskray@google.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>	2021-08-10 09:13:25 +09:00
Nick Desaulniers	231ad7f409	Makefile: infer --target from ARCH for CC=clang We get constant feedback that the command line invocation of make is too long when compiling with LLVM. CROSS_COMPILE is helpful when a toolchain has a prefix of the target triple, or is an absolute path outside of $PATH. Since a Clang binary is generally multi-targeted, we can infer a given target from SRCARCH/ARCH. If CROSS_COMPILE is not set, simply set --target= for CLANG_FLAGS, KBUILD_CFLAGS, and KBUILD_AFLAGS based on $SRCARCH. Previously, we'd cross compile via: $ ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- make LLVM=1 LLVM_IAS=1 Now: $ ARCH=arm64 make LLVM=1 LLVM_IAS=1 For native builds (not involving cross compilation) we now explicitly specify a target triple rather than rely on the implicit host triple. Link: https://github.com/ClangBuiltLinux/linux/issues/1399 Suggested-by: Arnd Bergmann <arnd@kernel.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Suggested-by: Masahiro Yamada <masahiroy@kernel.org> Suggested-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Arnd Bergmann <arnd@kernel.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>	2021-08-10 09:13:25 +09:00
Nick Desaulniers	6f5b41a2f5	Makefile: move initial clang flag handling into scripts/Makefile.clang With some of the changes we'd like to make to CROSS_COMPILE, the initial block of clang flag handling which controls things like the target triple, whether or not to use the integrated assembler and how to find GAS, and erroring on unknown warnings is becoming unwieldy. Move it into its own file under scripts/. Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>	2021-08-10 09:13:25 +09:00
Masahiro Yamada	6072b2c49d	kbuild: warn if a different compiler is used for external module builds It is always safe to use the same compiler for the kernel and external modules, but in reality, some distributions such as Fedora release a different version of GCC from the one used for building the kernel. There was a long discussion about mixing different compilers [1]. I do not repeat it here, but at least, showing a heads up in that case is better than nothing. Linus suggested [2]: And a warning might be more palatable even if different compiler version work fine together. Just a heads up on "it looks like you might be mixing compiler versions" is a valid note, and isn't necessarily wrong. Even when they work well together, maybe you want to have people at least _aware_ of it. This commit shows a warning unless the compiler is exactly the same. warning: the compiler differs from the one used to build the kernel The kernel was built by: gcc (GCC) 11.1.1 20210531 (Red Hat 11.1.1-3) You are using: gcc (GCC) 11.2.1 20210728 (Red Hat 11.2.1-1) Check the difference, and if it is OK with you, please proceed at your risk. To avoid the locale issue as in commit `bcbcf50f52` ("kbuild: fix ld-version.sh to not be affected by locale"), pass LC_ALL=C to "$(CC) --version". [1] https://lore.kernel.org/linux-hardening/efe6b039a544da8215d5e54aa7c4b6d1986fc2b0.1611607264.git.jpoimboe@redhat.com/ [2] https://lore.kernel.org/lkml/CAHk-=wgjwhDy-y4mQh34L+2aF=n6BjzHdqAW2=8wri5x7O04pA@mail.gmail.com/ Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>	2021-08-10 09:13:25 +09:00
Masahiro Yamada	0058d07ec6	scripts: make some scripts executable Set the x bit to some scripts to make them directly executable. Especially, scripts/checkdeclares.pl is not hooked by anyone. It should be executable since it is tedious to type 'perl scripts/checkdeclares.pl'. The original patch [1] set the x bit properly, but it was lost when it was merged as commit `21917bded7` ("scripts: a new script for checking duplicate struct declaration"). [1] https://lore.kernel.org/lkml/20210401110943.1010796-1-wanjiabing@vivo.com/ Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>	2021-08-10 09:13:25 +09:00
Linus Torvalds	9a73fa375d	Merge branch 'for-5.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fix from Tejun Heo: "One commit to fix a possible A-A deadlock around u64_stats_sync on 32bit machines caused by updating it without disabling IRQ when it may be read from IRQ context" * 'for-5.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: rstat: fix A-A deadlock on 32bit around u64_stats_sync	2021-08-09 16:47:36 -07:00
Masahiro Yamada	d828563955	kbuild: do not require sub-make for separate output tree builds As explained in commit `3204a7fb98` ("kbuild: prefix $(srctree)/ to some included Makefiles"), I want to stop using --include-dir some day. I already fixed up the top Makefile, but some arch Makefiles (mips, um, x86) still include check-in Makefiles without $(srctree)/. Fix them up so 'need-sub-make := 1' can go away for this case. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>	2021-08-10 08:23:39 +09:00
Matthias Maennich	a325db2d8f	scripts: merge_config: add strict mode to fail upon any redefinition When merging configuration fragments, it might be of interest to identify mismatches (redefinitions) programmatically. Hence add the option -s (strict mode) to instruct merge_config.sh to bail out in case any redefinition has been detected. With strict mode, warnings are emitted as before, but the script terminates with rc=1. If -y is set to define "builtin having precedence over modules", fragments are still allowed to set =m (while the base config has =y). Strict mode will tolerate that as demotions from =y to =m are ignored when setting -y. Signed-off-by: Matthias Maennich <maennich@google.com> Reviewed-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>	2021-08-10 08:23:39 +09:00
Allison Henderson	df0826312a	xfs: add attr state machine tracepoints This is a quick patch to add a new xfs_attr_*_return tracepoints. We use these to track when ever a new state is set or -EAGAIN is returned Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-08-09 16:16:40 -07:00
Darrick J. Wong	4bc619833f	xfs: refactor xfs_iget calls from log intent recovery Hoist the code from xfs_bui_item_recover that igets an inode and marks it as being part of log intent recovery. The next patch will want a common function. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>	2021-08-09 15:57:59 -07:00
Darrick J. Wong	2b73a2c817	xfs: clear log incompat feature bits when the log is idle When there are no ongoing transactions and the log contents have been checkpointed back into the filesystem, the log performs 'covering', which is to say that it log a dummy transaction to record the fact that the tail has caught up with the head. This is a good time to clear log incompat feature flags, because they are flags that are temporarily set to limit the range of kernels that can replay a dirty log. Since it's possible that some other higher level thread is about to start logging items protected by a log incompat flag, we create a rwsem so that upper level threads can coordinate this with the log. It would probably be more performant to use a percpu rwsem, but the ability to /try/ taking the write lock during covering is critical, and percpu rwsems do not provide that. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>	2021-08-09 15:57:59 -07:00
Darrick J. Wong	908ce71e54	xfs: allow setting and clearing of log incompat feature flags Log incompat feature flags in the superblock exist for one purpose: to protect the contents of a dirty log from replay on a kernel that isn't prepared to handle those dirty contents. This means that they can be cleared if (a) we know the log is clean and (b) we know that there aren't any other threads in the system that might be setting or relying upon a log incompat flag. Therefore, clear the log incompat flags when we've finished recovering the log, when we're unmounting cleanly, remounting read-only, or freezing; and provide a function so that subsequent patches can start using this. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>	2021-08-09 15:57:59 -07:00
Dave Chinner	d634525db6	xfs: replace kmem_alloc_large() with kvmalloc() There is no reason for this wrapper existing anymore. All the places that use KM_NOFS allocation are within transaction contexts and hence covered by memalloc_nofs_save/restore contexts. Hence we don't need any special handling of vmalloc for large IOs anymore and so special casing this code isn't necessary. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-08-09 15:57:43 -07:00
Dave Chinner	98fe2c3cef	xfs: remove kmem_alloc_io() Since commit `59bb47985c` ("mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)"), the core slab code now guarantees slab alignment in all situations sufficient for IO purposes (i.e. minimum of 512 byte alignment of >= 512 byte sized heap allocations) we no longer need the workaround in the XFS code to provide this guarantee. Replace the use of kmem_alloc_io() with kmem_alloc() or kmem_alloc_large() appropriately, and remove the kmem_alloc_io() interface altogether. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-08-09 15:57:43 -07:00
Dave Chinner	de2860f463	mm: Add kvrealloc() During log recovery of an XFS filesystem with 64kB directory buffers, rebuilding a buffer split across two log records results in a memory allocation warning from krealloc like this: xfs filesystem being mounted at /mnt/scratch supports timestamps until 2038 (0x7fffffff) XFS (dm-0): Unmounting Filesystem XFS (dm-0): Mounting V5 Filesystem XFS (dm-0): Starting recovery (logdev: internal) ------------[ cut here ]------------ WARNING: CPU: 5 PID: 3435170 at mm/page_alloc.c:3539 get_page_from_freelist+0xdee/0xe40 ..... RIP: 0010:get_page_from_freelist+0xdee/0xe40 Call Trace: ? complete+0x3f/0x50 __alloc_pages+0x16f/0x300 alloc_pages+0x87/0x110 kmalloc_order+0x2c/0x90 kmalloc_order_trace+0x1d/0x90 __kmalloc_track_caller+0x215/0x270 ? xlog_recover_add_to_cont_trans+0x63/0x1f0 krealloc+0x54/0xb0 xlog_recover_add_to_cont_trans+0x63/0x1f0 xlog_recovery_process_trans+0xc1/0xd0 xlog_recover_process_ophdr+0x86/0x130 xlog_recover_process_data+0x9f/0x160 xlog_recover_process+0xa2/0x120 xlog_do_recovery_pass+0x40b/0x7d0 ? __irq_work_queue_local+0x4f/0x60 ? irq_work_queue+0x3a/0x50 xlog_do_log_recovery+0x70/0x150 xlog_do_recover+0x38/0x1d0 xlog_recover+0xd8/0x170 xfs_log_mount+0x181/0x300 xfs_mountfs+0x4a1/0x9b0 xfs_fs_fill_super+0x3c0/0x7b0 get_tree_bdev+0x171/0x270 ? suffix_kstrtoint.constprop.0+0xf0/0xf0 xfs_fs_get_tree+0x15/0x20 vfs_get_tree+0x24/0xc0 path_mount+0x2f5/0xaf0 __x64_sys_mount+0x108/0x140 do_syscall_64+0x3a/0x70 entry_SYSCALL_64_after_hwframe+0x44/0xae Essentially, we are taking a multi-order allocation from kmem_alloc() (which has an open coded no fail, no warn loop) and then reallocating it out to 64kB using krealloc(__GFP_NOFAIL) and that is then triggering the above warning. This is a regression caused by converting this code from an open coded no fail/no warn reallocation loop to using __GFP_NOFAIL. What we actually need here is kvrealloc(), so that if contiguous page allocation fails we fall back to vmalloc() and we don't get nasty warnings happening in XFS. Fixes: `771915c4f6` ("xfs: remove kmem_realloc()") Signed-off-by: Dave Chinner <dchinner@redhat.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-08-09 15:57:43 -07:00
Sebastian Andrzej Siewior	c5c63b9a6a	cgroup: Replace deprecated CPU-hotplug functions. The functions get_online_cpus() and put_online_cpus() have been deprecated during the CPU hotplug rework. They map directly to cpus_read_lock() and cpus_read_unlock(). Replace deprecated CPU-hotplug functions with the official version. The behavior remains unchanged. Cc: Zefan Li <lizefan.x@bytedance.com> Cc: Tejun Heo <tj@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: cgroups@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Tejun Heo <tj@kernel.org>	2021-08-09 12:53:35 -10:00

... 125 126 127 128 129 ...

1042671 Commits