linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-23 20:51:44 +00:00

Author	SHA1	Message	Date
Christoph Hellwig	5eaee6e9c8	drbd: split out a drbd_discard_supported helper Add a helper to check if discard is supported for a given connection / backing device combination. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Philipp Reisner <philipp.reisner@linbit.com> Reviewed-by: Lars Ellenberg <lars.ellenberg@linbit.com> Tested-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20240306140332.623759-7-philipp.reisner@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-03-06 08:30:34 -07:00
Christoph Hellwig	e3992e02c9	drbd: don't set max_write_zeroes_sectors in decide_on_discard_support fixup_write_zeroes always overrides the max_write_zeroes_sectors value a little further down the callchain, so don't bother to setup a limit in decide_on_discard_support. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Philipp Reisner <philipp.reisner@linbit.com> Reviewed-by: Lars Ellenberg <lars.ellenberg@linbit.com> Tested-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20240306140332.623759-6-philipp.reisner@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-03-06 08:30:34 -07:00
Christoph Hellwig	e16344e506	drbd: merge drbd_setup_queue_param into drbd_reconsider_queue_parameters drbd_setup_queue_param is only called by drbd_reconsider_queue_parameters and there is no really clear boundary of responsibilities between the two. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Philipp Reisner <philipp.reisner@linbit.com> Reviewed-by: Lars Ellenberg <lars.ellenberg@linbit.com> Tested-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20240306140332.623759-5-philipp.reisner@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-03-06 08:30:34 -07:00
Christoph Hellwig	2828908d5c	drbd: refactor the backing dev max_segments calculation Factor out a drbd_backing_dev_max_segments helper that checks the backing device limitation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Philipp Reisner <philipp.reisner@linbit.com> Reviewed-by: Lars Ellenberg <lars.ellenberg@linbit.com> Tested-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20240306140332.623759-4-philipp.reisner@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-03-06 08:30:34 -07:00
Christoph Hellwig	342d81fde2	drbd: refactor drbd_reconsider_queue_parameters Split out a drbd_max_peer_bio_size helper for the peer I/O size, and condense the various checks to a nested min3(..., max())) instead of using a lot of local variables. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240305134041.137006-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-03-06 08:30:34 -07:00
Christoph Hellwig	aa067325c0	drbd: pass the max_hw_sectors limit to blk_alloc_disk Pass a queue_limits structure with the max_hw_sectors limit to blk_alloc_disk instead of updating the limit on the allocated gendisk. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240305134041.137006-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-03-06 08:30:34 -07:00
Li kunyu	5f2ad31fbb	sed-opal: Remove the ret variable from the function The ret variable in the function has not yet been effective and can be removed. Signed-off-by: Li kunyu <kunyu@nfschina.com> Link: https://lore.kernel.org/r/20240306101444.1244-1-kunyu@nfschina.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-03-06 08:29:49 -07:00
Li kunyu	2449be8c8c	sed-opal: Remove unnecessary ‘0’ values from ret ret is assigned first, so it does not need to initialize the assignment. Signed-off-by: Li kunyu <kunyu@nfschina.com> Link: https://lore.kernel.org/r/20240306100659.106521-1-kunyu@nfschina.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-03-06 08:29:43 -07:00
Li zeming	217fcc4807	sed-opal: Remove unnecessary ‘0’ values from err err is assigned first, so it does not need to initialize the assignment. Signed-off-by: Li zeming <zeming@nfschina.com> Link: https://lore.kernel.org/r/20240306100216.69340-1-zeming@nfschina.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-03-06 08:29:38 -07:00
Li zeming	147fe61334	sed-opal: Remove unnecessary ‘0’ values from error error is assigned first, so it does not need to initialize the assignment. Signed-off-by: Li zeming <zeming@nfschina.com> Link: https://lore.kernel.org/r/20240306095608.26839-1-zeming@nfschina.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-03-06 08:29:31 -07:00
Ricardo B. Marliere	f8c7511db0	block: make block_class constant Since commit `43a7206b09` ("driver core: class: make class_register() take a const *"), the driver core allows for struct class to be in read-only memory, so move the block_class structure to be declared at build time placing it into read-only memory, instead of having to be dynamically allocated at boot time. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240305-class_cleanup-block-v1-1-130bb27b9c72@marliere.net Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-03-06 08:29:20 -07:00
Jens Axboe	33bffbb727	Merge tag 'md-6.9-20240305' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.9/block Pull MD fixes from Song: "This set fixes two issues: 1. dmraid regression since 6.7 kernels. This issue was initially reported in [1]. This set of fix has been reviewed and tested by md and dm folks. 2. raid5 hang since 6.7 kernel, reported in [2]. We haven't got a better fix for this issue yet. This revert is a workaround. It has been applied to 6.7 stable kernels [3], and proved to be affective. We will look more into this issue for a better fix. [1] https://lore.kernel.org/linux-raid/e5e8afe2-e9a8-49a2-5ab0-958d4065c55e@redhat.com/ [2] https://lore.kernel.org/linux-raid/20240123005700.9302-1-dan@danm.net/ [3] 87165c64fe1a in linux-6.7.y branch." * tag 'md-6.9-20240305' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: dm-raid: fix lockdep waring in "pers->hot_add_disk" dm-raid456, md/raid456: fix a deadlock for dm-raid456 while io concurrent with reshape dm-raid: add a new helper prepare_suspend() in md_personality md/dm-raid: don't call md_reap_sync_thread() directly dm-raid: really frozen sync_thread during suspend md: add a new helper reshape_interrupted() md: export helper md_is_rdwr() md: export helpers to stop sync_thread md: don't clear MD_RECOVERY_FROZEN for new dm-raid until resume Revert "Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d""	2024-03-06 08:27:41 -07:00
Christoph Hellwig	fde07a4d74	dasd: use the atomic queue limits API Pass the constant limits directly to blk_mq_alloc_disk, set the nonrot flag there as well, and then use the commit API to change the transfer size and logical block size dependent values. This relies on the assumption that no I/O can be pending before the devices moves into the ready state and doesn't need extra freezing for changes to the queue limits. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> Link: https://lore.kernel.org/r/20240228133742.806274-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-03-06 08:27:01 -07:00
Christoph Hellwig	0127a47f58	dasd: move queue setup to common code Most of the code in setup_blk_queue is shared between all disciplines. Move it to common code and leave a method to query the maximum number of transferable blocks, and a flag to indicate discard support. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> Link: https://lore.kernel.org/r/20240228133742.806274-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-03-06 08:27:00 -07:00
Christoph Hellwig	41463f2dfd	dasd: cleamup dasd_state_basic_to_ready Reflow dasd_state_basic_to_ready a bit to make it easier to modify. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> Link: https://lore.kernel.org/r/20240228133742.806274-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-03-06 08:27:00 -07:00
Tony Battersby	38b43539d6	block: Fix page refcounts for unaligned buffers in __bio_release_pages() Fix an incorrect number of pages being released for buffers that do not start at the beginning of a page. Fixes: `1b151e2435` ("block: Remove special-casing of compound pages") Cc: stable@vger.kernel.org Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Tested-by: Greg Edwards <gedwards@ddn.com> Link: https://lore.kernel.org/r/86e592a9-98d4-4cff-a646-0c0084328356@cybernetics.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-03-06 08:26:42 -07:00
Douglas Anderson	317f86dc1b	Revert "drm/udl: Add ARGB8888 as a format" This reverts commit `95bf25bb9e`. Apparently there was a previous discussion about emulation of formats and it was decided XRGB8888 was the only format to support for legacy userspace [1]. Remove ARGB8888. Userspace needs to be fixed to accept XRGB8888. [1] https://lore.kernel.org/r/60dc7697-d7a0-4bf4-a22e-32f1bbb792c2@suse.de Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20240306063721.1.I4a32475190334e1fa4eef4700ecd2787a43c94b5@changeid	2024-03-06 07:08:24 -08:00
Johan Hovold	47b412c1ea	phy: qcom-qmp-combo: fix type-c switch registration Due to a long-standing issue in driver core, drivers may not probe defer after having registered child devices to avoid triggering a probe deferral loop (see `fbc35b45f9` ("Add documentation on meaning of -EPROBE_DEFER")). Move registration of the typec switch to after looking up clocks and other resources. Note that PHY creation can in theory also trigger a probe deferral when a 'phy' supply is used. This does not seem to affect the QMP PHY driver but the PHY subsystem should be reworked to address this (i.e. by separating initialisation and registration of the PHY). Fixes: `2851117f8f` ("phy: qcom-qmp-combo: Introduce orientation switching") Cc: stable@vger.kernel.org # 6.5 Cc: Bjorn Andersson <quic_bjorande@quicinc.com> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Reviewed-by: Bjorn Andersson <andersson@kernel.org> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Acked-by: Vinod Koul <vkoul@kernel.org> Acked-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20240217150228.5788-7-johan+linaro@kernel.org Signed-off-by: Vinod Koul <vkoul@kernel.org>	2024-03-06 20:37:37 +05:30
Johan Hovold	d2d7b8e880	phy: qcom-qmp-combo: fix drm bridge registration Due to a long-standing issue in driver core, drivers may not probe defer after having registered child devices to avoid triggering a probe deferral loop (see `fbc35b45f9` ("Add documentation on meaning of -EPROBE_DEFER")). This could potentially also trigger a bug in the DRM bridge implementation which does not expect bridges to go away even if device links may avoid triggering this (when enabled). Move registration of the DRM aux bridge to after looking up clocks and other resources. Note that PHY creation can in theory also trigger a probe deferral when a 'phy' supply is used. This does not seem to affect the QMP PHY driver but the PHY subsystem should be reworked to address this (i.e. by separating initialisation and registration of the PHY). Fixes: `35921910bb` ("phy: qcom: qmp-combo: switch to DRM_AUX_BRIDGE") Fixes: `1904c3f578` ("phy: qcom-qmp-combo: Introduce drm_bridge") Cc: stable@vger.kernel.org # 6.5 Cc: Bjorn Andersson <quic_bjorande@quicinc.com> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Reviewed-by: Bjorn Andersson <andersson@kernel.org> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Acked-by: Vinod Koul <vkoul@kernel.org> Acked-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20240217150228.5788-6-johan+linaro@kernel.org Signed-off-by: Vinod Koul <vkoul@kernel.org>	2024-03-06 20:37:37 +05:30
Keith Busch	7e80eb792b	nvme: clear caller pointer on identify failure The memory allocated for the identification is freed on failure. Set it to NULL so the caller doesn't have a pointer to that freed address. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>	2024-03-06 06:29:01 -08:00
Shin'ichiro Kawasaki	8d0d244739	nvme: host: fix double-free of struct nvme_id_ns in ns_update_nuse() When nvme_identify_ns() fails, it frees the pointer to the struct nvme_id_ns before it returns. However, ns_update_nuse() calls kfree() for the pointer even when nvme_identify_ns() fails. This results in KASAN double-free, which was observed with blktests nvme/045 with proposed patches [1] on the kernel v6.8-rc7. Fix the double-free by skipping kfree() when nvme_identify_ns() fails. Link: https://lore.kernel.org/linux-block/20240304161303.19681-1-dwagner@suse.de/ [1] Fixes: `a1a825ab6a` ("nvme: add csi, ms and nuse to sysfs") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org>	2024-03-06 06:02:15 -08:00
Frederic Weisbecker	8ca1836769	timer/migration: Fix quick check reporting late expiry When a CPU is the last active in the hierarchy and it tries to enter into idle, the quick check looking up the next event towards cpuidle heuristics may report a too late expiry, such as in the following scenario: [GRP1:0] migrator = NONE active = NONE nextevt = T0:0, T0:1 / \ [GRP0:0] [GRP0:1] migrator = NONE migrator = NONE active = NONE active = NONE nextevt = T0, T1 nextevt = T2 / \ / \ 0 1 2 3 idle idle idle idle 0) The whole system is idle, and CPU 0 was the last migrator. CPU 0 has a timer (T0), CPU 1 has a timer (T1) and CPU 2 has a timer (T2). The expire order is T0 < T1 < T2. [GRP1:0] migrator = GRP0:0 active = GRP0:0 nextevt = T0:0(i), T0:1 / \ [GRP0:0] [GRP0:1] migrator = CPU0 migrator = NONE active = CPU0 active = NONE nextevt = T0(i), T1 nextevt = T2 / \ / \ 0 1 2 3 active idle idle idle 1) CPU 0 becomes active. The (i) means a now ignored timer. [GRP1:0] migrator = GRP0:0 active = GRP0:0 nextevt = T0:1 / \ [GRP0:0] [GRP0:1] migrator = CPU0 migrator = NONE active = CPU0 active = NONE nextevt = T1 nextevt = T2 / \ / \ 0 1 2 3 active idle idle idle 2) CPU 0 handles remote. No timer actually expired but ignored timers have been cleaned out and their sibling's timers haven't been propagated. As a result the top level's next event is T2 and not T1. 3) CPU 0 tries to enter idle without any global timer enqueued and calls tmigr_quick_check(). The expiry of T2 is returned instead of the expiry of T1. When the quick check returns an expiry that is too late, the cpuidle governor may pick up a C-state that is too deep. This may be result into undesired CPU wake up latency if the next timer is actually close enough. Fix this with assuming that expiries aren't sorted top-down while performing the quick check. Pick up instead the earliest encountered one while walking up the hierarchy. `7ee9887703` ("timers: Implement the hierarchical pull model") Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240305002822.18130-1-frederic@kernel.org	2024-03-06 15:02:09 +01:00
Animesh Manna	984318aaf7	drm/i915/panelreplay: Move out psr_init_dpcd() from init_connector() Move psr_init_dpcd() from init-connector to connector-detect function. The dpcd probe for checking panel replay capability for external dp connector is causing delay during boot which can be optimized by moving dpcd probe to connector specific detect(). v1: Initial version. v2: Add details in commit description. [Jani] Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10284 Signed-off-by: Animesh Manna <animesh.manna@intel.com> Fixes: `cceeaa312d` ("drm/i915/panelreplay: Enable panel replay dpcd initialization for DP") Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240229043716.4065760-1-animesh.manna@intel.com (cherry picked from commit `1cca19bf29`) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2024-03-06 15:41:16 +02:00
Thomas Gleixner	f0551af021	x86/topology: Ignore non-present APIC IDs in a present package Borislav reported that one of his systems has a broken MADT table which advertises eight present APICs and 24 non-present APICs in the same package. The non-present ones are considered hot-pluggable by the topology evaluation code, which is obviously bogus as there is no way to hot-plug within the same package. As the topology evaluation code accounts for hot-pluggable CPUs in a package, the maximum number of cores per package is computed wrong, which in turn causes the uncore performance counter driver to access non-existing MSRs. It will probably confuse other entities which rely on the maximum number of cores and threads per package too. Cure this by ignoring hot-pluggable APIC IDs within a present package. In theory it would be reasonable to just do this unconditionally, but then there is this thing called reality^Wvirtualization which ruins everything. Virtualization is the only existing user of "physical" hotplug and the virtualization tools allow the above scenario. Whether that is actually in use or not is unknown. As it can be argued that the virtualization case is not affected by the issues which exposed the reported problem, allow the bogosity if the kernel determined that it is running in a VM for now. Fixes: `89b0f15f40` ("x86/cpu/topology: Get rid of cpuinfo::x86_max_cores") Reported-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/87a5nbvccx.ffs@tglx	2024-03-06 14:35:30 +01:00
Edmund Raile	575801663c	firewire: ohci: prevent leak of left-over IRQ on unbind Commit `5a95f1ded2` ("firewire: ohci: use devres for requested IRQ") also removed the call to free_irq() in pci_remove(), leading to a leftover irq of devm_request_irq() at pci_disable_msi() in pci_remove() when unbinding the driver from the device remove_proc_entry: removing non-empty directory 'irq/136', leaking at least 'firewire_ohci' Call Trace: ? remove_proc_entry+0x19c/0x1c0 ? __warn+0x81/0x130 ? remove_proc_entry+0x19c/0x1c0 ? report_bug+0x171/0x1a0 ? console_unlock+0x78/0x120 ? handle_bug+0x3c/0x80 ? exc_invalid_op+0x17/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? remove_proc_entry+0x19c/0x1c0 unregister_irq_proc+0xf4/0x120 free_desc+0x3d/0xe0 ? kfree+0x29f/0x2f0 irq_free_descs+0x47/0x70 msi_domain_free_locked.part.0+0x19d/0x1d0 msi_domain_free_irqs_all_locked+0x81/0xc0 pci_free_msi_irqs+0x12/0x40 pci_disable_msi+0x4c/0x60 pci_remove+0x9d/0xc0 [firewire_ohci 01b483699bebf9cb07a3d69df0aa2bee71db1b26] pci_device_remove+0x37/0xa0 device_release_driver_internal+0x19f/0x200 unbind_store+0xa1/0xb0 remove irq with devm_free_irq() before pci_disable_msi() also remove it in fail_msi: of pci_probe() as this would lead to an identical leak Cc: stable@vger.kernel.org Fixes: `5a95f1ded2` ("firewire: ohci: use devres for requested IRQ") Signed-off-by: Edmund Raile <edmund.raile@proton.me> Link: https://lore.kernel.org/r/20240229144723.13047-2-edmund.raile@proton.me Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>	2024-03-06 22:35:22 +09:00
Imre Deak	0848814aa2	drm/i915/dp: Fix connector DSC HW state readout The DSC HW state of DP connectors is read out during driver loading and system resume in intel_modeset_update_connector_atomic_state(). This function is called for all connectors though and so the state of DSI connectors will also get updated incorrectly, triggering a WARN there wrt. the DSC decompression AUX device. Fix the above by moving the DSC state readout to a new DP connector specific sync_state() hook. This is anyway the logical place to update the connector object's state vs. the connector's atomic state. Fixes: `b2608c6b32` ("drm/i915/dp_mst: Enable MST DSC decompression for all streams") Reported-and-tested-by: Drew Davenport <ddavenport@chromium.org> Closes: https://lore.kernel.org/all/Zb0q8IDVXS0HxJyj@chromium.org Reviewed-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240205132631.1588577-1-imre.deak@intel.com (cherry picked from commit `a62e145981`) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2024-03-06 15:34:03 +02:00
Janusz Krzysztofik	26d2b757ff	drm/i915/selftests: Fix dependency of some timeouts on HZ Third argument of i915_request_wait() accepts a timeout value in jiffies. Most users pass either a simple HZ based expression, or a result of msecs_to_jiffies(), or MAX_SCHEDULE_TIMEOUT, or a very small number not exceeding 4 if applicable as that value. However, there is one user -- intel_selftest_wait_for_rq() -- that passes a WAIT_FOR_RESET_TIME symbol, defined as a large constant value that most probably represents a desired timeout in ms. While that usage results in the intended value of timeout on usual x86_64 kernel configurations, it is not portable across different architectures and custom kernel configs. Rename the symbol to clearly indicate intended units and convert it to jiffies before use. Fixes: `3a4bfa091c` ("drm/i915/selftest: Fix workarounds selftest for GuC submission") Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Cc: Rahul Kumar Singh <rahul.kumar.singh@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240222113347.648945-2-janusz.krzysztofik@linux.intel.com (cherry picked from commit `6ee3f54b88`) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2024-03-06 15:33:57 +02:00
Edward Adam Davis	c055fc00c0	net/rds: fix WARNING in rds_conn_connect_if_down If connection isn't established yet, get_mr() will fail, trigger connection after get_mr(). Fixes: `584a8279a4` ("RDS: RDMA: return appropriate error on rdma map failures") Reported-and-tested-by: syzbot+d4faee732755bba9838e@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-03-06 11:58:42 +00:00
Xiubo Li	321e3c3de5	libceph: init the cursor when preparing sparse read in msgr2 The cursor is no longer initialized in the OSD client, causing the sparse read state machine to fall into an infinite loop. The cursor should be initialized in IN_S_PREPARE_SPARSE_DATA state. [ idryomov: use msg instead of con->in_msg, changelog ] Link: https://tracker.ceph.com/issues/64607 Fixes: `8e46a2d068` ("libceph: just wait for more data to be available on the socket") Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Tested-by: Luis Henriques <lhenriques@suse.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2024-03-06 12:43:01 +01:00
David S. Miller	f287d6aafd	Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2024-03-05 (idpf, ice, i40e, igc, e1000e) This series contains updates to idpf, ice, i40e, igc and e1000e drivers. Emil disables local BH on NAPI schedule for proper handling of softirqs on idpf. Jake stops reporting of virtchannel RSS option which in unsupported on ice. Rand Deeb adds null check to prevent possible null pointer dereference on ice. Michal Schmidt moves DPLL mutex initialization to resolve uninitialized mutex usage for ice. Jesse fixes incorrect variable usage for calculating Tx stats on ice. Ivan Vecera corrects logic for firmware equals check on i40e. Florian Kauer prevents memory corruption for XDP_REDIRECT on igc. Sasha reverts an incorrect use of FIELD_GET which caused a regression for Wake on LAN on e1000e. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2024-03-06 10:28:02 +00:00
Linus Torvalds	a50026bdb8	iov_iter: get rid of 'copy_mc' flag This flag is only set by one single user: the magical core dumping code that looks up user pages one by one, and then writes them out using their kernel addresses (by using a BVEC_ITER). That actually ends up being a huge problem, because while we do use copy_mc_to_kernel() for this case and it is able to handle the possible machine checks involved, nothing else is really ready to handle the failures caused by the machine check. In particular, as reported by Tong Tiangen, we don't actually support fault_in_iov_iter_readable() on a machine check area. As a result, the usual logic for writing things to a file under a filesystem lock, which involves doing a copy with page faults disabled and then if that fails trying to fault pages in without holding the locks with fault_in_iov_iter_readable() does not work at all. We could decide to always just make the MC copy "succeed" (and filling the destination with zeroes), and that would then create a core dump file that just ignores any machine checks. But honestly, this single special case has been problematic before, and means that all the normal iov_iter code ends up slightly more complex and slower. See for example commit `c9eec08bac` ("iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()") where David Howells re-organized the code just to avoid having to check the 'copy_mc' flags inside the inner iov_iter loops. So considering that we have exactly one user, and that one user is a non-critical special case that doesn't actually ever trigger in real life (Tong found this with manual error injection), the sane solution is to just decide that the onus on handling the machine check lines on that user instead. Ergo, do the copy_mc_to_kernel() in the core dump logic itself, copying the user data to a stable kernel page before writing it out. Fixes: `f1982740f5` ("iov_iter: Convert iterate*() to inline funcs") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> Link: https://lore.kernel.org/r/20240305133336.3804360-1-tongtiangen@huawei.com Link: https://lore.kernel.org/all/4e80924d-9c85-f13a-722a-6a5d2b1c225a@huawei.com/ Tested-by: David Howells <dhowells@redhat.com> Reviewed-by: David Howells <dhowells@redhat.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reported-by: Tong Tiangen <tongtiangen@huawei.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-03-06 10:52:12 +01:00
Steffen Klassert	2ce0eae694	Merge branch 'Improve packet offload for dual stack' Mike Yu says: ==================== In the XFRM stack, whether a packet is forwarded to the IPv4 or IPv6 stack depends on the family field of the matched SA. This does not completely work for IPsec packet offload in some scenario, for example, sending an IPv6 packet that will be encrypted and encapsulated as an IPv4 packet in HW. Here are the patches to make IPsec packet offload work on the mentioned scenario. ==================== Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2024-03-06 10:33:24 +01:00
Dan Carpenter	bd17b7c34f	RAS/AMD/FMPM: Fix off by one when unwinding on error Decrement the index variable i before the first iteration when freeing the remaining elements on error. Depending on where this fails it could free something from one element beyond the end of the fru_records[] array. [ bp: Massage commit message. ] Fixes: `6f15e617cc` ("RAS: Introduce a FRU memory poison manager") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/6fdec71a-846b-4cd0-af69-e5f6cd12f4f6@moroto.mountain	2024-03-06 10:22:19 +01:00
Arnd Bergmann	1c7cfb6158	RISC-V firmware drivers for v6.9 A single minor fix for an oversized allocation due to sizeof() misuse by yours truly that came in since I sent my last fixes PR. Signed-off-by: Conor Dooley <conor.dooley@microchip.com> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRh246EGq/8RLhDjO14tDGHoIJi0gUCZed5RAAKCRB4tDGHoIJi 0qRIAQCeMKuoQNOHXQtC3u5FKGcpsZPctEnbhg2pWK6ZOwdqjwEAsEJLqApFw4cG oB9cR3O3mtN49aOBM3w700Hafykv9Ak= =oU9U -----END PGP SIGNATURE----- gpgsig -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmXoEEAACgkQYKtH/8kJ Uic1PRAAr/Fc1qOB6gp05NPqXDuHvFGU/qYCMINss8n33mXuyfG3HnR8YBoybEnH mm0cpVn1VqsJbixiM1d07/luRfBZXiMCQw55UNwbU7PVyBHc9at9QnruYX4uFRsZ V6AgTkni1QDDMYANFgvs9dJv+r4Ul1gusqSj/z71an/A6d2fmsVXlFkH7ZXsGhGR PJR5NDLjuqJIR2kGKC23QvrQWY+Kuf90+6FExVOc/3ZnUS5Eg0QrbBf1vd8JYz+o wNR/IeHO9anvoe1tSNJDpHDdCKl+88SiJGjUcMI/H4ncdkGslBuErzBB2cwRHWWC JHD4ZbAB5qFNHaZqUTAuHkAyMdZEPxibDdtW084thD8X7vFJe27JuRXfMJTHWRKt jemhC9mcZ/lEM1neShSZe7jdQLzzjEEdNSDe4dcMihstGu3yHDXuhzzf6AxXh5bU GvihHMQ4ENnNSUgSDC6FcW+HOaakBL728JE6ySAIP/7mSICF9MVasyT+NO0DLTEC 67rmGoealy2c//dmtcslYDQzeFbaNtL4FpLVkc1uFotQ5FFHsXzbwkETAZOWZwbL 0nT5OnpdixwB2389RbAwXVrpstrgjYxRwKehNGa8LgIxKcsYVtROLRiwTs474g6a PsYTW7GfCJg+SsT1A4cZ7sV8lP3cn5WwewZ+0lRjSTmdWgIQ294= =OVem -----END PGP SIGNATURE----- Merge tag 'riscv-firmware-for-v6.9' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux into arm/fixes RISC-V firmware drivers for v6.9 A single minor fix for an oversized allocation due to sizeof() misuse by yours truly that came in since I sent my last fixes PR. Signed-off-by: Conor Dooley <conor.dooley@microchip.com> * tag 'riscv-firmware-for-v6.9' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux: firmware: microchip: Fix over-requested allocation size Link: https://lore.kernel.org/r/20240305-vicinity-dumpling-8943ef26f004@spud Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2024-03-06 07:42:08 +01:00
Arnd Bergmann	415ba4ed59	A few more Qualcomm Arm64 DeviceTree fixes for v6.8 This reduces the link speed of the PCIe bus with WiFi-card connected on the Lenovo ThinkPad X13s and the Qualcomm Compute Reference Device, avoid link errors and initialization issues reported by users. It also reverts the enablement of MPM on MSM8996, which is reported to prevent boards on this platform from booting for some users. -----BEGIN PGP SIGNATURE----- iQJJBAABCAAzFiEEBd4DzF816k8JZtUlCx85Pw2ZrcUFAmXn3tYVHGFuZGVyc3Nv bkBrZXJuZWwub3JnAAoJEAsfOT8Nma3FoccP/3W+e1BQWPq6ux2c2yDScu7eH2SV K/Rdrnszt9gbEZgFeXJIo9ZWfq0sOnuRcAgk9WJ8nmwTeV7zaIQRvIdmfJYexRDr Vq/fDzthl4KgX6vTi2AAcAEjFGACEvFWrwXvllCOnStiPrtvg97JY7dFuCLu2RGQ LhhqJYk6pkDlg/m2O2LKJTjVzLYaUaYYqmuMvWN6jfYZ0mYIHAIKfVcbRKXoutqE TTKCRAS8LGBfQ7JIvDHS6Pmedh4fchGDsoOA1GxmbESBjabhKCliSWv/p0K4Yyi6 4fHU9XEhpyjp2jqqKP/pQIjxQwk8PyUCLACOMH8ODJkkcIxarAFdOrf6kGEFA7nh J1yGTOZhZfM1Icsv/7VryRFjGykkCX+peFrIdl51zN9Hxb+2zMO7X/TZjjesZD5f 0olU0V/7K12wMpYkL9oXzqNff3nBe7vJDIqkZIT8ZA9gCe0ya9oCrG0JwyxgRxka G8NZVW9QJpAMEDKiNgvM+GqYYFblxBM5w4kKr8traIBMZM7HlJyEoOvsC91M9YNv EVepO7UEVD3VFbB3HeD6Qdu6ndChviv5Mj2wwqKAEvUH2w+ZknoC5UTNDvv0jxyU RmYRVyQ9+Ng9wd8lWLRV81QH3L7rqTty7Gj+Xpp9jBLYXR6Qg3cO9hs3CRdoChUy oRtwJCSctL0T7tLT =RFVJ -----END PGP SIGNATURE----- gpgsig -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmXoDAEACgkQYKtH/8kJ Uifk3BAAxfXhYwDb2fI7qQuP10WbDSkTFs4Y7l8xsn1K9gZYMESf2G2IXzBbo/42 xx00+U4KWlWxx9sFY5vEHRXuaPDdNAWEsW6ncUVnchGv1ilQd7Rho0O2DgHZtGY+ XrAAoC8ZbXI2CJYVUVPT303Tu8H48IMtmsWEZckgGfVdvak2o8nKB45QZsXRsamv NsOVx8NEmFEeR7+By81c/SHBXpILvXqj7Nrw029/81o6NWSJowz8DCSMzXOX0byF DusPKGoDlOA0C3QC5NcRLUK6XVlrCvxelUCEH5IWlKAkQkaDDAHQYxTE4EIIvK8a KGqHyC/+5BjzT3PuuDCH15DJvg/scHQvIt/xchS4BbjheW2c7opMtZhWzSTN6INZ /TQMwyKfUdbdGDxGIt7GPsLg12PmeDgmIlNYiL66u2JVQhaXIXa97l1ol9lKs1gw B1VN/HgdkGP9XVev5xr4K1VyKIRJYV0+MEhnUpNRI591tPREy9mSJZXigncIou9e 8qvKKdMgdp8h5ErHehQyvu2yGxVp/v8ARMD5a4TRVNAx+97zftrxDEISt7EYgEDg uZrBsbw0TOXsNJS5dxrGR7PvgYp9aEdBXxQReQSrHRd76nT1/kCgB28hShHz0VoX u6EcZWHhNuHB6HS88nK4W2mEDuGWZCFXOb2yTlTM7pMlsNXcKkQ= =OVcf -----END PGP SIGNATURE----- Merge tag 'qcom-arm64-fixes-for-6.8-2' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/fixes A few more Qualcomm Arm64 DeviceTree fixes for v6.8 This reduces the link speed of the PCIe bus with WiFi-card connected on the Lenovo ThinkPad X13s and the Qualcomm Compute Reference Device, avoid link errors and initialization issues reported by users. It also reverts the enablement of MPM on MSM8996, which is reported to prevent boards on this platform from booting for some users. * tag 'qcom-arm64-fixes-for-6.8-2' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: Revert "arm64: dts: qcom: msm8996: Hook up MPM" arm64: dts: qcom: sc8280xp-x13s: limit pcie4 link speed arm64: dts: qcom: sc8280xp-crd: limit pcie4 link speed Link: https://lore.kernel.org/r/20240306031208.4218-1-andersson@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2024-03-06 07:24:01 +01:00
Tobias Jakobi (Compleo)	b7fb7729c9	net: dsa: microchip: fix register write order in ksz8_ind_write8() This bug was noticed while re-implementing parts of the kernel driver in userspace using spidev. The goal was to enable some of the errata workarounds that Microchip describes in their errata sheet [1]. Both the errata sheet and the regular datasheet of e.g. the KSZ8795 imply that you need to do this for indirect register accesses: - write a 16-bit value to a control register pair (this value consists of the indirect register table, and the offset inside the table) - either read or write an 8-bit value from the data storage register (indicated by REG_IND_BYTE in the kernel) The current implementation has the order swapped. It can be proven, by reading back some indirect register with known content (the EEE register modified in ksz8_handle_global_errata() is one of these), that this implementation does not work. Private discussion with Oleksij Rempel of Pengutronix has revealed that the workaround was apparantly never tested on actual hardware. [1] https://ww1.microchip.com/downloads/aemDocuments/documents/OTH/ProductDocuments/Errata/KSZ87xx-Errata-DS80000687C.pdf Signed-off-by: Tobias Jakobi (Compleo) <tobias.jakobi.compleo@gmail.com> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Fixes: `7b6e6235b6` ("net: dsa: microchip: ksz8795: handle eee specif erratum") Link: https://lore.kernel.org/r/20240304154135.161332-1-tobias.jakobi.compleo@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-05 19:10:49 -08:00
Jakub Kicinski	289e922582	dpll: move all dpll<>netdev helpers to dpll code Older versions of GCC really want to know the full definition of the type involved in rcu_assign_pointer(). struct dpll_pin is defined in a local header, net/core can't reach it. Move all the netdev <> dpll code into dpll, where the type is known. Otherwise we'd need multiple function calls to jump between the compilation units. This is the same problem the commit under fixes was trying to address, but with rcu_assign_pointer() not rcu_dereference(). Some of the exports are not needed, networking core can't be a module, we only need exports for the helpers used by drivers. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/all/35a869c8-52e8-177-1d4d-e57578b99b6@linux-m68k.org/ Fixes: `640f41ed33` ("dpll: fix build failure due to rcu_dereference_check() on unknown type") Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240305013532.694866-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-05 18:36:42 -08:00
Toke Høiland-Jørgensen	2487007aa3	cpumap: Zero-initialise xdp_rxq_info struct before running XDP program When running an XDP program that is attached to a cpumap entry, we don't initialise the xdp_rxq_info data structure being used in the xdp_buff that backs the XDP program invocation. Tobias noticed that this leads to random values being returned as the xdp_md->rx_queue_index value for XDP programs running in a cpumap. This means we're basically returning the contents of the uninitialised memory, which is bad. Fix this by zero-initialising the rxq data structure before running the XDP program. Fixes: `9216477449` ("bpf: cpumap: Add the possibility to attach an eBPF program to cpumap") Reported-by: Tobias Böhm <tobias@aibor.de> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20240305213132.11955-1-toke@redhat.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2024-03-05 16:48:53 -08:00
Daniel Borkmann	0bfc0336e1	selftests/bpf: Fix up xdp bonding test wrt feature flags Adjust the XDP feature flags for the bond device when no bond slave devices are attached. After `9b0ed890ac` ("bonding: do not report NETDEV_XDP_ACT_XSK_ZEROCOPY"), the empty bond device must report 0 as flags instead of NETDEV_XDP_ACT_MASK. # ./vmtest.sh -- ./test_progs -t xdp_bond [...] [ 3.983311] bond1 (unregistering): (slave veth1_1): Releasing backup interface [ 3.995434] bond1 (unregistering): Released all slaves [ 4.022311] bond2: (slave veth2_1): Releasing backup interface #507/1 xdp_bonding/xdp_bonding_attach:OK #507/2 xdp_bonding/xdp_bonding_nested:OK #507/3 xdp_bonding/xdp_bonding_features:OK #507/4 xdp_bonding/xdp_bonding_roundrobin:OK #507/5 xdp_bonding/xdp_bonding_activebackup:OK #507/6 xdp_bonding/xdp_bonding_xor_layer2:OK #507/7 xdp_bonding/xdp_bonding_xor_layer23:OK #507/8 xdp_bonding/xdp_bonding_xor_layer34:OK #507/9 xdp_bonding/xdp_bonding_redirect_multi:OK #507 xdp_bonding:OK Summary: 1/9 PASSED, 0 SKIPPED, 0 FAILED [ 4.185255] bond2 (unregistering): Released all slaves [...] Fixes: `9b0ed890ac` ("bonding: do not report NETDEV_XDP_ACT_XSK_ZEROCOPY") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Message-ID: <20240305090829.17131-2-daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2024-03-05 16:19:42 -08:00
Daniel Borkmann	f267f26281	xdp, bonding: Fix feature flags when there are no slave devs anymore Commit `9b0ed890ac` ("bonding: do not report NETDEV_XDP_ACT_XSK_ZEROCOPY") changed the driver from reporting everything as supported before a device was bonded into having the driver report that no XDP feature is supported until a real device is bonded as it seems to be more truthful given eventually real underlying devices decide what XDP features are supported. The change however did not take into account when all slave devices get removed from the bond device. In this case after `9b0ed890ac`, the driver keeps reporting a feature mask of 0x77, that is, NETDEV_XDP_ACT_MASK & ~NETDEV_XDP_ACT_XSK_ZEROCOPY whereas it should have reported a feature mask of 0. Fix it by resetting XDP feature flags in the same way as if no XDP program is attached to the bond device. This was uncovered by the XDP bond selftest which let BPF CI fail. After adjusting the starting masks on the latter to 0 instead of NETDEV_XDP_ACT_MASK the test passes again together with this fix. Fixes: `9b0ed890ac` ("bonding: do not report NETDEV_XDP_ACT_XSK_ZEROCOPY") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Magnus Karlsson <magnus.karlsson@intel.com> Cc: Prashant Batra <prbatra.mail@gmail.com> Cc: Toke Høiland-Jørgensen <toke@redhat.com> Cc: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Message-ID: <20240305090829.17131-1-daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2024-03-05 16:19:42 -08:00
Alexei Starovoitov	399eca1bd4	Merge branch 'check-bpf_func_state-callback_depth-when-pruning-states' Eduard Zingerman says: ==================== check bpf_func_state->callback_depth when pruning states This patch-set fixes bug in states pruning logic hit in mailing list discussion [0]. The details of the fix are in patch #1. The main idea for the fix belongs to Yonghong Song, mine contribution is merely in review and test cases. There are some changes in verification performance: File Program Insns (DIFF) States (DIFF) ------------------------- ------------- --------------- -------------- pyperf600_bpf_loop.bpf.o on_event +15 (+0.42%) +0 (+0.00%) strobemeta_bpf_loop.bpf.o on_event +857 (+37.95%) +60 (+38.96%) xdp_synproxy_kern.bpf.o syncookie_tc +2892 (+30.39%) +109 (+36.33%) xdp_synproxy_kern.bpf.o syncookie_xdp +2892 (+30.01%) +109 (+36.09%) (when tested on a subset of selftests identified by selftests/bpf/veristat.cfg and Cilium bpf object files from [4]) Changelog: v2 [2] -> v3: - fixes for verifier.c commit message as suggested by Yonghong; - patch-set re-rerouted to 'bpf' tree as suggested in [2]; - patch for test_tcp_custom_syncookie is sent separately to 'bpf-next' [3]. - veristat results updated using 'bpf' tree as baseline and clang 16. v1 [1] -> v2: - patch #2 commit message updated to better reflect verifier behavior with regards to checkpoints tree (suggested by Yonghong); - veristat results added (suggested by Andrii). [0] https://lore.kernel.org/bpf/9b251840-7cb8-4d17-bd23-1fc8071d8eef@linux.dev/ [1] https://lore.kernel.org/bpf/20240212143832.28838-1-eddyz87@gmail.com/ [2] https://lore.kernel.org/bpf/20240216150334.31937-1-eddyz87@gmail.com/ [3] https://lore.kernel.org/bpf/20240222150300.14909-1-eddyz87@gmail.com/ [4] https://github.com/anakryiko/cilium ==================== Link: https://lore.kernel.org/r/20240222154121.6991-1-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2024-03-05 16:15:57 -08:00
Eduard Zingerman	5c2bc5e2f8	selftests/bpf: test case for callback_depth states pruning logic The test case was minimized from mailing list discussion [0]. It is equivalent to the following C program: struct iter_limit_bug_ctx { __u64 a; __u64 b; __u64 c; }; static __naked void iter_limit_bug_cb(void) { switch (bpf_get_prandom_u32()) { case 1: ctx->a = 42; break; case 2: ctx->b = 42; break; default: ctx->c = 42; break; } } int iter_limit_bug(struct __sk_buff *skb) { struct iter_limit_bug_ctx ctx = { 7, 7, 7 }; bpf_loop(2, iter_limit_bug_cb, &ctx, 0); if (ctx.a == 42 && ctx.b == 42 && ctx.c == 7) asm volatile("r1 /= 0;":::"r1"); return 0; } The main idea is that each loop iteration changes one of the state variables in a non-deterministic manner. Hence it is premature to prune the states that have two iterations left comparing them to states with one iteration left. E.g. {{7,7,7}, callback_depth=0} can reach state {42,42,7}, while {{7,7,7}, callback_depth=1} can't. [0] https://lore.kernel.org/bpf/9b251840-7cb8-4d17-bd23-1fc8071d8eef@linux.dev/ Acked-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20240222154121.6991-3-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2024-03-05 16:15:56 -08:00
Eduard Zingerman	e9a8e5a587	bpf: check bpf_func_state->callback_depth when pruning states When comparing current and cached states verifier should consider bpf_func_state->callback_depth. Current state cannot be pruned against cached state, when current states has more iterations left compared to cached state. Current state has more iterations left when it's callback_depth is smaller. Below is an example illustrating this bug, minimized from mailing list discussion [0] (assume that BPF_F_TEST_STATE_FREQ is set). The example is not a safe program: if loop_cb point (1) is followed by loop_cb point (2), then division by zero is possible at point (4). struct ctx { __u64 a; __u64 b; __u64 c; }; static void loop_cb(int i, struct ctx ctx) { / assume that generated code is "fallthrough-first": * if ... == 1 goto * if ... == 2 goto * <default> / switch (bpf_get_prandom_u32()) { case 1: / 1 / ctx->a = 42; return 0; break; case 2: / 2 / ctx->b = 42; return 0; break; default: / 3 / ctx->c = 42; return 0; break; } } SEC("tc") __failure __flag(BPF_F_TEST_STATE_FREQ) int test(struct __sk_buff skb) { struct ctx ctx = { 7, 7, 7 }; bpf_loop(2, loop_cb, &ctx, 0); /* 0 / / assume generated checks are in-order: .a first / if (ctx.a == 42 && ctx.b == 42 && ctx.c == 7) asm volatile("r0 /= 0;":::"r0"); / 4 */ return 0; } Prior to this commit verifier built the following checkpoint tree for this example: .------------------------------------- Checkpoint / State name \| .-------------------------------- Code point number \| \| .---------------------------- Stack state {ctx.a,ctx.b,ctx.c} \| \| \| .------------------- Callback depth in frame #0 v v v v - (0) {7P,7P,7},depth=0 - (3) {7P,7P,7},depth=1 - (0) {7P,7P,42},depth=1 - (3) {7P,7,42},depth=2 - (0) {7P,7,42},depth=2 loop terminates because of depth limit - (4) {7P,7,42},depth=0 predicted false, ctx.a marked precise - (6) exit (a) - (2) {7P,7,42},depth=2 - (0) {7P,42,42},depth=2 loop terminates because of depth limit - (4) {7P,42,42},depth=0 predicted false, ctx.a marked precise - (6) exit (b) - (1) {7P,7P,42},depth=2 - (0) {42P,7P,42},depth=2 loop terminates because of depth limit - (4) {42P,7P,42},depth=0 predicted false, ctx.{a,b} marked precise - (6) exit - (2) {7P,7,7},depth=1 considered safe, pruned using checkpoint (a) (c) - (1) {7P,7P,7},depth=1 considered safe, pruned using checkpoint (b) Here checkpoint (b) has callback_depth of 2, meaning that it would never reach state {42,42,7}. While checkpoint (c) has callback_depth of 1, and thus could yet explore the state {42,42,7} if not pruned prematurely. This commit makes forbids such premature pruning, allowing verifier to explore states sub-tree starting at (c): (c) - (1) {7,7,7P},depth=1 - (0) {42P,7,7P},depth=1 ... - (2) {42,7,7},depth=2 - (0) {42,42,7},depth=2 loop terminates because of depth limit - (4) {42,42,7},depth=0 predicted true, ctx.{a,b,c} marked precise - (5) division by zero [0] https://lore.kernel.org/bpf/9b251840-7cb8-4d17-bd23-1fc8071d8eef@linux.dev/ Fixes: `bb124da69c` ("bpf: keep track of max number of bpf_loop callback iterations") Suggested-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240222154121.6991-2-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2024-03-05 16:15:56 -08:00
Linus Torvalds	5847c9777c	cgroup: Fixes for v6.8-rc7 Two cpuset fixes. Both are for bugs in error handling paths and low risk. -----BEGIN PGP SIGNATURE----- iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZeeUSA4cdGpAa2VybmVs Lm9yZwAKCRCxYfJx3gVYGaTjAQC+6pLyiH2j2XpncJ2BFID+LA5ljExmJpcRv/yb YAerogEA+QmOz6poIo+VO+qy+uoFxklarGY8fj1wFKXYeNsuJgw= =/5Ll -----END PGP SIGNATURE----- Merge tag 'cgroup-for-6.8-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "Two cpuset fixes. Both are for bugs in error handling paths and low risk" * tag 'cgroup-for-6.8-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup/cpuset: Fix retval in update_cpumask() cgroup/cpuset: Fix a memory leak in update_exclusive_cpumask()	2024-03-05 14:00:22 -08:00
Linus Torvalds	29cd507cbe	integrity-v6.8-fix -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQQdXVVFGN5XqKr1Hj7LwZzRsCrn5QUCZecAcxQcem9oYXJAbGlu dXguaWJtLmNvbQAKCRDLwZzRsCrn5cmbAQCNbLlpCsunBCrU44kJ3BPt3xzSj4lQ LWsWgeKTspNz+AEA1uz/kqOxIlWehtv9urhF3NKWFva2E8Gr8zfxkHpTAgo= =uA3k -----END PGP SIGNATURE----- Merge tag 'integrity-v6.8-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity Pull integrity fix from Mimi Zohar: "A single fix to eliminate an unnecessary message" * tag 'integrity-v6.8-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: integrity: eliminate unnecessary "Problem loading X.509 certificate" msg	2024-03-05 13:21:30 -08:00
Song Liu	3a889fdce7	Merge branch 'dmraid-fix-6.9' into md-6.9 This is the second half of fixes for dmraid. The first half is available at [1]. This set contains fixes: - reshape can start unexpected, cause data corruption, patch 1,5,6; - deadlocks that reshape concurrent with IO, patch 8; - a lockdep warning, patch 9; For all the dmraid related tests in lvm2 suite, there is no new regressions compared against 6.6 kernels (which is good baseline before recent regressions). [1] https://lore.kernel.org/all/CAPhsuW7u1UKHCDOBDhD7DzOVtkGemDz_QnJ4DUq_kSN-Q3G66Q@mail.gmail.com/ * dmraid-fix-6.9: dm-raid: fix lockdep waring in "pers->hot_add_disk" dm-raid456, md/raid456: fix a deadlock for dm-raid456 while io concurrent with reshape dm-raid: add a new helper prepare_suspend() in md_personality md/dm-raid: don't call md_reap_sync_thread() directly dm-raid: really frozen sync_thread during suspend md: add a new helper reshape_interrupted() md: export helper md_is_rdwr() md: export helpers to stop sync_thread md: don't clear MD_RECOVERY_FROZEN for new dm-raid until resume	2024-03-05 12:53:55 -08:00
Yu Kuai	95009ae904	dm-raid: fix lockdep waring in "pers->hot_add_disk" The lockdep assert is added by commit `a448af25be` ("md/raid10: remove rcu protection to access rdev from conf") in print_conf(). And I didn't notice that dm-raid is calling "pers->hot_add_disk" without holding 'reconfig_mutex'. "pers->hot_add_disk" read and write many fields that is protected by 'reconfig_mutex', and raid_resume() already grab the lock in other contex. Hence fix this problem by protecting "pers->host_add_disk" with the lock. Fixes: `9092c02d94` ("DM RAID: Add ability to restore transiently failed devices on resume") Fixes: `a448af25be` ("md/raid10: remove rcu protection to access rdev from conf") Cc: stable@vger.kernel.org # v6.7+ Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Xiao Ni <xni@redhat.com> Acked-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240305072306.2562024-10-yukuai1@huaweicloud.com	2024-03-05 12:53:33 -08:00
Yu Kuai	41425f96d7	dm-raid456, md/raid456: fix a deadlock for dm-raid456 while io concurrent with reshape For raid456, if reshape is still in progress, then IO across reshape position will wait for reshape to make progress. However, for dm-raid, in following cases reshape will never make progress hence IO will hang: 1) the array is read-only; 2) MD_RECOVERY_WAIT is set; 3) MD_RECOVERY_FROZEN is set; After commit `c467e97f07` ("md/raid6: use valid sector values to determine if an I/O should wait on the reshape") fix the problem that IO across reshape position doesn't wait for reshape, the dm-raid test shell/lvconvert-raid-reshape.sh start to hang: [root@fedora ~]# cat /proc/979/stack [<0>] wait_woken+0x7d/0x90 [<0>] raid5_make_request+0x929/0x1d70 [raid456] [<0>] md_handle_request+0xc2/0x3b0 [md_mod] [<0>] raid_map+0x2c/0x50 [dm_raid] [<0>] __map_bio+0x251/0x380 [dm_mod] [<0>] dm_submit_bio+0x1f0/0x760 [dm_mod] [<0>] __submit_bio+0xc2/0x1c0 [<0>] submit_bio_noacct_nocheck+0x17f/0x450 [<0>] submit_bio_noacct+0x2bc/0x780 [<0>] submit_bio+0x70/0xc0 [<0>] mpage_readahead+0x169/0x1f0 [<0>] blkdev_readahead+0x18/0x30 [<0>] read_pages+0x7c/0x3b0 [<0>] page_cache_ra_unbounded+0x1ab/0x280 [<0>] force_page_cache_ra+0x9e/0x130 [<0>] page_cache_sync_ra+0x3b/0x110 [<0>] filemap_get_pages+0x143/0xa30 [<0>] filemap_read+0xdc/0x4b0 [<0>] blkdev_read_iter+0x75/0x200 [<0>] vfs_read+0x272/0x460 [<0>] ksys_read+0x7a/0x170 [<0>] __x64_sys_read+0x1c/0x30 [<0>] do_syscall_64+0xc6/0x230 [<0>] entry_SYSCALL_64_after_hwframe+0x6c/0x74 This is because reshape can't make progress. For md/raid, the problem doesn't exist because register new sync_thread doesn't rely on the IO to be done any more: 1) If array is read-only, it can switch to read-write by ioctl/sysfs; 2) md/raid never set MD_RECOVERY_WAIT; 3) If MD_RECOVERY_FROZEN is set, mddev_suspend() doesn't hold 'reconfig_mutex', hence it can be cleared and reshape can continue by sysfs api 'sync_action'. However, I'm not sure yet how to avoid the problem in dm-raid yet. This patch on the one hand make sure raid_message() can't change sync_thread() through raid_message() after presuspend(), on the other hand detect the above 3 cases before wait for IO do be done in dm_suspend(), and let dm-raid requeue those IO. Cc: stable@vger.kernel.org # v6.7+ Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Xiao Ni <xni@redhat.com> Acked-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240305072306.2562024-9-yukuai1@huaweicloud.com	2024-03-05 12:53:33 -08:00
Yu Kuai	5625ff8b72	dm-raid: add a new helper prepare_suspend() in md_personality There are no functional changes for now, prepare to fix a deadlock for dm-raid456. Cc: stable@vger.kernel.org # v6.7+ Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Xiao Ni <xni@redhat.com> Acked-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240305072306.2562024-8-yukuai1@huaweicloud.com	2024-03-05 12:53:33 -08:00
Yu Kuai	cd32b27a66	md/dm-raid: don't call md_reap_sync_thread() directly Currently md_reap_sync_thread() is called from raid_message() directly without holding 'reconfig_mutex', this is definitely unsafe because md_reap_sync_thread() can change many fields that is protected by 'reconfig_mutex'. However, hold 'reconfig_mutex' here is still problematic because this will cause deadlock, for example, commit `130443d60b` ("md: refactor idle/frozen_sync_thread() to fix deadlock"). Fix this problem by using stop_sync_thread() to unregister sync_thread, like md/raid did. Fixes: `be83651f00` ("DM RAID: Add message/status support for changing sync action") Cc: stable@vger.kernel.org # v6.7+ Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Xiao Ni <xni@redhat.com> Acked-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240305072306.2562024-7-yukuai1@huaweicloud.com	2024-03-05 12:53:32 -08:00

... 2 3 4 5 6 ...

1252364 Commits