linux

Author	SHA1	Message	Date
Kajetan Puchalski	6ab4b19900	cpuidle: Add cpu_idle_miss trace event Add a trace event for cpuidle to track missed (too deep or too shallow) wakeups. After each wakeup, CPUIdle already computes whether the entered state was optimal, above or below the desired one and updates the relevant counters. This patch makes it possible to trace those events in addition to just reading the counters. The patterns of types and percentages of misses across different workloads appear to be very consistent. This makes the trace event very useful for comparing the relative correctness of different CPUIdle governors for different types of workloads, or for finding the optimal governor for a given device. Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2022-08-03 17:50:58 +02:00
Rafael J. Wysocki	f6e0b468da	OPP updates for 5.20-rc1 - Make dev_pm_opp_set_regulators() accept NULL terminated list (Viresh Kumar). - Add dev_pm_opp_set_config() and friends and migrate other users/helpers to using them (Viresh Kumar). - Add support for multiple clocks for a device (Viresh Kumar and Krzysztof Kozlowski). - Configure resources before adding OPP table for Venus (Stanimir Varbanov). - Keep reference count up for opp->np and opp_table->np while they are still in use (Liang He). - Minor cleanups (Viresh Kumar and Yang Li). -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEx73Crsp7f6M6scA70rkcPK6BEhwFAmLoprIACgkQ0rkcPK6B Ehz2nBAApgYUDGkEjWcJufIxW1mH77uonzmWUV2jQBEcCvYnjdwhJ0RpQUT75Xnk hTYJ5v9UKwOVl+puPguUe7UzSmWcsI9AzJCj0Vr/LBiln+sawoI51lqOaNjCJkmZ VytQJB23DNsYJAG/0xM42+syu+IONJ4vCP/9m35sWlevfFihbfQsEK+iEKsseVgd sEwPvHyixLWyeaoAf+6apOBP2Lf+/3R8h6Iv0U8n8jOzUpQQ5r/RSDyZeARP7gze 64aXvsvr7D0Mc9GpevDJKGtPFbRNfq5I4Lg5MOZ8NQVjXOqlWJil3oYEnKQxIH0Y EEzcrSuWi3SEeHrQfj+GFs/D7z2ZHqmbg7yb4P7zSeqLvG+7Ey9aYOXOg5LykrYk 1rZQzenLMF91RnhdRLI22SJngokOYZjWBFp62mPqmEYtx2VsYQlxqGtJoCHYDRx3 QRp0ZYJBnHQMt7saiIRFdAAIz7/G5lkiUplVzqAWe7AEpUG3Y7kvIqfwi69s3I5S ERSf3qqx3dUGFXYoxwglEwaf8ZvKQnPOzOLmbyc9Hrj2MclfKf9vW+0/4J6iiDlu ITpsqEWUhtEjwCt3lbM6PWNRrCJHi6YkKw0sORxEWR639cqckmk6ZAuaRPeOob6a nZ/UvwU2LMRG1zZyrsB3bbUkijJ019RPySmjCXApLsoNT1qpUr0= =NHBQ -----END PGP SIGNATURE----- Merge tag 'opp-updates-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm Pull operating performance points (OPP) updates for 5.20-rc1 from Viresh Kumar: "- Make dev_pm_opp_set_regulators() accept NULL terminated list (Viresh Kumar). - Add dev_pm_opp_set_config() and friends and migrate other users/helpers to using them (Viresh Kumar). - Add support for multiple clocks for a device (Viresh Kumar and Krzysztof Kozlowski). - Configure resources before adding OPP table for Venus (Stanimir Varbanov). - Keep reference count up for opp->np and opp_table->np while they are still in use (Liang He). - Minor cleanups (Viresh Kumar and Yang Li)." * tag 'opp-updates-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: (43 commits) venus: pm_helpers: Fix warning in OPP during probe OPP: Don't drop opp->np reference while it is still in use OPP: Don't drop opp_table->np reference while it is still in use OPP: Remove dev{m}_pm_opp_of_add_table_noclk() PM / devfreq: tegra30: Register config_clks helper OPP: Allow config_clks helper for single clk case OPP: Provide a simple implementation to configure multiple clocks OPP: Assert clk_count == 1 for single clk helpers OPP: Add key specific assert() method to key finding helpers OPP: Compare bandwidths for all paths in _opp_compare_key() OPP: Allow multiple clocks for a device dt-bindings: opp: accept array of frequencies OPP: Make dev_pm_opp_set_opp() independent of frequency OPP: Reuse _opp_compare_key() in _opp_add_static_v2() OPP: Remove rate_not_available parameter to _opp_add() OPP: Use consistent names for OPP table instances OPP: Use generic key finding helpers for bandwidth key OPP: Use generic key finding helpers for level key OPP: Add generic key finding helpers and use them for freq APIs OPP: Remove dev_pm_opp_find_freq_ceil_by_volt() ...	2022-08-03 17:49:38 +02:00
Rafael J. Wysocki	7912c9c6a6	Cpufreq/arm updates for 5.20-rc1 - Fix return error code in mtk_cpu_dvfs_info_init (Yang Yingliang). - Minor cleanups and support for new boards for Qcom cpufreq drivers (Bryan O'Donoghue, Konrad Dybcio, Pierre Gondois, and Yicong Yang). - Fix sparse warnings for Tegra driver (Viresh Kumar). -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEx73Crsp7f6M6scA70rkcPK6BEhwFAmLoqRYACgkQ0rkcPK6B EhzcNA/6A52UpkbXwfd63B+hCeLPTjjvnn+Z1FeRUG+ymbEDcMYQ7Z+ae3clKQ86 MmkK9UkYae50riCz9lv10UihI5JtxVkyCiCvgASxmRAvKrbODqSOu0UmMKR+CwJ/ GryoB8xCL/aAencyb3qin4fZHlwdjqBP7mgRQtoRbxjT0liFonauoG84DPhWhgMi i22V83j4pZpDIPyfNN9AvBOiaGXx3n3zG0ODx1E7Y7P9Eg/3n1AdRPST3C4Y4y3X 79WbwBur13NKi9Ww5jw5guRKehNC8l79Ly80UQ4pRg2y4RtmyABX9qteOfvbj9Ix JFWV+EdDLhU+mk1hHMx3GzFDWsaUYS+ZrS9oDREZ06LkN1ZDf5OFOP8nxr7SRvSD 3ZYlVA2IClt1Q//GsRiGy7kJK1qFtWVVWEE/NHS27W6VorugoHiyfNx8jYYBaOez bNyrKklYlIFsgkRoVG0e2/2ZrzCeTVZLjsFl9992pnoDG8Fv9fBUpsHQajLkolJ4 KSZ35sxSvMzREKtuy4g8XeV2l9cES5FZ6B2iGqVo2BuoJdyqMfcQqtOyVbXAlkOz pXSPxVuNr5w9uVQhuE0cSAiqe6joMgE5jT+5A2dIThC34nRzDLTrmQohR31nDZRj nCsb3Pkee6j8DVwGVSNFuVJ0xvOnw1Sk8RCidiKZ9+Otu/blqaM= =fX64 -----END PGP SIGNATURE----- Merge tag 'cpufreq-arm-updates-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm Pull cpufreq/ARM updates for 5.20-rc1 from Viresh Kumar: "- Fix return error code in mtk_cpu_dvfs_info_init (Yang Yingliang). - Minor cleanups and support for new boards for Qcom cpufreq drivers (Bryan O'Donoghue, Konrad Dybcio, Pierre Gondois, and Yicong Yang). - Fix sparse warnings for Tegra driver (Viresh Kumar)." * tag 'cpufreq-arm-updates-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: cpufreq: tegra194: Staticize struct tegra_cpufreq_soc instances dt-bindings: cpufreq: cpufreq-qcom-hw: Add SM6375 compatible dt-bindings: opp: Add msm8939 to the compatible list dt-bindings: opp: Add missing compat devices dt-bindings: opp: opp-v2-kryo-cpu: Fix example binding checks cpufreq: Change order of online() CB and policy->cpus modification cpufreq: qcom-hw: Remove deprecated irq_set_affinity_hint() call cpufreq: qcom-hw: Disable LMH irq when disabling policy cpufreq: qcom-hw: Reset cancel_throttle when policy is re-enabled cpufreq: qcom-cpufreq-hw: use HZ_PER_KHZ macro in units.h cpufreq: mediatek: fix error return code in mtk_cpu_dvfs_info_init()	2022-08-03 17:47:45 +02:00
Peng Fan	8a8dc2b959	mailbox: imx: clear pending interrupts During MU initialization, there maybe pending GSR and RSR pending interrupt, clear them to avoid unexpected kernel dump when requesting mailbox channel Reviewed-by: Jacky Bai <ping.bai@nxp.com> Reviewed-by: Ye Li <ye.li@nxp.com> Signed-off-by: Peng Fan <peng.fan@nxp.com> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>	2022-08-03 09:48:13 -05:00
Zheyu Ma	40bf722f80	video: fbdev: i740fb: Check the argument of i740_calc_vclk() Since the user can control the arguments of the ioctl() from the user space, under special arguments that may result in a divide-by-zero bug. If the user provides an improper 'pixclock' value that makes the argumet of i740_calc_vclk() less than 'I740_RFREQ_FIX', it will cause a divide-by-zero bug in: drivers/video/fbdev/i740fb.c:353 p_best = min(15, ilog2(I740_MAX_VCO_FREQ / (freq / I740_RFREQ_FIX))); The following log can reveal it: divide error: 0000 [#1] PREEMPT SMP KASAN PTI RIP: 0010:i740_calc_vclk drivers/video/fbdev/i740fb.c:353 [inline] RIP: 0010:i740fb_decode_var drivers/video/fbdev/i740fb.c:646 [inline] RIP: 0010:i740fb_set_par+0x163f/0x3b70 drivers/video/fbdev/i740fb.c:742 Call Trace: fb_set_var+0x604/0xeb0 drivers/video/fbdev/core/fbmem.c:1034 do_fb_ioctl+0x234/0x670 drivers/video/fbdev/core/fbmem.c:1110 fb_ioctl+0xdd/0x130 drivers/video/fbdev/core/fbmem.c:1189 Fix this by checking the argument of i740_calc_vclk() first. Signed-off-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: Helge Deller <deller@gmx.de>	2022-08-03 15:15:18 +02:00
Zheyu Ma	2f1c4523f7	video: fbdev: arkfb: Fix a divide-by-zero bug in ark_set_pixclock() Since the user can control the arguments of the ioctl() from the user space, under special arguments that may result in a divide-by-zero bug in: drivers/video/fbdev/arkfb.c:784: ark_set_pixclock(info, (hdiv * info->var.pixclock) / hmul); with hdiv=1, pixclock=1 and hmul=2 you end up with (1*1)/2 = (int) 0. and then in: drivers/video/fbdev/arkfb.c:504: rv = dac_set_freq(par->dac, 0, 1000000000 / pixclock); we'll get a division-by-zero. The following log can reveal it: divide error: 0000 [#1] PREEMPT SMP KASAN PTI RIP: 0010:ark_set_pixclock drivers/video/fbdev/arkfb.c:504 [inline] RIP: 0010:arkfb_set_par+0x10fc/0x24c0 drivers/video/fbdev/arkfb.c:784 Call Trace: fb_set_var+0x604/0xeb0 drivers/video/fbdev/core/fbmem.c:1034 do_fb_ioctl+0x234/0x670 drivers/video/fbdev/core/fbmem.c:1110 fb_ioctl+0xdd/0x130 drivers/video/fbdev/core/fbmem.c:1189 Fix this by checking the argument of ark_set_pixclock() first. Fixes: `681e14730c` ("arkfb: new framebuffer driver for ARK Logic cards") Signed-off-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: Helge Deller <deller@gmx.de>	2022-08-03 15:13:15 +02:00
Siddh Raman Pant	625395c4a0	x86/numa: Use cpumask_available instead of hardcoded NULL check GCC-12 started triggering a new warning: arch/x86/mm/numa.c: In function ‘cpumask_of_node’: arch/x86/mm/numa.c:916:39: warning: the comparison will always evaluate as ‘false’ for the address of ‘node_to_cpumask_map’ will never be NULL [-Waddress] 916 \| if (node_to_cpumask_map[node] == NULL) { \| ^~ node_to_cpumask_map is of type cpumask_var_t[]. When CONFIG_CPUMASK_OFFSTACK is set, cpumask_var_t is typedef'd to a pointer for dynamic allocation, else to an array of one element. The "wicked game" can be checked on line 700 of include/linux/cpumask.h. The original code in debug_cpumask_set_cpu() and cpumask_of_node() were probably written by the original authors with CONFIG_CPUMASK_OFFSTACK=y (i.e. dynamic allocation) in mind, checking if the cpumask was available via a direct NULL check. When CONFIG_CPUMASK_OFFSTACK is not set, GCC gives the above warning while compiling the kernel. Fix that by using cpumask_available(), which does the NULL check when CONFIG_CPUMASK_OFFSTACK is set, otherwise returns true. Use it wherever such checks are made. Conditional definitions of cpumask_available() can be found along with the definition of cpumask_var_t. Check the cpumask.h reference mentioned above. Fixes: `c032ef60d1` ("cpumask: convert node_to_cpumask_map[] to cpumask_var_t") Fixes: `de2d9445f1` ("x86: Unify node_to_cpumask_map handling between 32 and 64bit") Signed-off-by: Siddh Raman Pant <code@siddh.me> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20220731160913.632092-1-code@siddh.me	2022-08-03 11:44:57 +02:00
Ben Dooks	87514b2c24	sched/rt: Fix Sparse warnings due to undefined rt.c declarations There are several symbols defined in kernel/sched/sched.h but get wrapped in CONFIG_CGROUP_SCHED, even though dummy versions get built in rt.c and therefore trigger Sparse warnings: kernel/sched/rt.c:309:6: warning: symbol 'unregister_rt_sched_group' was not declared. Should it be static? kernel/sched/rt.c:311:6: warning: symbol 'free_rt_sched_group' was not declared. Should it be static? kernel/sched/rt.c:313:5: warning: symbol 'alloc_rt_sched_group' was not declared. Should it be static? Fix this by moving them outside the CONFIG_CGROUP_SCHED block. [ mingo: Refreshed to the latest scheduler tree, tweaked changelog. ] Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20220721145155.358366-1-ben-linux@fluff.org	2022-08-03 11:22:37 +02:00
Hans-Christian Noren Egtvedt	2fb0ec4ae5	video:backlight: remove reference to AVR32 architecture in ltv350qv The AVR32 architecture does no longer exist in the Linux kernel, hence remove a reference to also non-existing Linux BSP CD from Atmel. Signed-off-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>	2022-08-03 11:11:26 +02:00
Hans-Christian Noren Egtvedt	4492b0c089	video: remove support for non-existing atmel,at32ap-lcdc in atmel_lcdfb The AVR32 architecture has been removed from the kernel in commit `26202873bb`, hence clean out the atmel,at32ap-lcdc parts in the atmel_lcdfb.c video driver. AVR32 architecture never supported device tree, hence this code was not used by anybody. Signed-off-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>	2022-08-03 11:11:26 +02:00
Hans-Christian Noren Egtvedt	93dd2f713a	usb:udc: remove reference to AVR32 architecture in Atmel USBA Kconfig The AVR32 architecture does no longer exist in the Linux kernel, hence remove a reference to it in Kconfig help text to avoid confusion. Signed-off-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>	2022-08-03 11:11:26 +02:00
Hans-Christian Noren Egtvedt	0a2fd172b4	sound:spi: remove reference to AVR32 in Atmel AT73C213 DAC driver The AVR32 architecture does no longer exist in the Linux kernel, hence remove a reference to it in Kconfig help text to avoid confusion. Signed-off-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>	2022-08-03 11:11:26 +02:00
Hans-Christian Noren Egtvedt	8bfdfbb258	net: remove cdns,at32ap7000-macb device tree entry The AVR32 architecture has been removed from the kernel in commit `26202873bb`, hence clean out the cdns,at32ap7000-macb compatible entry in Cadence macb Ethernet driver. AVR32 architecture never supported device tree, hence this code was not used by anybody. Updated documentation to match the default entry, no users of cdns,at32ap7000-macb in the kernel tree. Signed-off-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>	2022-08-03 11:03:03 +02:00
Hans-Christian Noren Egtvedt	62bf2fa70b	misc: update maintainer email address and description for atmel-ssc I have changed my overall maintainer email address to the samfundet.no domain, hence update the atmel-ssc module to reflect that. Also remove the AVR32 reference, since the AVR32 architecture no longer exist in the Linux kernel. Signed-off-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>	2022-08-03 11:03:03 +02:00
Hans-Christian Noren Egtvedt	53291cb23c	mfd: remove reference to AVR32 architecture in atmel-smc.c The AVR32 architecture does no longer exist in the Linux kernel, hence remove a reference to it in comments to avoid confusion. Signed-off-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>	2022-08-03 11:03:03 +02:00
Hans-Christian Noren Egtvedt	300a596590	dma:dw: remove reference to AVR32 architecture in core.c The AVR32 architecture does no longer exist in the Linux kernel, hence remove a reference to it in comments to avoid confusion. Signed-off-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>	2022-08-03 11:03:02 +02:00
Ingo Molnar	dcca34754a	exit: Fix typo in comment: s/sub-theads/sub-threads Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2022-08-03 10:44:54 +02:00
Waiman Long	b6e8d40d43	sched, cpuset: Fix dl_cpu_busy() panic due to empty cs->cpus_allowed With cgroup v2, the cpuset's cpus_allowed mask can be empty indicating that the cpuset will just use the effective CPUs of its parent. So cpuset_can_attach() can call task_can_attach() with an empty mask. This can lead to cpumask_any_and() returns nr_cpu_ids causing the call to dl_bw_of() to crash due to percpu value access of an out of bound CPU value. For example: [80468.182258] BUG: unable to handle page fault for address: ffffffff8b6648b0 : [80468.191019] RIP: 0010:dl_cpu_busy+0x30/0x2b0 : [80468.207946] Call Trace: [80468.208947] cpuset_can_attach+0xa0/0x140 [80468.209953] cgroup_migrate_execute+0x8c/0x490 [80468.210931] cgroup_update_dfl_csses+0x254/0x270 [80468.211898] cgroup_subtree_control_write+0x322/0x400 [80468.212854] kernfs_fop_write_iter+0x11c/0x1b0 [80468.213777] new_sync_write+0x11f/0x1b0 [80468.214689] vfs_write+0x1eb/0x280 [80468.215592] ksys_write+0x5f/0xe0 [80468.216463] do_syscall_64+0x5c/0x80 [80468.224287] entry_SYSCALL_64_after_hwframe+0x44/0xae Fix that by using effective_cpus instead. For cgroup v1, effective_cpus is the same as cpus_allowed. For v2, effective_cpus is the real cpumask to be used by tasks within the cpuset anyway. Also update task_can_attach()'s 2nd argument name to cs_effective_cpus to reflect the change. In addition, a check is added to task_can_attach() to guard against the possibility that cpumask_any_and() may return a value >= nr_cpu_ids. Fixes: `7f51412a41` ("sched/deadline: Fix bandwidth check/update when migrating tasks between exclusive cpusets") Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lore.kernel.org/r/20220803015451.2219567-1-longman@redhat.com	2022-08-03 10:34:26 +02:00
Lee Jones	fe201f6fa4	MAINTAINERS: Use Lee Jones' kernel.org address for Backlight submissions Going forward, I'll be using my kernel.org for upstream work. Signed-off-by: Lee Jones <lee.jones@linaro.org> Acked-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Lee Jones <lee@kernel.org>	2022-08-03 08:26:30 +01:00
Paolo Abeni	7c6327c77d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Conflicts: net/ax25/af_ax25.c `d7c4c9e075` ("ax25: fix incorrect dev_tracker usage") `d62607c3fe` ("net: rename reference+tracking helpers") drivers/net/netdevsim/fib.c `180a6a3ee6` ("netdevsim: fib: Fix reference count leak on route deletion failure") `012ec02ae4` ("netdevsim: convert driver to use unlocked devlink API during init/fini") Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-08-03 09:04:55 +02:00
Michael Ellerman	4cfa6ff24a	powerpc/64e: Fix kexec build error When building ppc64_book3e_allmodconfig the build fails with: arch/powerpc/kexec/file_load_64.c:1063:14: error: implicit declaration of function ‘firmware_has_feature’ 1063 \| if (!firmware_has_feature(FW_FEATURE_LPAR)) \| ^~~~~~~~~~~~~~~~~~~~ Add a direct include of asm/firmware.h to fix the error. Fixes: `b1fc44eaa9` ("pseries/iommu/ddw: Fix kdump to work in absence of ibm,dma-window") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220803063152.1249270-1-mpe@ellerman.id.au	2022-08-03 16:32:18 +10:00
Douglas Anderson	0fec518018	tty: serial: qcom-geni-serial: Fix %lu -> %u in print statements When we multiply an unsigned int by a u32 we still end up with an unsigned int. That means we should specify "%u" not "%lu" in the format code. NOTE: this fix was chosen instead of somehow promoting the value to "unsigned long" since the max baud rate from the earlier call to uart_get_baud_rate() is 4000000 and the max sampling rate is 32. 4000000 * 32 = 0x07a12000, not even close to overflowing 32-bits. Fixes: `c474c77571` ("tty: serial: qcom-geni-serial: Fix get_clk_div_rate() which otherwise could return a sub-optimal clock rate.") Reported-by: Mark Brown <broonie@kernel.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20220802132250.1.Iea061e14157a17e114dbe2eca764568a02d6b889@changeid Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-03 08:23:35 +02:00
Christophe JAILLET	6f63d04473	doc: sfp-phylink: Fix a broken reference The commit in Fixes: has changed a .txt file into a .yaml file. Update the documentation accordingly. While at it add some `` around some file names to improve the output. Fixes: `70991f1e68` ("dt-bindings: net: convert sff,sfp to dtschema") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/be3c7e87ca7f027703247eccfe000b8e34805094.1659247114.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-02 21:45:07 -07:00
Dave Airlie	5493ee1919	Merge tag 'amd-drm-next-5.20-2022-07-29' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-5.20-2022-07-29: amdgpu: - Misc spelling and grammar fixes - DC whitespace cleanups - ACP smatch fix - GFX 11.0 updates - PSP 13.0 updates - VCN 4.0 updates - DC FP fix for PPC64 - Misc bug fixes amdkfd: - SVM fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220729202742.6636-1-alexander.deucher@amd.com	2022-08-03 14:00:19 +10:00
Jeremy Bongio	d95efb14c0	ext4: add ioctls to get/set the ext4 superblock uuid This fixes a race between changing the ext4 superblock uuid and operations like mounting, resizing, changing features, etc. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jeremy Bongio <bongiojp@gmail.com> Link: https://lore.kernel.org/r/20220721224422.438351-1-bongiojp@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:26 -04:00
Kiselev, Oleg	69cb8e9d8c	ext4: avoid resizing to a partial cluster size This patch avoids an attempt to resize the filesystem to an unaligned cluster boundary. An online resize to a size that is not integral to cluster size results in the last iteration attempting to grow the fs by a negative amount, which trips a BUG_ON and leaves the fs with a corrupted in-memory superblock. Signed-off-by: Oleg Kiselev <okiselev@amazon.com> Link: https://lore.kernel.org/r/0E92A0AB-4F16-4F1A-94B7-702CC6504FDE@amazon.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:26 -04:00
Kiselev, Oleg	026d0d27c4	ext4: reduce computation of overhead during resize This patch avoids doing an O(n**2)-complexity walk through every flex group. Instead, it uses the already computed overhead information for the newly allocated space, and simply adds it to the previously calculated overhead stored in the superblock. This drastically reduces the time taken to resize very large bigalloc filesystems (from 3+ hours for a 64TB fs down to milliseconds). Signed-off-by: Oleg Kiselev <okiselev@amazon.com> Link: https://lore.kernel.org/r/CE4F359F-4779-45E6-B6A9-8D67FDFF5AE2@amazon.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:25 -04:00
Zhihao Cheng	4a734f0869	jbd2: fix assertion 'jh->b_frozen_data == NULL' failure when journal aborted Following process will fail assertion 'jh->b_frozen_data == NULL' in jbd2_journal_dirty_metadata(): jbd2_journal_commit_transaction unlink(dir/a) jh->b_transaction = trans1 jh->b_jlist = BJ_Metadata journal->j_running_transaction = NULL trans1->t_state = T_COMMIT unlink(dir/b) handle->h_trans = trans2 do_get_write_access jh->b_modified = 0 jh->b_frozen_data = frozen_buffer jh->b_next_transaction = trans2 jbd2_journal_dirty_metadata is_handle_aborted is_journal_aborted // return false --> jbd2 abort <-- while (commit_transaction->t_buffers) if (is_journal_aborted) jbd2_journal_refile_buffer __jbd2_journal_refile_buffer WRITE_ONCE(jh->b_transaction, jh->b_next_transaction) WRITE_ONCE(jh->b_next_transaction, NULL) __jbd2_journal_file_buffer(jh, BJ_Reserved) J_ASSERT_JH(jh, jh->b_frozen_data == NULL) // assertion failure ! The reproducer (See detail in [Link]) reports: ------------[ cut here ]------------ kernel BUG at fs/jbd2/transaction.c:1629! invalid opcode: 0000 [#1] PREEMPT SMP CPU: 2 PID: 584 Comm: unlink Tainted: G W 5.19.0-rc6-00115-g4a57a8400075-dirty #697 RIP: 0010:jbd2_journal_dirty_metadata+0x3c5/0x470 RSP: 0018:ffffc90000be7ce0 EFLAGS: 00010202 Call Trace: <TASK> __ext4_handle_dirty_metadata+0xa0/0x290 ext4_handle_dirty_dirblock+0x10c/0x1d0 ext4_delete_entry+0x104/0x200 __ext4_unlink+0x22b/0x360 ext4_unlink+0x275/0x390 vfs_unlink+0x20b/0x4c0 do_unlinkat+0x42f/0x4c0 __x64_sys_unlink+0x37/0x50 do_syscall_64+0x35/0x80 After journal aborting, __jbd2_journal_refile_buffer() is executed with holding @jh->b_state_lock, we can fix it by moving 'is_handle_aborted()' into the area protected by @jh->b_state_lock. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216251 Fixes: `470decc613` ("[PATCH] jbd2: initial copy of files from jbd") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Link: https://lore.kernel.org/r/20220715125152.4022726-1-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:25 -04:00
Lukas Czerner	1e1c2b86ef	ext4: block range must be validated before use in ext4_mb_clear_bb() Block range to free is validated in ext4_free_blocks() using ext4_inode_block_valid() and then it's passed to ext4_mb_clear_bb(). However in some situations on bigalloc file system the range might be adjusted after the validation in ext4_free_blocks() which can lead to troubles on corrupted file systems such as one found by syzkaller that resulted in the following BUG kernel BUG at fs/ext4/ext4.h:3319! PREEMPT SMP NOPTI CPU: 28 PID: 4243 Comm: repro Kdump: loaded Not tainted 5.19.0-rc6+ #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1.fc35 04/01/2014 RIP: 0010:ext4_free_blocks+0x95e/0xa90 Call Trace: <TASK> ? lock_timer_base+0x61/0x80 ? __es_remove_extent+0x5a/0x760 ? __mod_timer+0x256/0x380 ? ext4_ind_truncate_ensure_credits+0x90/0x220 ext4_clear_blocks+0x107/0x1b0 ext4_free_data+0x15b/0x170 ext4_ind_truncate+0x214/0x2c0 ? _raw_spin_unlock+0x15/0x30 ? ext4_discard_preallocations+0x15a/0x410 ? ext4_journal_check_start+0xe/0x90 ? __ext4_journal_start_sb+0x2f/0x110 ext4_truncate+0x1b5/0x460 ? __ext4_journal_start_sb+0x2f/0x110 ext4_evict_inode+0x2b4/0x6f0 evict+0xd0/0x1d0 ext4_enable_quotas+0x11f/0x1f0 ext4_orphan_cleanup+0x3de/0x430 ? proc_create_seq_private+0x43/0x50 ext4_fill_super+0x295f/0x3ae0 ? snprintf+0x39/0x40 ? sget_fc+0x19c/0x330 ? ext4_reconfigure+0x850/0x850 get_tree_bdev+0x16d/0x260 vfs_get_tree+0x25/0xb0 path_mount+0x431/0xa70 __x64_sys_mount+0xe2/0x120 do_syscall_64+0x5b/0x80 ? do_user_addr_fault+0x1e2/0x670 ? exc_page_fault+0x70/0x170 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7fdf4e512ace Fix it by making sure that the block range is properly validated before used every time it changes in ext4_free_blocks() or ext4_mb_clear_bb(). Link: https://syzkaller.appspot.com/bug?id=5266d464285a03cee9dbfda7d2452a72c3c2ae7c Reported-by: syzbot+15cd994e273307bf5cfa@syzkaller.appspotmail.com Signed-off-by: Lukas Czerner <lczerner@redhat.com> Cc: Tadeusz Struk <tadeusz.struk@linaro.org> Tested-by: Tadeusz Struk <tadeusz.struk@linaro.org> Link: https://lore.kernel.org/r/20220714165903.58260-1-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:25 -04:00
Jan Kara	307af6c879	mbcache: automatically delete entries from cache on freeing Use the fact that entries with elevated refcount are not removed from the hash and just move removal of the entry from the hash to the entry freeing time. When doing this we also change the generic code to hold one reference to the cache entry, not two of them, which makes code somewhat more obvious. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220712105436.32204-10-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:25 -04:00
Jan Kara	75896339e4	mbcache: Remove mb_cache_entry_delete() Nobody uses mb_cache_entry_delete() anymore. Remove it. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220712105436.32204-9-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:25 -04:00
Jan Kara	1189d8ec51	ext2: avoid deleting xattr block that is being reused Currently when we decide to reuse xattr block we detect the case when the last reference to xattr block is being dropped at the same time and cancel the reuse attempt. Convert ext2 to a new scheme when as soon as matching mbcache entry is found, we wait with dropping the last xattr block reference until mbcache entry reference is dropped (meaning either the xattr block reference is increased or we decided not to reuse the block). Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220712105436.32204-8-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:25 -04:00
Jan Kara	b67798d551	ext2: unindent codeblock in ext2_xattr_set() Replace one else in ext2_xattr_set() with a goto. This makes following code changes simpler to follow. No functional changes. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20220712105436.32204-7-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:25 -04:00
Jan Kara	90ae40d243	ext2: factor our freeing of xattr block reference Free of xattr block reference is opencode in two places. Factor it out into a separate function and use it. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220712105436.32204-6-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:25 -04:00
Jan Kara	65f8b80053	ext4: fix race when reusing xattr blocks When ext4_xattr_block_set() decides to remove xattr block the following race can happen: CPU1 CPU2 ext4_xattr_block_set() ext4_xattr_release_block() new_bh = ext4_xattr_block_cache_find() lock_buffer(bh); ref = le32_to_cpu(BHDR(bh)->h_refcount); if (ref == 1) { ... mb_cache_entry_delete(); unlock_buffer(bh); ext4_free_blocks(); ... ext4_forget(..., bh, ...); jbd2_journal_revoke(..., bh); ext4_journal_get_write_access(..., new_bh, ...) do_get_write_access() jbd2_journal_cancel_revoke(..., new_bh); Later the code in ext4_xattr_block_set() finds out the block got freed and cancels reusal of the block but the revoke stays canceled and so in case of block reuse and journal replay the filesystem can get corrupted. If the race works out slightly differently, we can also hit assertions in the jbd2 code. Fix the problem by making sure that once matching mbcache entry is found, code dropping the last xattr block reference (or trying to modify xattr block in place) waits until the mbcache entry reference is dropped. This way code trying to reuse xattr block is protected from someone trying to drop the last reference to xattr block. Reported-and-tested-by: Ritesh Harjani <ritesh.list@gmail.com> CC: stable@vger.kernel.org Fixes: `82939d7999` ("ext4: convert to mbcache2") Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220712105436.32204-5-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:25 -04:00
Jan Kara	fd48e9acdf	ext4: unindent codeblock in ext4_xattr_block_set() Remove unnecessary else (and thus indentation level) from a code block in ext4_xattr_block_set(). It will also make following code changes easier. No functional changes. CC: stable@vger.kernel.org Fixes: `82939d7999` ("ext4: convert to mbcache2") Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220712105436.32204-4-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:25 -04:00
Jan Kara	6bc0d63dad	ext4: remove EA inode entry from mbcache on inode eviction Currently we remove EA inode from mbcache as soon as its xattr refcount drops to zero. However there can be pending attempts to reuse the inode and thus refcount handling code has to handle the situation when refcount increases from zero anyway. So save some work and just keep EA inode in mbcache until it is getting evicted. At that moment we are sure following iget() of EA inode will fail anyway (or wait for eviction to finish and load things from the disk again) and so removing mbcache entry at that moment is fine and simplifies the code a bit. CC: stable@vger.kernel.org Fixes: `82939d7999` ("ext4: convert to mbcache2") Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220712105436.32204-3-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:25 -04:00
Jan Kara	3dc96bba65	mbcache: add functions to delete entry if unused Add function mb_cache_entry_delete_or_get() to delete mbcache entry if it is unused and also add a function to wait for entry to become unused - mb_cache_entry_wait_unused(). We do not share code between the two deleting function as one of them will go away soon. CC: stable@vger.kernel.org Fixes: `82939d7999` ("ext4: convert to mbcache2") Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220712105436.32204-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:25 -04:00
Jan Kara	5831891418	mbcache: don't reclaim used entries Do not reclaim entries that are currently used by somebody from a shrinker. Firstly, these entries are likely useful. Secondly, we will need to keep such entries to protect pending increment of xattr block refcount. CC: stable@vger.kernel.org Fixes: `82939d7999` ("ext4: convert to mbcache2") Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220712105436.32204-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:25 -04:00
Lukas Czerner	b8a04fe77e	ext4: make sure ext4_append() always allocates new block ext4_append() must always allocate a new block, otherwise we run the risk of overwriting existing directory block corrupting the directory tree in the process resulting in all manner of problems later on. Add a sanity check to see if the logical block is already allocated and error out if it is. Cc: stable@kernel.org Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20220704142721.157985-2-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:17 -04:00
Lukas Czerner	65f8ea4cd5	ext4: check if directory block is within i_size Currently ext4 directory handling code implicitly assumes that the directory blocks are always within the i_size. In fact ext4_append() will attempt to allocate next directory block based solely on i_size and the i_size is then appropriately increased after a successful allocation. However, for this to work it requires i_size to be correct. If, for any reason, the directory inode i_size is corrupted in a way that the directory tree refers to a valid directory block past i_size, we could end up corrupting parts of the directory tree structure by overwriting already used directory blocks when modifying the directory. Fix it by catching the corruption early in __ext4_read_dirblock(). Addresses Red-Hat-Bugzilla: #2070205 CVE: CVE-2022-1184 Signed-off-by: Lukas Czerner <lczerner@redhat.com> Cc: stable@vger.kernel.org Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20220704142721.157985-1-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:17 -04:00
Ojaswin Mujoo	3fa5d23e68	ext4: reflect mb_optimize_scan value in options file Add support to display the mb_optimize_scan value in /proc/fs/ext4/<dev>/options file. The option is only displayed when the value is non default. Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://lore.kernel.org/r/20220704054603.21462-1-ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:17 -04:00
Ye Bin	b24e77ef1c	ext4: avoid remove directory when directory is corrupted Now if check directoy entry is corrupted, ext4_empty_dir may return true then directory will be removed when file system mounted with "errors=continue". In order not to make things worse just return false when directory is corrupted. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220622090223.682234-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:17 -04:00
Jiang Jian	c64a92992e	ext4: aligned '' in comments The '' in the comment is not aligned. Signed-off-by: Jiang Jian <jiangjian@cdjrlc.com> Link: https://lore.kernel.org/r/20220621061531.19669-1-jiangjian@cdjrlc.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:17 -04:00
Bagas Sanjaya	442ec1e5bb	Documentation: ext4: fix cell spacing of table heading on blockmap table Commit `3103084afc` ("ext4, doc: remove unnecessary escaping") removes redundant underscore escaping, however the cell spacing in heading row of blockmap table became not aligned anymore, hence triggers malformed table warning: Documentation/filesystems/ext4/blockmap.rst:3: WARNING: Malformed table. +---------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ \| i.i_block Offset \| Where It Points \| <snipped>... The warning caused the table not being loaded. Realign the heading row cell by adding missing space at the first cell to fix the warning. Fixes: `3103084afc` ("ext4, doc: remove unnecessary escaping") Cc: stable@kernel.org Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Wang Jianjian <wangjianjian3@huawei.com> Cc: linux-ext4@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Link: https://lore.kernel.org/r/20220619072938.7334-1-bagasdotme@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:02 -04:00
Li Lingfeng	07ea7a617d	ext4: recover csum seed of tmp_inode after migrating to extents When migrating to extents, the checksum seed of temporary inode need to be replaced by inode's, otherwise the inode checksums will be incorrect when swapping the inodes data. However, the temporary inode can not match it's checksum to itself since it has lost it's own checksum seed. mkfs.ext4 -F /dev/sdc mount /dev/sdc /mnt/sdc xfs_io -fc "pwrite 4k 4k" -c "fsync" /mnt/sdc/testfile chattr -e /mnt/sdc/testfile chattr +e /mnt/sdc/testfile umount /dev/sdc fsck -fn /dev/sdc ======== ... Pass 1: Checking inodes, blocks, and sizes Inode 13 passes checks, but checksum does not match inode. Fix? no ... ======== The fix is simple, save the checksum seed of temporary inode, and recover it after migrating to extents. Fixes: `e81c9302a6` ("ext4: set csum seed in tmp inode while migrating to extents") Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220617062515.2113438-1-lilingfeng3@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:56:02 -04:00
Ye Bin	51ae846cff	ext4: fix warning in ext4_iomap_begin as race between bmap and write We got issue as follows: ------------[ cut here ]------------ WARNING: CPU: 3 PID: 9310 at fs/ext4/inode.c:3441 ext4_iomap_begin+0x182/0x5d0 RIP: 0010:ext4_iomap_begin+0x182/0x5d0 RSP: 0018:ffff88812460fa08 EFLAGS: 00010293 RAX: ffff88811f168000 RBX: 0000000000000000 RCX: ffffffff97793c12 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003 RBP: ffff88812c669160 R08: ffff88811f168000 R09: ffffed10258cd20f R10: ffff88812c669077 R11: ffffed10258cd20e R12: 0000000000000001 R13: 00000000000000a4 R14: 000000000000000c R15: ffff88812c6691ee FS: 00007fd0d6ff3740(0000) GS:ffff8883af180000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fd0d6dda290 CR3: 0000000104a62000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: iomap_apply+0x119/0x570 iomap_bmap+0x124/0x150 ext4_bmap+0x14f/0x250 bmap+0x55/0x80 do_vfs_ioctl+0x952/0xbd0 __x64_sys_ioctl+0xc6/0x170 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Above issue may happen as follows: bmap write bmap ext4_bmap iomap_bmap ext4_iomap_begin ext4_file_write_iter ext4_buffered_write_iter generic_perform_write ext4_da_write_begin ext4_da_write_inline_data_begin ext4_prepare_inline_data ext4_create_inline_data ext4_set_inode_flag(inode, EXT4_INODE_INLINE_DATA); if (WARN_ON_ONCE(ext4_has_inline_data(inode))) ->trigger bug_on To solved above issue hold inode lock in ext4_bamp. Signed-off-by: Ye Bin <yebin10@huawei.com> Link: https://lore.kernel.org/r/20220617013935.397596-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org	2022-08-02 23:56:02 -04:00
Baokun Li	fd7e672ea9	ext4: correct the misjudgment in ext4_iget_extra_inode Use the EXT4_INODE_HAS_XATTR_SPACE macro to more accurately determine whether the inode have xattr space. Cc: stable@kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220616021358.2504451-5-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:55:55 -04:00
Baokun Li	c9fd167d57	ext4: correct max_inline_xattr_value_size computing If the ext4 inode does not have xattr space, 0 is returned in the get_max_inline_xattr_value_size function. Otherwise, the function returns a negative value when the inode does not contain EXT4_STATE_XATTR. Cc: stable@kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220616021358.2504451-4-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:52:44 -04:00
Baokun Li	67d7d8ad99	ext4: fix use-after-free in ext4_xattr_set_entry Hulk Robot reported a issue: ================================================================== BUG: KASAN: use-after-free in ext4_xattr_set_entry+0x18ab/0x3500 Write of size 4105 at addr ffff8881675ef5f4 by task syz-executor.0/7092 CPU: 1 PID: 7092 Comm: syz-executor.0 Not tainted 4.19.90-dirty #17 Call Trace: [...] memcpy+0x34/0x50 mm/kasan/kasan.c:303 ext4_xattr_set_entry+0x18ab/0x3500 fs/ext4/xattr.c:1747 ext4_xattr_ibody_inline_set+0x86/0x2a0 fs/ext4/xattr.c:2205 ext4_xattr_set_handle+0x940/0x1300 fs/ext4/xattr.c:2386 ext4_xattr_set+0x1da/0x300 fs/ext4/xattr.c:2498 __vfs_setxattr+0x112/0x170 fs/xattr.c:149 __vfs_setxattr_noperm+0x11b/0x2a0 fs/xattr.c:180 __vfs_setxattr_locked+0x17b/0x250 fs/xattr.c:238 vfs_setxattr+0xed/0x270 fs/xattr.c:255 setxattr+0x235/0x330 fs/xattr.c:520 path_setxattr+0x176/0x190 fs/xattr.c:539 __do_sys_lsetxattr fs/xattr.c:561 [inline] __se_sys_lsetxattr fs/xattr.c:557 [inline] __x64_sys_lsetxattr+0xc2/0x160 fs/xattr.c:557 do_syscall_64+0xdf/0x530 arch/x86/entry/common.c:298 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x459fe9 RSP: 002b:00007fa5e54b4c08 EFLAGS: 00000246 ORIG_RAX: 00000000000000bd RAX: ffffffffffffffda RBX: 000000000051bf60 RCX: 0000000000459fe9 RDX: 00000000200003c0 RSI: 0000000020000180 RDI: 0000000020000140 RBP: 000000000051bf60 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000001009 R11: 0000000000000246 R12: 0000000000000000 R13: 00007ffc73c93fc0 R14: 000000000051bf60 R15: 00007fa5e54b4d80 [...] ================================================================== Above issue may happen as follows: ------------------------------------- ext4_xattr_set ext4_xattr_set_handle ext4_xattr_ibody_find >> s->end < s->base >> no EXT4_STATE_XATTR >> xattr_check_inode is not executed ext4_xattr_ibody_set ext4_xattr_set_entry >> size_t min_offs = s->end - s->base >> UAF in memcpy we can easily reproduce this problem with the following commands: mkfs.ext4 -F /dev/sda mount -o debug_want_extra_isize=128 /dev/sda /mnt touch /mnt/file setfattr -n user.cat -v `seq -s z 4096\|tr -d '[:digit:]'` /mnt/file In ext4_xattr_ibody_find, we have the following assignment logic: header = IHDR(inode, raw_inode) = raw_inode + EXT4_GOOD_OLD_INODE_SIZE + i_extra_isize is->s.base = IFIRST(header) = header + sizeof(struct ext4_xattr_ibody_header) is->s.end = raw_inode + s_inode_size In ext4_xattr_set_entry min_offs = s->end - s->base = s_inode_size - EXT4_GOOD_OLD_INODE_SIZE - i_extra_isize - sizeof(struct ext4_xattr_ibody_header) last = s->first free = min_offs - ((void *)last - s->base) - sizeof(__u32) = s_inode_size - EXT4_GOOD_OLD_INODE_SIZE - i_extra_isize - sizeof(struct ext4_xattr_ibody_header) - sizeof(__u32) In the calculation formula, all values except s_inode_size and i_extra_size are fixed values. When i_extra_size is the maximum value s_inode_size - EXT4_GOOD_OLD_INODE_SIZE, min_offs is -4 and free is -8. The value overflows. As a result, the preceding issue is triggered when memcpy is executed. Therefore, when finding xattr or setting xattr, check whether there is space for storing xattr in the inode to resolve this issue. Cc: stable@kernel.org Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220616021358.2504451-3-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-08-02 23:52:34 -04:00

... 5 6 7 8 9 ...

1120386 Commits