e0caaf75d4
5291 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Thomas Gleixner
|
25ce693ef7 |
platform-msi: Let the core code handle sysfs groups
Set the domain info flag and remove the local sysfs code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Nishanth Menon <nm@ti.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20211210221814.109408832@linutronix.de |
||
Thomas Gleixner
|
077aeadb6c |
platform-msi: Allocate MSI device data on first use
Allocate the MSI device data on first invocation of the allocation function for platform MSI private data. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Nishanth Menon <nm@ti.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20211210221813.805529729@linutronix.de |
||
Thomas Gleixner
|
34fff62827 |
device: Move MSI related data into a struct
The only unconditional part of MSI data in struct device is the irqdomain pointer. Everything else can be allocated on demand. Create a data structure and move the irqdomain pointer into it. The other MSI specific parts are going to be removed from struct device in later steps. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Nishanth Menon <nm@ti.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20211210221813.617178827@linutronix.de |
||
Mateusz Jończyk
|
0dd8d6cb9e |
rtc: Check return value from mc146818_get_time()
There are 4 users of mc146818_get_time() and none of them was checking the return value from this function. Change this. Print the appropriate warnings in callers of mc146818_get_time() instead of in the function mc146818_get_time() itself, in order not to add strings to rtc-mc146818-lib.c, which is kind of a library. The callers of alpha_rtc_read_time() and cmos_read_time() may use the contents of (struct rtc_time *) even when the functions return a failure code. Therefore, set the contents of (struct rtc_time *) to 0x00, which looks more sensible then 0xff and aligns with the (possibly stale?) comment in cmos_read_time: /* * If pm_trace abused the RTC for storage, set the timespec to 0, * which tells the caller that this RTC value is unusable. */ For consistency, do this in mc146818_get_time(). Note: hpet_rtc_interrupt() may call mc146818_get_time() many times a second. It is very unlikely, though, that the RTC suddenly stops working and mc146818_get_time() would consistently fail. Only compile-tested on alpha. Signed-off-by: Mateusz Jończyk <mat.jonczyk@o2.pl> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Cc: linux-alpha@vger.kernel.org Cc: x86@kernel.org Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20211210200131.153887-4-mat.jonczyk@o2.pl |
||
Jarkko Sakkinen
|
50468e4313 |
x86/sgx: Add an attribute for the amount of SGX memory in a NUMA node
== Problem == The amount of SGX memory on a system is determined by the BIOS and it varies wildly between systems. It can be as small as dozens of MB's and as large as many GB's on servers. Just like how applications need to know how much regular RAM is available, enclave builders need to know how much SGX memory an enclave can consume. == Solution == Introduce a new sysfs file: /sys/devices/system/node/nodeX/x86/sgx_total_bytes to enumerate the amount of SGX memory available in each NUMA node. This serves the same function for SGX as /proc/meminfo or /sys/devices/system/node/nodeX/meminfo does for normal RAM. 'sgx_total_bytes' is needed today to help drive the SGX selftests. SGX-specific swap code is exercised by creating overcommitted enclaves which are larger than the physical SGX memory on the system. They currently use a CPUID-based approach which can diverge from the actual amount of SGX memory available. 'sgx_total_bytes' ensures that the selftests can work efficiently and do not attempt stupid things like creating a 100,000 MB enclave on a system with 128 MB of SGX memory. == Implementation Details == Introduce CONFIG_HAVE_ARCH_NODE_DEV_GROUP opt-in flag to expose an arch specific attribute group, and add an attribute for the amount of SGX memory in bytes to each NUMA node: == ABI Design Discussion == As opposed to the per-node ABI, a single, global ABI was considered. However, this would prevent enclaves from being able to size themselves so that they fit on a single NUMA node. Essentially, a single value would rule out NUMA optimizations for enclaves. Create a new "x86/" directory inside each "nodeX/" sysfs directory. 'sgx_total_bytes' is expected to be the first of at least a few sgx-specific files to be placed in the new directory. Just scanning /proc/meminfo, these are the no-brainers that we have for RAM, but we need for SGX: MemTotal: xxxx kB // sgx_total_bytes (implemented here) MemFree: yyyy kB // sgx_free_bytes SwapTotal: zzzz kB // sgx_swapped_bytes So, at *least* three. I think we will eventually end up needing something more along the lines of a dozen. A new directory (as opposed to being in the nodeX/ "root") directory avoids cluttering the root with several "sgx_*" files. Place the new file in a new "nodeX/x86/" directory because SGX is highly x86-specific. It is very unlikely that any other architecture (or even non-Intel x86 vendor) will ever implement SGX. Using "sgx/" as opposed to "x86/" was also considered. But, there is a real chance this can get used for other arch-specific purposes. [ dhansen: rewrite changelog ] Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211116162116.93081-2-jarkko@kernel.org |
||
Thomas Gleixner
|
cd119b09a8 |
PCI/MSI: Move msi_lock to struct pci_dev
It's only required for PCI/MSI. So no point in having it in every struct device. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20211206210224.925241961@linutronix.de |
||
Daniel Scally
|
c097af1d0a |
device property: Check fwnode->secondary when finding properties
fwnode_property_get_reference_args() searches for named properties against a fwnode_handle, but these could instead be against the fwnode's secondary. If the property isn't found against the primary, check the secondary to see if it's there instead. Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Daniel Scally <djrscally@gmail.com> Link: https://lore.kernel.org/r/20211128232455.39332-1-djrscally@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Ira Weiny
|
e1b5186810 |
Documentation/auxiliary_bus: Move the text into the code
The code and documentation are more difficult to maintain when kept separately. This is further compounded when the standard structure documentation infrastructure is not used. Move the documentation into the code, use the standard documentation infrastructure, add current documented functions, and reference the text in the rst file. Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20211202044305.4006853-8-ira.weiny@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Ira Weiny
|
8a2d6ffe77 |
Documentation/auxiliary_bus: Clarify the release of devices from find device
auxiliary_find_device() takes a proper get_device() reference on the device before returning the matched device. Users of this call should be informed that they need to properly release this reference with put_device(). Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20211202044305.4006853-7-ira.weiny@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Ira Weiny
|
05021dca78 |
Documentation/auxiliary_bus: Clarify __auxiliary_driver_register
__auxiliary_driver_register is not intended to be called directly unless a custom name is required. Add documentation for this fact. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20211202044305.4006853-5-ira.weiny@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Ira Weiny
|
b247703873 |
Documentation/auxiliary_bus: Clarify auxiliary_device creation
The documentation for creating an auxiliary device is a 3 step not a 2 step process. Specifically the requirements of setting the name, id, dev.release, and dev.parent fields was not clear as a precursor to the '2 step' process documented. Clarify by declaring this a 3 step process starting with setting the fields of struct auxiliary_device correctly. Also add some sample code and tie the change into the rest of the documentation. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20211202044305.4006853-2-ira.weiny@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Heiko Carstens
|
f1045056c7 |
topology/sysfs: rework book and drawer topology ifdefery
Provide default defines for the topology_book_[id|cpumask] and topology_drawer_[id|cpumask] macros just like for each other topology level. This way all topology levels are handled in a similar way. Still the the book and drawer levels are only used on s390, and also the sysfs attributes are only created on s390. However other architectures may opt in if wanted. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20211129130309.3256168-4-hca@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Heiko Carstens
|
e795707703 |
topology/sysfs: export cluster attributes only if an architectures has support
The cluster_id and cluster_cpus topology sysfs attributes have been
added with commit
|
||
Heiko Carstens
|
2c4dcd7fd5 |
topology/sysfs: export die attributes only if an architectures has support
The die_id and die_cpus topology sysfs attributes have been added with commit |
||
Cai Huoqing
|
2043727c28 |
driver core: platform: Make use of the helper function dev_err_probe()
When possible using dev_err_probe() helps to properly deal with the PROBE_DEFER error, the benefit is that DEFER issue will be logged in the devices_deferred debugfs file. Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Link: https://lore.kernel.org/r/20211105071509.969-1-caihuoqing@baidu.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Heikki Krogerus
|
2338e7bcef |
device property: Remove device_add_properties() API
There are no more users for it. Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
||
Heikki Krogerus
|
982b94ba09 |
driver core: Don't call device_remove_properties() from device_del()
All the drivers that relied on device_del() to call device_remove_properties() have now been converted to either use device_create_managed_software_node() instead of device_add_properties(), or to register the software node completely separately from the device. This will make it finally possible to share and reuse the software nodes that hold the additional device properties. Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
||
Lukasz Luba
|
7e97b3dc25 |
arch_topology: Remove unused topology_set_thermal_pressure() and related
There is no need of this function (and related) since code has been converted to use the new arch_update_thermal_pressure() API. The old code can be removed. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> |
||
Lukasz Luba
|
c214f12416 |
arch_topology: Introduce thermal pressure update function
The thermal pressure is a mechanism which is used for providing information about reduced CPU performance to the scheduler. Usually code has to convert the value from frequency units into capacity units, which are understandable by the scheduler. Create a common conversion code which can be just used via a handy API. Internally, the topology_update_thermal_pressure() operates on frequency in MHz and max CPU frequency is taken from 'freq_factor' (per-cpu). Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Reviewed-by: Thara Gopinath <thara.gopinath@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> |
||
Ansuel Smith
|
02d6fdecb9
|
regmap: allow to define reg_update_bits for no bus configuration
Some device requires a special handling for reg_update_bits and can't use the normal regmap read write logic. An example is when locking is handled by the device and rmw operations requires to do atomic operations. Allow to declare a dedicated function in regmap_config for reg_update_bits in no bus configuration. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Link: https://lore.kernel.org/r/20211104150040.1260-1-ansuelsmth@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org> |
||
Wang ShaoBo
|
4cc4cc28ec |
arch_topology: Fix missing clear cluster_cpumask in remove_cpu_topology()
When testing cpu online and offline, warning happened like this: [ 146.746743] WARNING: CPU: 92 PID: 974 at kernel/sched/topology.c:2215 build_sched_domains+0x81c/0x11b0 [ 146.749988] CPU: 92 PID: 974 Comm: kworker/92:2 Not tainted 5.15.0 #9 [ 146.750402] Hardware name: Huawei TaiShan 2280 V2/BC82AMDDA, BIOS 1.79 08/21/2021 [ 146.751213] Workqueue: events cpuset_hotplug_workfn [ 146.751629] pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 146.752048] pc : build_sched_domains+0x81c/0x11b0 [ 146.752461] lr : build_sched_domains+0x414/0x11b0 [ 146.752860] sp : ffff800040a83a80 [ 146.753247] x29: ffff800040a83a80 x28: ffff20801f13a980 x27: ffff20800448ae00 [ 146.753644] x26: ffff800012a858e8 x25: ffff800012ea48c0 x24: 0000000000000000 [ 146.754039] x23: ffff800010ab7d60 x22: ffff800012f03758 x21: 000000000000005f [ 146.754427] x20: 000000000000005c x19: ffff004080012840 x18: ffffffffffffffff [ 146.754814] x17: 3661613030303230 x16: 30303078303a3239 x15: ffff800011f92b48 [ 146.755197] x14: ffff20be3f95cef6 x13: 2e6e69616d6f642d x12: 6465686373204c4c [ 146.755578] x11: ffff20bf7fc83a00 x10: 0000000000000040 x9 : 0000000000000000 [ 146.755957] x8 : 0000000000000002 x7 : ffffffffe0000000 x6 : 0000000000000002 [ 146.756334] x5 : 0000000090000000 x4 : 00000000f0000000 x3 : 0000000000000001 [ 146.756705] x2 : 0000000000000080 x1 : ffff800012f03860 x0 : 0000000000000001 [ 146.757070] Call trace: [ 146.757421] build_sched_domains+0x81c/0x11b0 [ 146.757771] partition_sched_domains_locked+0x57c/0x978 [ 146.758118] rebuild_sched_domains_locked+0x44c/0x7f0 [ 146.758460] rebuild_sched_domains+0x2c/0x48 [ 146.758791] cpuset_hotplug_workfn+0x3fc/0x888 [ 146.759114] process_one_work+0x1f4/0x480 [ 146.759429] worker_thread+0x48/0x460 [ 146.759734] kthread+0x158/0x168 [ 146.760030] ret_from_fork+0x10/0x20 [ 146.760318] ---[ end trace 82c44aad6900e81a ]--- For some architectures like risc-v and arm64 which use common code clear_cpu_topology() in shutting down CPUx, When CONFIG_SCHED_CLUSTER is set, cluster_sibling in cpu_topology of each sibling adjacent to CPUx is missed clearing, this causes checking failed in topology_span_sane() and rebuilding topology failure at end when CPU online. Different sibling's cluster_sibling in cpu_topology[] when CPU92 offline (CPU 92, 93, 94, 95 are in one cluster): Before revision: CPU [92] [93] [94] [95] cluster_sibling [92] [92-95] [92-95] [92-95] After revision: CPU [92] [93] [94] [95] cluster_sibling [92] [93-95] [93-95] [93-95] Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Acked-by: Barry Song <song.bao.hua@hisilicon.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://lore.kernel.org/r/20211110095856.469360-1-bobo.shaobowang@huawei.com |
||
Linus Torvalds
|
d422555f32 |
More power management updates for 5.16-rc1
- Fix 2 intel_pstate driver regressions related to the HWP interrupt handling added recently (Srinivas Pandruvada). - Fix intel_pstate driver regression introduced during the 5.11 cycle and causing HWP desired performance to be mishandled in some cases when switching driver modes and during system suspend and shutdown (Rafael Wysocki). - Fix system-wide device suspend and resume locking to avoid deadlocks when device objects are deleted during a system-wide PM transition (Rafael Wysocki). - Modify system-wide suspend of devices to prevent cpuidle drivers based on runtime PM from misbehaving during the "no IRQ" phase of it (Ulf Hansson). - Fix return value of _opp_add_static_v2() helper (YueHaibing). - Fix required-opp handle count (Pavankumar Kondeti). - Add resource managed OPP helpers, update dev_pm_opp_attach_genpd(), update their devfreq users, and make minor DT binding change (Dmitry Osipenko). -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmGL0/4SHHJqd0Byand5 c29ja2kubmV0AAoJEILEb/54YlRxaeMP/09xsXf0LP2LhP9YY77kvdrMONjyh1nj XuelrhuWl1Cz2X+gh3d7zXrjbuPW0lZF8Syc8GY1Ct8o3omQ8jnyiH1LIBfkDVx7 WqVk8rGMMBXNpg3Os+MMADm3s0woW8c7IF+5cYwxL03OaOGH8uHiZxS9ZTLocH6H sOh9EAy72b/vLCq4aAzrG8ux10k9t8+2A6EuxzKrkO8rz/egtLjb8KznCER4n4nx lJ5Us+Kpo9TwuQ4pwujcvk7M9U2boHeKEk8ItiQUQwpW1VJsEUg1konz/UziE3Nf 2vHZIellKs3RB/cVzFZsIJJt0etDss6YtivAMdpgBS+ciiZYgyjqKY+whZ/eBrue 7nn79kSIhi0TVsVnM02HUYpz8oWHo0P0xqZr0t8IzOU+xvJF1e3fb/AkiQbGhYPs OXUTDqdOHFPplymRy6yrq3NqXyfqYVEKMHXWsQ+uY6FhOvVQTlkBdWscF9inoSDf LCmoc+xYLIl9R79syq9BqmVDE9cKLTZowmkexKI7MNu32ZQrPrKNdjMbYvvU0Sxf 248RsZokiobnmXZdEupIEKXmI4ANyZFbeU0CL22MY3B14YcRqN7xPWXz2a/6fXWL h+DS8GCO6j3fgdrywUwA9/LyxR4SOJYEzrC3m34atufMm9LFYII6ShTh96DDPQV6 i5hcVlC5wXAb =lduU -----END PGP SIGNATURE----- Merge tag 'pm-5.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more power management updates from Rafael Wysocki: "These fix three intel_pstate driver regressions, fix locking in the core code suspending and resuming devices during system PM transitions, fix the handling of cpuidle drivers based on runtime PM during system-wide suspend, fix two issues in the operating performance points (OPP) framework and resource-managed helpers to it. Specifics: - Fix two intel_pstate driver regressions related to the HWP interrupt handling added recently (Srinivas Pandruvada). - Fix intel_pstate driver regression introduced during the 5.11 cycle and causing HWP desired performance to be mishandled in some cases when switching driver modes and during system suspend and shutdown (Rafael Wysocki). - Fix system-wide device suspend and resume locking to avoid deadlocks when device objects are deleted during a system-wide PM transition (Rafael Wysocki). - Modify system-wide suspend of devices to prevent cpuidle drivers based on runtime PM from misbehaving during the "no IRQ" phase of it (Ulf Hansson). - Fix return value of _opp_add_static_v2() helper (YueHaibing). - Fix required-opp handle count (Pavankumar Kondeti). - Add resource managed OPP helpers, update dev_pm_opp_attach_genpd(), update their devfreq users, and make minor DT binding change (Dmitry Osipenko)" * tag 'pm-5.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM: sleep: Avoid calling put_device() under dpm_list_mtx cpufreq: intel_pstate: Clear HWP Status during HWP Interrupt enable cpufreq: intel_pstate: Fix unchecked MSR 0x773 access cpufreq: intel_pstate: Clear HWP desired on suspend/shutdown and offline PM: sleep: Fix runtime PM based cpuidle support dt-bindings: opp: Allow multi-worded OPP entry name opp: Fix return in _opp_add_static_v2() PM / devfreq: tegra30: Check whether clk_round_rate() returns zero rate PM / devfreq: tegra30: Use resource-managed helpers PM / devfreq: Add devm_devfreq_add_governor() opp: Add more resource-managed variants of dev_pm_opp_of_add_table() opp: Change type of dev_pm_opp_attach_genpd(names) argument opp: Fix required-opps phandle array count check |
||
Linus Torvalds
|
512b7931ad |
Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton: "257 patches. Subsystems affected by this patch series: scripts, ocfs2, vfs, and mm (slab-generic, slab, slub, kconfig, dax, kasan, debug, pagecache, gup, swap, memcg, pagemap, mprotect, mremap, iomap, tracing, vmalloc, pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, tools, memblock, oom-kill, hugetlbfs, migration, thp, readahead, nommu, ksm, vmstat, madvise, memory-hotplug, rmap, zsmalloc, highmem, zram, cleanups, kfence, and damon)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (257 commits) mm/damon: remove return value from before_terminate callback mm/damon: fix a few spelling mistakes in comments and a pr_debug message mm/damon: simplify stop mechanism Docs/admin-guide/mm/pagemap: wordsmith page flags descriptions Docs/admin-guide/mm/damon/start: simplify the content Docs/admin-guide/mm/damon/start: fix a wrong link Docs/admin-guide/mm/damon/start: fix wrong example commands mm/damon/dbgfs: add adaptive_targets list check before enable monitor_on mm/damon: remove unnecessary variable initialization Documentation/admin-guide/mm/damon: add a document for DAMON_RECLAIM mm/damon: introduce DAMON-based Reclamation (DAMON_RECLAIM) selftests/damon: support watermarks mm/damon/dbgfs: support watermarks mm/damon/schemes: activate schemes based on a watermarks mechanism tools/selftests/damon: update for regions prioritization of schemes mm/damon/dbgfs: support prioritization weights mm/damon/vaddr,paddr: support pageout prioritization mm/damon/schemes: prioritize regions within the quotas mm/damon/selftests: support schemes quotas mm/damon/dbgfs: support quotas of schemes ... |
||
David Hildenbrand
|
50f9481ed9 |
mm/memory_hotplug: remove CONFIG_MEMORY_HOTPLUG_SPARSE
CONFIG_MEMORY_HOTPLUG depends on CONFIG_SPARSEMEM, so there is no need for CONFIG_MEMORY_HOTPLUG_SPARSE anymore; adjust all instances to use CONFIG_MEMORY_HOTPLUG and remove CONFIG_MEMORY_HOTPLUG_SPARSE. Link: https://lkml.kernel.org/r/20210929143600.49379-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Shuah Khan <skhan@linuxfoundation.org> [kselftest] Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Oscar Salvador <osalvador@suse.de> Cc: Alex Shi <alexs@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Mike Rapoport
|
4421cca0a3 |
memblock: use memblock_free for freeing virtual pointers
Rename memblock_free_ptr() to memblock_free() and use memblock_free() when freeing a virtual pointer so that memblock_free() will be a counterpart of memblock_alloc() The callers are updated with the below semantic patch and manual addition of (void *) casting to pointers that are represented by unsigned long variables. @@ identifier vaddr; expression size; @@ ( - memblock_phys_free(__pa(vaddr), size); + memblock_free(vaddr, size); | - memblock_free_ptr(vaddr, size); + memblock_free(vaddr, size); ) [sfr@canb.auug.org.au: fixup] Link: https://lkml.kernel.org/r/20211018192940.3d1d532f@canb.auug.org.au Link: https://lkml.kernel.org/r/20210930185031.18648-7-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Juergen Gross <jgross@suse.com> Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Mike Rapoport
|
3ecc68349b |
memblock: rename memblock_free to memblock_phys_free
Since memblock_free() operates on a physical range, make its name reflect it and rename it to memblock_phys_free(), so it will be a logical counterpart to memblock_phys_alloc(). The callers are updated with the below semantic patch: @@ expression addr; expression size; @@ - memblock_free(addr, size); + memblock_phys_free(addr, size); Link: https://lkml.kernel.org/r/20210930185031.18648-6-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Juergen Gross <jgross@suse.com> Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Mike Rapoport
|
fa27717110 |
memblock: drop memblock_free_early_nid() and memblock_free_early()
memblock_free_early_nid() is unused and memblock_free_early() is an alias for memblock_free(). Replace calls to memblock_free_early() with calls to memblock_free() and remove memblock_free_early() and memblock_free_early_nid(). Link: https://lkml.kernel.org/r/20210930185031.18648-4-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Juergen Gross <jgross@suse.com> Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Mike Rapoport
|
5787ea5bed |
arch_numa: simplify numa_distance allocation
Patch series "memblock: cleanup memblock_free interface", v2. This is the fix for memblock freeing APIs mismatch [1]. The first patch is a cleanup of numa_distance allocation in arch_numa I've spotted during the conversion. The second patch is a fix for Xen memory freeing on some of the error paths. [1] https://lore.kernel.org/all/CAHk-=wj9k4LZTz+svCxLYs5Y1=+yKrbAUArH1+ghyG3OLd8VVg@mail.gmail.com This patch (of 6): Memory allocation of numa_distance uses memblock_phys_alloc_range() without actual range limits, converts the returned physical address to virtual and then only uses the virtual address for further initialization. Simplify this by replacing memblock_phys_alloc_range() with memblock_alloc(). Link: https://lkml.kernel.org/r/20210930185031.18648-1-rppt@kernel.org Link: https://lkml.kernel.org/r/20210930185031.18648-2-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Juergen Gross <jgross@suse.com> Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Kefeng Wang
|
09cea61950 |
arm64: support page mapping percpu first chunk allocator
Percpu embedded first chunk allocator is the firstly option, but it could fails on ARM64, eg, percpu: max_distance=0x5fcfdc640000 too large for vmalloc space 0x781fefff0000 percpu: max_distance=0x600000540000 too large for vmalloc space 0x7dffb7ff0000 percpu: max_distance=0x5fff9adb0000 too large for vmalloc space 0x5dffb7ff0000 then we could get WARNING: CPU: 15 PID: 461 at vmalloc.c:3087 pcpu_get_vm_areas+0x488/0x838 and the system could not boot successfully. Let's implement page mapping percpu first chunk allocator as a fallback to the embedding allocator to increase the robustness of the system. Link: https://lkml.kernel.org/r/20210910053354.26721-3-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Marco Elver <elver@google.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Rafael J. Wysocki
|
2aa36604e8 |
PM: sleep: Avoid calling put_device() under dpm_list_mtx
It is generally unsafe to call put_device() with dpm_list_mtx held, because the given device's release routine may carry out an action depending on that lock which then may deadlock, so modify the system-wide suspend and resume of devices to always drop dpm_list_mtx before calling put_device() (and adjust white space somewhat while at it). For instance, this prevents the following splat from showing up in the kernel log after a system resume in certain configurations: [ 3290.969514] ====================================================== [ 3290.969517] WARNING: possible circular locking dependency detected [ 3290.969519] 5.15.0+ #2420 Tainted: G S [ 3290.969523] ------------------------------------------------------ [ 3290.969525] systemd-sleep/4553 is trying to acquire lock: [ 3290.969529] ffff888117ab1138 ((wq_completion)hci0#2){+.+.}-{0:0}, at: flush_workqueue+0x87/0x4a0 [ 3290.969554] but task is already holding lock: [ 3290.969556] ffffffff8280fca8 (dpm_list_mtx){+.+.}-{3:3}, at: dpm_resume+0x12e/0x3e0 [ 3290.969571] which lock already depends on the new lock. [ 3290.969573] the existing dependency chain (in reverse order) is: [ 3290.969575] -> #3 (dpm_list_mtx){+.+.}-{3:3}: [ 3290.969583] __mutex_lock+0x9d/0xa30 [ 3290.969591] device_pm_add+0x2e/0xe0 [ 3290.969597] device_add+0x4d5/0x8f0 [ 3290.969605] hci_conn_add_sysfs+0x43/0xb0 [bluetooth] [ 3290.969689] hci_conn_complete_evt.isra.71+0x124/0x750 [bluetooth] [ 3290.969747] hci_event_packet+0xd6c/0x28a0 [bluetooth] [ 3290.969798] hci_rx_work+0x213/0x640 [bluetooth] [ 3290.969842] process_one_work+0x2aa/0x650 [ 3290.969851] worker_thread+0x39/0x400 [ 3290.969859] kthread+0x142/0x170 [ 3290.969865] ret_from_fork+0x22/0x30 [ 3290.969872] -> #2 (&hdev->lock){+.+.}-{3:3}: [ 3290.969881] __mutex_lock+0x9d/0xa30 [ 3290.969887] hci_event_packet+0xba/0x28a0 [bluetooth] [ 3290.969935] hci_rx_work+0x213/0x640 [bluetooth] [ 3290.969978] process_one_work+0x2aa/0x650 [ 3290.969985] worker_thread+0x39/0x400 [ 3290.969993] kthread+0x142/0x170 [ 3290.969999] ret_from_fork+0x22/0x30 [ 3290.970004] -> #1 ((work_completion)(&hdev->rx_work)){+.+.}-{0:0}: [ 3290.970013] process_one_work+0x27d/0x650 [ 3290.970020] worker_thread+0x39/0x400 [ 3290.970028] kthread+0x142/0x170 [ 3290.970033] ret_from_fork+0x22/0x30 [ 3290.970038] -> #0 ((wq_completion)hci0#2){+.+.}-{0:0}: [ 3290.970047] __lock_acquire+0x15cb/0x1b50 [ 3290.970054] lock_acquire+0x26c/0x300 [ 3290.970059] flush_workqueue+0xae/0x4a0 [ 3290.970066] drain_workqueue+0xa1/0x130 [ 3290.970073] destroy_workqueue+0x34/0x1f0 [ 3290.970081] hci_release_dev+0x49/0x180 [bluetooth] [ 3290.970130] bt_host_release+0x1d/0x30 [bluetooth] [ 3290.970195] device_release+0x33/0x90 [ 3290.970201] kobject_release+0x63/0x160 [ 3290.970211] dpm_resume+0x164/0x3e0 [ 3290.970215] dpm_resume_end+0xd/0x20 [ 3290.970220] suspend_devices_and_enter+0x1a4/0xba0 [ 3290.970229] pm_suspend+0x26b/0x310 [ 3290.970236] state_store+0x42/0x90 [ 3290.970243] kernfs_fop_write_iter+0x135/0x1b0 [ 3290.970251] new_sync_write+0x125/0x1c0 [ 3290.970257] vfs_write+0x360/0x3c0 [ 3290.970263] ksys_write+0xa7/0xe0 [ 3290.970269] do_syscall_64+0x3a/0x80 [ 3290.970276] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 3290.970284] other info that might help us debug this: [ 3290.970285] Chain exists of: (wq_completion)hci0#2 --> &hdev->lock --> dpm_list_mtx [ 3290.970297] Possible unsafe locking scenario: [ 3290.970299] CPU0 CPU1 [ 3290.970300] ---- ---- [ 3290.970302] lock(dpm_list_mtx); [ 3290.970306] lock(&hdev->lock); [ 3290.970310] lock(dpm_list_mtx); [ 3290.970314] lock((wq_completion)hci0#2); [ 3290.970319] *** DEADLOCK *** [ 3290.970321] 7 locks held by systemd-sleep/4553: [ 3290.970325] #0: ffff888103bcd448 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0xa7/0xe0 [ 3290.970341] #1: ffff888115a14488 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x103/0x1b0 [ 3290.970355] #2: ffff888100f719e0 (kn->active#233){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x10c/0x1b0 [ 3290.970369] #3: ffffffff82661048 (autosleep_lock){+.+.}-{3:3}, at: state_store+0x12/0x90 [ 3290.970384] #4: ffffffff82658ac8 (system_transition_mutex){+.+.}-{3:3}, at: pm_suspend+0x9f/0x310 [ 3290.970399] #5: ffffffff827f2a48 (acpi_scan_lock){+.+.}-{3:3}, at: acpi_suspend_begin+0x4c/0x80 [ 3290.970416] #6: ffffffff8280fca8 (dpm_list_mtx){+.+.}-{3:3}, at: dpm_resume+0x12e/0x3e0 [ 3290.970428] stack backtrace: [ 3290.970431] CPU: 3 PID: 4553 Comm: systemd-sleep Tainted: G S 5.15.0+ #2420 [ 3290.970438] Hardware name: Dell Inc. XPS 13 9380/0RYJWW, BIOS 1.5.0 06/03/2019 [ 3290.970441] Call Trace: [ 3290.970446] dump_stack_lvl+0x44/0x57 [ 3290.970454] check_noncircular+0x105/0x120 [ 3290.970468] ? __lock_acquire+0x15cb/0x1b50 [ 3290.970474] __lock_acquire+0x15cb/0x1b50 [ 3290.970487] lock_acquire+0x26c/0x300 [ 3290.970493] ? flush_workqueue+0x87/0x4a0 [ 3290.970503] ? __raw_spin_lock_init+0x3b/0x60 [ 3290.970510] ? lockdep_init_map_type+0x58/0x240 [ 3290.970519] flush_workqueue+0xae/0x4a0 [ 3290.970526] ? flush_workqueue+0x87/0x4a0 [ 3290.970544] ? drain_workqueue+0xa1/0x130 [ 3290.970552] drain_workqueue+0xa1/0x130 [ 3290.970561] destroy_workqueue+0x34/0x1f0 [ 3290.970572] hci_release_dev+0x49/0x180 [bluetooth] [ 3290.970624] bt_host_release+0x1d/0x30 [bluetooth] [ 3290.970687] device_release+0x33/0x90 [ 3290.970695] kobject_release+0x63/0x160 [ 3290.970705] dpm_resume+0x164/0x3e0 [ 3290.970710] ? dpm_resume_early+0x251/0x3b0 [ 3290.970718] dpm_resume_end+0xd/0x20 [ 3290.970723] suspend_devices_and_enter+0x1a4/0xba0 [ 3290.970737] pm_suspend+0x26b/0x310 [ 3290.970746] state_store+0x42/0x90 [ 3290.970755] kernfs_fop_write_iter+0x135/0x1b0 [ 3290.970764] new_sync_write+0x125/0x1c0 [ 3290.970777] vfs_write+0x360/0x3c0 [ 3290.970785] ksys_write+0xa7/0xe0 [ 3290.970794] do_syscall_64+0x3a/0x80 [ 3290.970803] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 3290.970811] RIP: 0033:0x7f41b1328164 [ 3290.970819] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 80 00 00 00 00 8b 05 4a d2 2c 00 48 63 ff 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 55 53 48 89 d5 48 89 f3 48 83 [ 3290.970824] RSP: 002b:00007ffe6ae21b28 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 3290.970831] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f41b1328164 [ 3290.970836] RDX: 0000000000000004 RSI: 000055965e651070 RDI: 0000000000000004 [ 3290.970839] RBP: 000055965e651070 R08: 000055965e64f390 R09: 00007f41b1e3d1c0 [ 3290.970843] R10: 000000000000000a R11: 0000000000000246 R12: 0000000000000004 [ 3290.970846] R13: 0000000000000001 R14: 000055965e64f2b0 R15: 0000000000000004 Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
||
Ulf Hansson
|
a2bd7be12b |
PM: sleep: Fix runtime PM based cpuidle support
In the cpuidle-psci case, runtime PM in combination with the generic PM domain (genpd), may be used when entering/exiting a shared idlestate. More precisely, genpd relies on runtime PM to be enabled for the attached device (in this case it belongs to a CPU), to properly manage the reference counting of its PM domain. This works fine most of the time, but during system suspend in dpm_suspend_late(), the PM core disables runtime PM for all devices. Beyond this point, calls to pm_runtime_get_sync() to runtime resume a device may fail and therefore it could also mess up the reference counting in genpd. To fix this problem, let's call wake_up_all_idle_cpus() in dpm_suspend_late(), prior to disabling runtime PM. In this way a device that belongs to a CPU, becomes runtime resumed through cpuidle-psci and stays like that, because the runtime PM usage count has been bumped in device_prepare(). Diagnosed-by: Maulik Shah <mkshah@codeaurora.org> Suggested-by: Rafael J. Wysocki <rafael@kernel.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
||
Linus Torvalds
|
95faf6ba65 |
Driver core changes for 5.16-rc1
Here is the big set of driver core changes for 5.16-rc1. All of these have been in linux-next for a while now with no reported problems. Included in here are: - big update and cleanup of the sysfs abi documentation files and scripts from Mauro. We are almost at the place where we can properly check that the running kernel's sysfs abi is documented fully. - firmware loader updates - dyndbg updates - kernfs cleanups and fixes from Christoph - device property updates - component fix - other minor driver core cleanups and fixes Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYYPbjQ8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ync9gCfXKMUI1GAnCfJWAwTdTcd18q5akoAoMw32/AH 0yh5TjAWFyFd7xz5d7qs =itsC -----END PGP SIGNATURE----- Merge tag 'driver-core-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the big set of driver core changes for 5.16-rc1. All of these have been in linux-next for a while now with no reported problems. Included in here are: - big update and cleanup of the sysfs abi documentation files and scripts from Mauro. We are almost at the place where we can properly check that the running kernel's sysfs abi is documented fully. - firmware loader updates - dyndbg updates - kernfs cleanups and fixes from Christoph - device property updates - component fix - other minor driver core cleanups and fixes" * tag 'driver-core-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (122 commits) device property: Drop redundant NULL checks x86/build: Tuck away built-in firmware under FW_LOADER vmlinux.lds.h: wrap built-in firmware support under FW_LOADER firmware_loader: move struct builtin_fw to the only place used x86/microcode: Use the firmware_loader built-in API firmware_loader: remove old DECLARE_BUILTIN_FIRMWARE() firmware_loader: formalize built-in firmware API component: do not leave master devres group open after bind dyndbg: refine verbosity 1-4 summary-detail gpiolib: acpi: Replace custom code with device_match_acpi_handle() i2c: acpi: Replace custom function with device_match_acpi_handle() driver core: Provide device_match_acpi_handle() helper dyndbg: fix spurious vNpr_info change dyndbg: no vpr-info on empty queries dyndbg: vpr-info on remove-module complete, not starting device property: Add missed header in fwnode.h Documentation: dyndbg: Improve cli param examples dyndbg: Remove support for ddebug_query param dyndbg: make dyndbg a known cli param dyndbg: show module in vpr-info in dd-exec-queries ... |
||
Linus Torvalds
|
833db72142 |
Power management updates for 5.16-rc1
- Add support for inefficient operating performance points to the Energy Model and modify cpufreq to use them properly (Vincent Donnefort). - Rearrange the DTPM framework code to simplify it and make it easier to follow (Daniel Lezcano). - Fix power intialization in DTPM (Daniel Lezcano). - Add CPU load consideration when estimating the instaneous power consumption in DTPM (Daniel Lezcano). - Fix cpu->pstate.turbo_freq initialization in intel_pstate (Zhang Rui). - Make intel_pstate process HWP Guaranteed change notifications from the processor (Srinivas Pandruvada). - Fix typo in cpufreq.h (Rafael Wysocki). - Fix tegra driver to handle BPMP errors properly (Mikko Perttunen). - Fix the parameter usage of the newly added perf-domain API (Hector Yuan). - Minor cleanups to cppc, vexpress and s3c244x drivers (Han Wang, Guenter Roeck, and Arnd Bergmann). - Fix kobject memory leaks in cpuidle error paths (Anel Orazgaliyeva). - Make intel_idle enable interrupts before entering C1 on some Xeon processor models (Artem Bityutskiy). - Clean up hib_wait_io() (Falla Coulibaly). - Fix sparse warnings in hibernation-related code (Anders Roxell). - Use vzalloc() and kzalloc() instead of their open-coded equivalents in hibernation-related code (Cai Huoqing). - Prevent user space from crashing the kernel by attempting to restore the system state from a swap partition in use (Ye Bin). - Do not let "syscore" devices runtime-suspend during system PM transitions (Rafael Wysocki). - Do not pause cpuidle in the suspend-to-idle path (Rafael Wysocki). - Pause cpuidle later and resume it earlier during system PM transitions (Rafael Wysocki). - Make system suspend code use valid_state() consistently (Rafael Wysocki). - Add support for enabling wakeup IRQs after invoking the ->runtime_suspend() callback and make two drivers use it (Chunfeng Yun). - Make the association of ACPI device objects with PCI devices more straightforward and simplify the code doing that for all devices in general (Rafael Wysocki). - Eliminate struct pci_platform_pm_ops and handle the both of its users (PCI and Intel MID) directly in the PCI bus code (Rafael Wysocki). - Simplify and clarify ACPI PCI device PM helpers (Rafael Wysocki). - Fix ordering of operations in pci_back_from_sleep() (Rafael Wysocki). - Make exynos-ppmu use hyphens in DT properties (Krzysztof Kozlowski). - Simplify parsing event-type from DT in exynos-ppmu (Krzysztof Kozlowski). - Strengthen check for freq_table in devfreq (Samuel Holland). -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmGBkiQSHHJqd0Byand5 c29ja2kubmV0AAoJEILEb/54YlRxDZwP/1kyhVdk+FSMMwEfhn8NriOkE8drJ3nB Y/PWI93oPhgA7ITMwBv4ufLcxx6uYu5bctVDx3XG2lqvFC30t1fzTosJiwireRaS WC5p7ZTqAECxd1LiZWPjL5EzHmtbyzrz3/AxSxOEN1K66qENBd6q5QldKZylxhY3 bawCQjabz6CLXvOjzdv8M8A7Fmd8LTcjHI5Y3/IOcDydPJqyN4/rDCoqft3/xcNB kwN6de73aQwB3AQYufS/VAiNN4XOOkhPwF4QfULmAlnMdCsov6YzahtMB2+oG7O4 G3DF/OVFrONr3GPMMuMJSC6GXyFiBuW8FRva4W9HpY0MA8xVGLPUpwlpaFVaX1+c vAYcRBTyJvOWgRap8+q+UKTlkj37pAgHp7kRiaO1wkVnKxJB1w40OSJZO1nnsExe 3qeCJHOJ9r+S/FsSPKCmws8vr0XQH5wPXY639Kmj9OI/t3gXGrfy3cXm9pa+gSh0 eMyHxtCp5ItT7V2FMpYh+wn+wfe5h//sK3tESZs+h6FKwJG1hYIbG4+F3ztIgzHp t0rT3JXZIkY41KREGFhCMS9+wnLugOik21w9O0qVZfn/dJtDe73Kely7rY/EA3Mw H4aBJDD19BvbIKqaTguxJXEc9zJI737fy/Ze4rrzTDkbXU8qVmjvFoEl2i/Ef4o2 b6aiDdz3V/CW =L2Jo -----END PGP SIGNATURE----- Merge tag 'pm-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These make the power management of PCI devices with ACPI companions more straightforwad, add support for inefficient operating performance points to the Energy model and make cpufreq handle them as appropriate, rearrange the handling of cpuidle during system PM transitions, update a few cpufreq drivers and intel_idle, fix assorded issues and clean up code in multiple places. Specifics: - Add support for inefficient operating performance points to the Energy Model and modify cpufreq to use them properly (Vincent Donnefort). - Rearrange the DTPM framework code to simplify it and make it easier to follow (Daniel Lezcano). - Fix power intialization in DTPM (Daniel Lezcano). - Add CPU load consideration when estimating the instaneous power consumption in DTPM (Daniel Lezcano). - Fix cpu->pstate.turbo_freq initialization in intel_pstate (Zhang Rui). - Make intel_pstate process HWP Guaranteed change notifications from the processor (Srinivas Pandruvada). - Fix typo in cpufreq.h (Rafael Wysocki). - Fix tegra driver to handle BPMP errors properly (Mikko Perttunen). - Fix the parameter usage of the newly added perf-domain API (Hector Yuan). - Minor cleanups to cppc, vexpress and s3c244x drivers (Han Wang, Guenter Roeck, and Arnd Bergmann). - Fix kobject memory leaks in cpuidle error paths (Anel Orazgaliyeva). - Make intel_idle enable interrupts before entering C1 on some Xeon processor models (Artem Bityutskiy). - Clean up hib_wait_io() (Falla Coulibaly). - Fix sparse warnings in hibernation-related code (Anders Roxell). - Use vzalloc() and kzalloc() instead of their open-coded equivalents in hibernation-related code (Cai Huoqing). - Prevent user space from crashing the kernel by attempting to restore the system state from a swap partition in use (Ye Bin). - Do not let "syscore" devices runtime-suspend during system PM transitions (Rafael Wysocki). - Do not pause cpuidle in the suspend-to-idle path (Rafael Wysocki). - Pause cpuidle later and resume it earlier during system PM transitions (Rafael Wysocki). - Make system suspend code use valid_state() consistently (Rafael Wysocki). - Add support for enabling wakeup IRQs after invoking the ->runtime_suspend() callback and make two drivers use it (Chunfeng Yun). - Make the association of ACPI device objects with PCI devices more straightforward and simplify the code doing that for all devices in general (Rafael Wysocki). - Eliminate struct pci_platform_pm_ops and handle the both of its users (PCI and Intel MID) directly in the PCI bus code (Rafael Wysocki). - Simplify and clarify ACPI PCI device PM helpers (Rafael Wysocki). - Fix ordering of operations in pci_back_from_sleep() (Rafael Wysocki). - Make exynos-ppmu use hyphens in DT properties (Krzysztof Kozlowski). - Simplify parsing event-type from DT in exynos-ppmu (Krzysztof Kozlowski). - Strengthen check for freq_table in devfreq (Samuel Holland)" * tag 'pm-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (49 commits) cpufreq: Fix parameter in parse_perf_domain() usb: mtu3: enable wake-up interrupt after runtime_suspend called usb: xhci-mtk: enable wake-up interrupt after runtime_suspend called PM / wakeirq: support enabling wake-up irq after runtime_suspend called PM / devfreq: Strengthen check for freq_table devfreq: exynos-ppmu: simplify parsing event-type from DT devfreq: exynos-ppmu: use node names with hyphens cpufreq: intel_pstate: Fix cpu->pstate.turbo_freq initialization PM: suspend: Use valid_state() consistently PM: sleep: Pause cpuidle later and resume it earlier during system transitions PM: suspend: Do not pause cpuidle in the suspend-to-idle path PM: sleep: Do not let "syscore" devices runtime-suspend during system transitions PM: hibernate: Get block device exclusively in swsusp_check() powercap/drivers/dtpm: Fix power limit initialization powercap/drivers/dtpm: Scale the power with the load powercap/drivers/dtpm: Use container_of instead of a private data field powercap/drivers/dtpm: Simplify the dtpm table powercap/drivers/dtpm: Encapsulate even more the code PM: hibernate: swap: Use vzalloc() and kzalloc() PM: hibernate: fix sparse warnings ... |
||
Rafael J. Wysocki
|
b62b306469 |
Merge branch 'pm-sleep'
Merge updates related to system sleep for 5.16-rc1: - Clean up hib_wait_io() (Falla Coulibaly). - Fix sparse warnings in hibernation-related code (Anders Roxell). - Use vzalloc() and kzalloc() instead of their open-coded equivalents in hibernation-related code (Cai Huoqing). - Prevent user space from crashing the kernel by attempting to restore the system state from a swap partition in use (Ye Bin). - Do not let "syscore" devices runtime-suspend during system PM transitions (Rafael Wysocki). - Do not pause cpuidle in the suspend-to-idle path (Rafael Wysocki). - Pause cpuidle later and resume it earlier during system PM transitions (Rafael Wysocki). - Make system suspend code use valid_state() consistently (Rafael Wysocki). - Add support for enabling wakeup IRQs after invoking the ->runtime_suspend() callback and make two drivers use it (Chunfeng Yun). * pm-sleep: usb: mtu3: enable wake-up interrupt after runtime_suspend called usb: xhci-mtk: enable wake-up interrupt after runtime_suspend called PM / wakeirq: support enabling wake-up irq after runtime_suspend called PM: suspend: Use valid_state() consistently PM: sleep: Pause cpuidle later and resume it earlier during system transitions PM: suspend: Do not pause cpuidle in the suspend-to-idle path PM: sleep: Do not let "syscore" devices runtime-suspend during system transitions PM: hibernate: Get block device exclusively in swsusp_check() PM: hibernate: swap: Use vzalloc() and kzalloc() PM: hibernate: fix sparse warnings Revert "PM: sleep: Do not assume that "mem" is always present" PM: hibernate: Remove blk_status_to_errno in hib_wait_io PM: sleep: Do not assume that "mem" is always present |
||
Linus Torvalds
|
fc02cb2b37 |
Core:
- Remove socket skb caches - Add a SO_RESERVE_MEM socket op to forward allocate buffer space and avoid memory accounting overhead on each message sent - Introduce managed neighbor entries - added by control plane and resolved by the kernel for use in acceleration paths (BPF / XDP right now, HW offload users will benefit as well) - Make neighbor eviction on link down controllable by userspace to work around WiFi networks with bad roaming implementations - vrf: Rework interaction with netfilter/conntrack - fq_codel: implement L4S style ce_threshold_ect1 marking - sch: Eliminate unnecessary RCU waits in mini_qdisc_pair_swap() BPF: - Add support for new btf kind BTF_KIND_TAG, arbitrary type tagging as implemented in LLVM14 - Introduce bpf_get_branch_snapshot() to capture Last Branch Records - Implement variadic trace_printk helper - Add a new Bloomfilter map type - Track <8-byte scalar spill and refill - Access hw timestamp through BPF's __sk_buff - Disallow unprivileged BPF by default - Document BPF licensing Netfilter: - Introduce egress hook for looking at raw outgoing packets - Allow matching on and modifying inner headers / payload data - Add NFT_META_IFTYPE to match on the interface type either from ingress or egress Protocols: - Multi-Path TCP: - increase default max additional subflows to 2 - rework forward memory allocation - add getsockopts: MPTCP_INFO, MPTCP_TCPINFO, MPTCP_SUBFLOW_ADDRS - MCTP flow support allowing lower layer drivers to configure msg muxing as needed - Automatic Multicast Tunneling (AMT) driver based on RFC7450 - HSR support the redbox supervision frames (IEC-62439-3:2018) - Support for the ip6ip6 encapsulation of IOAM - Netlink interface for CAN-FD's Transmitter Delay Compensation - Support SMC-Rv2 eliminating the current same-subnet restriction, by exploiting the UDP encapsulation feature of RoCE adapters - TLS: add SM4 GCM/CCM crypto support - Bluetooth: initial support for link quality and audio/codec offload Driver APIs: - Add a batched interface for RX buffer allocation in AF_XDP buffer pool - ethtool: Add ability to control transceiver modules' power mode - phy: Introduce supported interfaces bitmap to express MAC capabilities and simplify PHY code - Drop rtnl_lock from DSA .port_fdb_{add,del} callbacks New drivers: - WiFi driver for Realtek 8852AE 802.11ax devices (rtw89) - Ethernet driver for ASIX AX88796C SPI device (x88796c) Drivers: - Broadcom PHYs - support 72165, 7712 16nm PHYs - support IDDQ-SR for additional power savings - PHY support for QCA8081, QCA9561 PHYs - NXP DPAA2: support for IRQ coalescing - NXP Ethernet (enetc): support for software TCP segmentation - Renesas Ethernet (ravb) - support DMAC and EMAC blocks of Gigabit-capable IP found on RZ/G2L SoC - Intel 100G Ethernet - support for eswitch offload of TC/OvS flow API, including offload of GRE, VxLAN, Geneve tunneling - support application device queues - ability to assign Rx and Tx queues to application threads - PTP and PPS (pulse-per-second) extensions - Broadcom Ethernet (bnxt) - devlink health reporting and device reload extensions - Mellanox Ethernet (mlx5) - offload macvlan interfaces - support HW offload of TC rules involving OVS internal ports - support HW-GRO and header/data split - support application device queues - Marvell OcteonTx2: - add XDP support for PF - add PTP support for VF - Qualcomm Ethernet switch (qca8k): support for QCA8328 - Realtek Ethernet DSA switch (rtl8366rb) - support bridge offload - support STP, fast aging, disabling address learning - support for Realtek RTL8365MB-VC, a 4+1 port 10M/100M/1GE switch - Mellanox Ethernet/IB switch (mlxsw) - multi-level qdisc hierarchy offload (e.g. RED, prio and shaping) - offload root TBF qdisc as port shaper - support multiple routing interface MAC address prefixes - support for IP-in-IP with IPv6 underlay - MediaTek WiFi (mt76) - mt7921 - ASPM, 6GHz, SDIO and testmode support - mt7915 - LED and TWT support - Qualcomm WiFi (ath11k) - include channel rx and tx time in survey dump statistics - support for 80P80 and 160 MHz bandwidths - support channel 2 in 6 GHz band - spectral scan support for QCN9074 - support for rx decapsulation offload (data frames in 802.3 format) - Qualcomm phone SoC WiFi (wcn36xx) - enable Idle Mode Power Save (IMPS) to reduce power consumption during idle - Bluetooth driver support for MediaTek MT7922 and MT7921 - Enable support for AOSP Bluetooth extension in Qualcomm WCN399x and Realtek 8822C/8852A - Microsoft vNIC driver (mana) - support hibernation and kexec - Google vNIC driver (gve) - support for jumbo frames - implement Rx page reuse Refactor: - Make all writes to netdev->dev_addr go thru helpers, so that we can add this address to the address rbtree and handle the updates - Various TCP cleanups and optimizations including improvements to CPU cache use - Simplify the gnet_stats, Qdisc stats' handling and remove qdisc->running sequence counter - Driver changes and API updates to address devlink locking deficiencies Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmGAzX4ACgkQMUZtbf5S IrvW3g//Q0ZLrOuHK9pZ8sCXMMhDj8qL6ajm0otMddHWA/+1UglwVBKFhsajfxOf wJ/5LZis+XKLpLqKTU5chKVfn39HuDGe/D3l+egi01Gv5BW0+XzEhagfyR5tJX5z wsGG5CXO/we/laVSzRiFtwwVEKHKN20YC+tIQwYOYP5Wy3q4G7qDsFhT7GqgsGCS n74QUEAIB5Tz0ODWFqLtbsySzIurXrskibwt5T9bvAAlPw/lCU68mmG+NVJ7VddO lBbNkLMOo8yW9Ci20H09SrYd4jZTmMARo9tsFO1tAvAMk7qpn0Wd8pnOYTjFFoMD +qjiFSVMh7E0JGb8Y7NCvwaB99suAK5rfGP68Xwe62DfP7vYWEx4pZGxBP19F4ld 6Kn1ME33BX9rUF9tBecf0bdKfJUwB2Q2Xou/b9laG04bwiqsc9iG5FQq1C46lnLZ QdzNiS1My4dJMczkWt66HF3Kx30ibwHfvKMIHjf4PqkzEatkv6Y6SBZ57KXL+Lde 0BQSFhbf0tm2Gf55etzrczLElI3uqHSFWUNZZ2Bt6WmzO1e6tpV9nAtRWF4C/dFg QDpLJtOOOY65uq+qz09zoPfv2lem868SrCAuFrVn99bEpYjx/CGNFDeEI02l6jyr 84eUxd364UcbIk3fc+eTGdXHLQNVk30G0AHVBBxaWNIidwfqXeE= =srde -----END PGP SIGNATURE----- Merge tag 'net-next-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - Remove socket skb caches - Add a SO_RESERVE_MEM socket op to forward allocate buffer space and avoid memory accounting overhead on each message sent - Introduce managed neighbor entries - added by control plane and resolved by the kernel for use in acceleration paths (BPF / XDP right now, HW offload users will benefit as well) - Make neighbor eviction on link down controllable by userspace to work around WiFi networks with bad roaming implementations - vrf: Rework interaction with netfilter/conntrack - fq_codel: implement L4S style ce_threshold_ect1 marking - sch: Eliminate unnecessary RCU waits in mini_qdisc_pair_swap() BPF: - Add support for new btf kind BTF_KIND_TAG, arbitrary type tagging as implemented in LLVM14 - Introduce bpf_get_branch_snapshot() to capture Last Branch Records - Implement variadic trace_printk helper - Add a new Bloomfilter map type - Track <8-byte scalar spill and refill - Access hw timestamp through BPF's __sk_buff - Disallow unprivileged BPF by default - Document BPF licensing Netfilter: - Introduce egress hook for looking at raw outgoing packets - Allow matching on and modifying inner headers / payload data - Add NFT_META_IFTYPE to match on the interface type either from ingress or egress Protocols: - Multi-Path TCP: - increase default max additional subflows to 2 - rework forward memory allocation - add getsockopts: MPTCP_INFO, MPTCP_TCPINFO, MPTCP_SUBFLOW_ADDRS - MCTP flow support allowing lower layer drivers to configure msg muxing as needed - Automatic Multicast Tunneling (AMT) driver based on RFC7450 - HSR support the redbox supervision frames (IEC-62439-3:2018) - Support for the ip6ip6 encapsulation of IOAM - Netlink interface for CAN-FD's Transmitter Delay Compensation - Support SMC-Rv2 eliminating the current same-subnet restriction, by exploiting the UDP encapsulation feature of RoCE adapters - TLS: add SM4 GCM/CCM crypto support - Bluetooth: initial support for link quality and audio/codec offload Driver APIs: - Add a batched interface for RX buffer allocation in AF_XDP buffer pool - ethtool: Add ability to control transceiver modules' power mode - phy: Introduce supported interfaces bitmap to express MAC capabilities and simplify PHY code - Drop rtnl_lock from DSA .port_fdb_{add,del} callbacks New drivers: - WiFi driver for Realtek 8852AE 802.11ax devices (rtw89) - Ethernet driver for ASIX AX88796C SPI device (x88796c) Drivers: - Broadcom PHYs - support 72165, 7712 16nm PHYs - support IDDQ-SR for additional power savings - PHY support for QCA8081, QCA9561 PHYs - NXP DPAA2: support for IRQ coalescing - NXP Ethernet (enetc): support for software TCP segmentation - Renesas Ethernet (ravb) - support DMAC and EMAC blocks of Gigabit-capable IP found on RZ/G2L SoC - Intel 100G Ethernet - support for eswitch offload of TC/OvS flow API, including offload of GRE, VxLAN, Geneve tunneling - support application device queues - ability to assign Rx and Tx queues to application threads - PTP and PPS (pulse-per-second) extensions - Broadcom Ethernet (bnxt) - devlink health reporting and device reload extensions - Mellanox Ethernet (mlx5) - offload macvlan interfaces - support HW offload of TC rules involving OVS internal ports - support HW-GRO and header/data split - support application device queues - Marvell OcteonTx2: - add XDP support for PF - add PTP support for VF - Qualcomm Ethernet switch (qca8k): support for QCA8328 - Realtek Ethernet DSA switch (rtl8366rb) - support bridge offload - support STP, fast aging, disabling address learning - support for Realtek RTL8365MB-VC, a 4+1 port 10M/100M/1GE switch - Mellanox Ethernet/IB switch (mlxsw) - multi-level qdisc hierarchy offload (e.g. RED, prio and shaping) - offload root TBF qdisc as port shaper - support multiple routing interface MAC address prefixes - support for IP-in-IP with IPv6 underlay - MediaTek WiFi (mt76) - mt7921 - ASPM, 6GHz, SDIO and testmode support - mt7915 - LED and TWT support - Qualcomm WiFi (ath11k) - include channel rx and tx time in survey dump statistics - support for 80P80 and 160 MHz bandwidths - support channel 2 in 6 GHz band - spectral scan support for QCN9074 - support for rx decapsulation offload (data frames in 802.3 format) - Qualcomm phone SoC WiFi (wcn36xx) - enable Idle Mode Power Save (IMPS) to reduce power consumption during idle - Bluetooth driver support for MediaTek MT7922 and MT7921 - Enable support for AOSP Bluetooth extension in Qualcomm WCN399x and Realtek 8822C/8852A - Microsoft vNIC driver (mana) - support hibernation and kexec - Google vNIC driver (gve) - support for jumbo frames - implement Rx page reuse Refactor: - Make all writes to netdev->dev_addr go thru helpers, so that we can add this address to the address rbtree and handle the updates - Various TCP cleanups and optimizations including improvements to CPU cache use - Simplify the gnet_stats, Qdisc stats' handling and remove qdisc->running sequence counter - Driver changes and API updates to address devlink locking deficiencies" * tag 'net-next-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2122 commits) Revert "net: avoid double accounting for pure zerocopy skbs" selftests: net: add arp_ndisc_evict_nocarrier net: ndisc: introduce ndisc_evict_nocarrier sysctl parameter net: arp: introduce arp_evict_nocarrier sysctl parameter libbpf: Deprecate AF_XDP support kbuild: Unify options for BTF generation for vmlinux and modules selftests/bpf: Add a testcase for 64-bit bounds propagation issue. bpf: Fix propagation of signed bounds from 64-bit min/max into 32-bit. bpf: Fix propagation of bounds from 64-bit min/max into 32-bit and var_off. net: vmxnet3: remove multiple false checks in vmxnet3_ethtool.c net: avoid double accounting for pure zerocopy skbs tcp: rename sk_wmem_free_skb netdevsim: fix uninit value in nsim_drv_configure_vfs() selftests/bpf: Fix also no-alu32 strobemeta selftest bpf: Add missing map_delete_elem method to bloom filter map selftests/bpf: Add bloom map success test for userspace calls bpf: Add alignment padding for "map_extra" + consolidate holes bpf: Bloom filter map naming fixups selftests/bpf: Add test cases for struct_ops prog bpf: Add dummy BPF STRUCT_OPS for test purpose ... |
||
Linus Torvalds
|
d2cdb12231 |
regmap: Update for v5.16
This update has a single change which will use the maximum transfer and message sizes advertised by SPI controllers to configure limits within the regmap core, ensuring better interoperation. -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmF/6F8ACgkQJNaLcl1U h9Calgf+PaI6wr17nU0/m4N50Vjat0bL7iBuEQYPKFWu963M0QasPZEKteJV7sa+ 8XVxcBh43E2unQkY+iMfldmBIEfZJwmsaWecdxSair6cWcm8MT6agOOkl6KjiCHM AQiPI2+UbXDHnPociBmyA+MqI3hWSPZIMirVC+3WAFGLs8wHuRe0G97egkcPh88a doQ5f2Ei7eCgP4uy+p93S4QWXbsbQVm/yszw0BH0gaqnBylwUfQtU1vdlLyBXW4y 9bB1MnzCdlLbKf4bQd7NcjB2bA8lgAuZ3t5rJoRjZtnf+//fRTB1cu/UkLRFlZme vsrt60wo52r2v5tiLKA4wrgcvEFR+A== =8MR4 -----END PGP SIGNATURE----- Merge tag 'regmap-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Pull regmap update from Mark Brown: "A single change to use the maximum transfer and message sizes advertised by SPI controllers to configure limits within the regmap core, ensuring better interoperation" * tag 'regmap-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: regmap: spi: Set regmap max raw r/w from max_transfer_size |
||
Linus Torvalds
|
9a7e0a90a4 |
Scheduler updates:
- Revert the printk format based wchan() symbol resolution as it can leak the raw value in case that the symbol is not resolvable. - Make wchan() more robust and work with all kind of unwinders by enforcing that the task stays blocked while unwinding is in progress. - Prevent sched_fork() from accessing an invalid sched_task_group - Improve asymmetric packing logic - Extend scheduler statistics to RT and DL scheduling classes and add statistics for bandwith burst to the SCHED_FAIR class. - Properly account SCHED_IDLE entities - Prevent a potential deadlock when initial priority is assigned to a newly created kthread. A recent change to plug a race between cpuset and __sched_setscheduler() introduced a new lock dependency which is now triggered. Break the lock dependency chain by moving the priority assignment to the thread function. - Fix the idle time reporting in /proc/uptime for NOHZ enabled systems. - Improve idle balancing in general and especially for NOHZ enabled systems. - Provide proper interfaces for live patching so it does not have to fiddle with scheduler internals. - Add cluster aware scheduling support. - A small set of tweaks for RT (irqwork, wait_task_inactive(), various scheduler options and delaying mmdrop) - The usual small tweaks and improvements all over the place -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmF/OUkTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoR/5D/9ikdGNpKg9osNqJ3GjAmxsK6kVkB29 iFe2k8pIpWDToWQf/wQRGih4Yj3Cl49QSnZcPIibh2/12EB1qrrW6iSPJkInz8Ec /1LS5/Vewn2OyoxyXZjdvGC5gTXEodSbIazASvX7nvdMeI4gsAsL5etzrMJirT/t aymqvr7zovvywrwMTQJrGjUMo9l4ewE8tafMNNhRu1BHU1U4ojM9yvThyRAAcmp7 3Xy49A+Yq3IgrvYI4u8FMK5Zh08KaxSFjiLhePGm/bF+wSfYmWop2TP1jY05W2Uo ti8hfbJMUoFRYuMxAiEldkItnc0wV4M9PtWZZ/x+B71bs65Y4Zjt9cW+rxJv2+m1 vzV31EsQwGnOti072dzWN4c/cZqngVXAjaNtErvDwJUr+Tw1ayv9KUvuodMQqZY6 mu68bFUO2kV9EMe1CBOv51Uy1RGHyLj3rlNqrkw+Xp5ISE9Ad2vhUEiRp5bQx5Ci V/XFhGZkGUluh0vccrdFlNYZwhj8cZEzkOPCnPSeZ+bq8SyZE6xuHH/lTP1CJCOy s800rW1huM+kgV+zRN8adDkGXibAk9N3RtVGnQXmuEy8gB9LZmQg+JeM2wsc9B+6 i0gdqZnsjNAfoK+BBAG4holxptSL8/eOJsFH8ZNIoxQ+iqooyPx9tFX7yXnRTBQj d2qWG7UvoseT+g== =fgtS -----END PGP SIGNATURE----- Merge tag 'sched-core-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Thomas Gleixner: - Revert the printk format based wchan() symbol resolution as it can leak the raw value in case that the symbol is not resolvable. - Make wchan() more robust and work with all kind of unwinders by enforcing that the task stays blocked while unwinding is in progress. - Prevent sched_fork() from accessing an invalid sched_task_group - Improve asymmetric packing logic - Extend scheduler statistics to RT and DL scheduling classes and add statistics for bandwith burst to the SCHED_FAIR class. - Properly account SCHED_IDLE entities - Prevent a potential deadlock when initial priority is assigned to a newly created kthread. A recent change to plug a race between cpuset and __sched_setscheduler() introduced a new lock dependency which is now triggered. Break the lock dependency chain by moving the priority assignment to the thread function. - Fix the idle time reporting in /proc/uptime for NOHZ enabled systems. - Improve idle balancing in general and especially for NOHZ enabled systems. - Provide proper interfaces for live patching so it does not have to fiddle with scheduler internals. - Add cluster aware scheduling support. - A small set of tweaks for RT (irqwork, wait_task_inactive(), various scheduler options and delaying mmdrop) - The usual small tweaks and improvements all over the place * tag 'sched-core-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (69 commits) sched/fair: Cleanup newidle_balance sched/fair: Remove sysctl_sched_migration_cost condition sched/fair: Wait before decaying max_newidle_lb_cost sched/fair: Skip update_blocked_averages if we are defering load balance sched/fair: Account update_blocked_averages in newidle_balance cost x86: Fix __get_wchan() for !STACKTRACE sched,x86: Fix L2 cache mask sched/core: Remove rq_relock() sched: Improve wake_up_all_idle_cpus() take #2 irq_work: Also rcuwait for !IRQ_WORK_HARD_IRQ on PREEMPT_RT irq_work: Handle some irq_work in a per-CPU thread on PREEMPT_RT irq_work: Allow irq_work_sync() to sleep if irq_work() no IRQ support. sched/rt: Annotate the RT balancing logic irqwork as IRQ_WORK_HARD_IRQ sched: Add cluster scheduler level for x86 sched: Add cluster scheduler level in core and related Kconfig for ARM64 topology: Represent clusters of CPUs within a die sched: Disable -Wunused-but-set-variable sched: Add wrapper for get_wchan() to keep task blocked x86: Fix get_wchan() to support the ORC unwinder proc: Use task_is_running() for wchan in /proc/$pid/stat ... |
||
Jakub Kicinski
|
7df621a3ee |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
include/net/sock.h |
||
Linus Torvalds
|
8685de2ed8 |
regmap: Fix for v5.15
This fixes a potential double free when handling an out of memory error inserting a node into an rbtree regcache. -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmF6lHQACgkQJNaLcl1U h9DNugf/WXDIhnn1ToxWpe80tGZFgTcyXN1T6QfCiP5iKbOsYHFYrITDUhnbCrV+ 4z5YngU8Jw2bz6nx72Yt3c4bwQRl6ZESgH6Rupr1nClQO06k8bLySLVWJPH1V3y4 ofzdSc7x6JVHaVLj09NR6N3rrOKbmMxS8Ny0RMr7Futu0cPog0/7f57zG/Mkq9Fw vBsLvtpwt9cAOPRl7lxiU3NXKVdWYyIXxOCGW7z+e7u/HK7W5ByELrgXo1BFLXST dUks5d5qKHmExB/7plxv8mxOQlNHVimTxXpt2sL9Z9FoQR3OCs5kfG+eaIDyx5UW Y10gaxu61sRO6g9EqUPhfzOkiAYk5Q== =hMOi -----END PGP SIGNATURE----- Merge tag 'regmap-fix-v5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Pull regmap fix from Mark Brown: "This fixes a potential double free when handling an out of memory error inserting a node into an rbtree regcache" * tag 'regmap-fix-v5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: regmap: Fix possible double-free in regcache_rbtree_exit() |
||
Chunfeng Yun
|
259714100d |
PM / wakeirq: support enabling wake-up irq after runtime_suspend called
When the dedicated wake IRQ is level trigger, and it uses the device's low-power status as the wakeup source, that means if the device is not in low-power state, the wake IRQ will be triggered if enabled; For this case, need enable the wake IRQ after running the device's ->runtime_suspend() which make it enter low-power state. e.g. Assume the wake IRQ is a low level trigger type, and the wakeup signal comes from the low-power status of the device. The wakeup signal is low level at running time (0), and becomes high level when the device enters low-power state (runtime_suspend (1) is called), a wakeup event at (2) make the device exit low-power state, then the wakeup signal also becomes low level. ------------------ | ^ ^| ---------------- | | -------------- |<---(0)--->|<--(1)--| (3) (2) (4) if enable the wake IRQ before running runtime_suspend during (0), a wake IRQ will arise, it causes resume immediately; it works if enable wake IRQ ( e.g. at (3) or (4)) after running ->runtime_suspend(). This patch introduces a new status WAKE_IRQ_DEDICATED_REVERSE to optionally support enabling wake IRQ after running ->runtime_suspend(). Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Chunfeng Yun <chunfeng.yun@mediatek.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
||
Andy Shevchenko
|
27e0bcd029 |
device property: Drop redundant NULL checks
In cases when functions are called via fwnode operations, we already know that this is software node we are dealing with, hence no need to check if it's NULL, it can't be, Reported-by: YE Chengfeng <cyeaa@connect.ust.hk> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20211026162954.89811-1-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Rafael J. Wysocki
|
23f62d7ab2 |
PM: sleep: Pause cpuidle later and resume it earlier during system transitions
Commit
|
||
Rafael J. Wysocki
|
8d89835b04 |
PM: suspend: Do not pause cpuidle in the suspend-to-idle path
It is pointless to pause cpuidle in the suspend-to-idle path, because it is going to be resumed in the same path later and pausing it does not serve any particular purpose in that case. Rework the code to avoid doing that. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> |
||
Sean Anderson
|
65aa371ea5 |
net: Convert more users of mdiobus_* to mdiodev_*
This converts users of mdiobus to mdiodev using the following semantic patch: @@ identifier mdiodev; expression regnum; @@ - mdiobus_read(mdiodev->bus, mdiodev->addr, regnum) + mdiodev_read(mdiodev, regnum) @@ identifier mdiodev; expression regnum, val; @@ - mdiobus_write(mdiodev->bus, mdiodev->addr, regnum, val) + mdiodev_write(mdiodev, regnum, val) Signed-off-by: Sean Anderson <sean.anderson@seco.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Lucas Tanure
|
f231ff38b7
|
regmap: spi: Set regmap max raw r/w from max_transfer_size
Set regmap raw read/write from spi max_transfer_size so regmap_raw_read/write can split the access into chunks Signed-off-by: Lucas Tanure <tanureal@opensource.cirrus.com> Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com> [André: fix build warning] Signed-off-by: André Almeida <andrealmeid@collabora.com> Link: https://lore.kernel.org/r/20211021132721.13669-1-andrealmeid@collabora.com Signed-off-by: Mark Brown <broonie@kernel.org> |
||
Rafael J. Wysocki
|
928265e360 |
PM: sleep: Do not let "syscore" devices runtime-suspend during system transitions
There is no reason to allow "syscore" devices to runtime-suspend
during system-wide PM transitions, because they are subject to the
same possible failure modes as any other devices in that respect.
Accordingly, change device_prepare() and device_complete() to call
pm_runtime_get_noresume() and pm_runtime_put(), respectively, for
"syscore" devices too.
Fixes:
|
||
Luis Chamberlain
|
e2e2c0f20f |
firmware_loader: move struct builtin_fw to the only place used
Now that x86 doesn't abuse picking at internals to the firmware loader move out the built-in firmware struct to its only user. Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20211021155843.1969401-5-mcgrof@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Luis Chamberlain
|
48d09e9787 |
firmware_loader: formalize built-in firmware API
Formalize the built-in firmware with a proper API. This can later be used by other callers where all they need is built-in firmware. We export the firmware_request_builtin() call for now only under the TEST_FIRMWARE symbol namespace as there are no direct modular users for it. If they pop up they are free to export it generally. Built-in code always gets access to the callers and we'll demonstrate a hidden user which has been lurking in the kernel for a while and the reason why using a proper API was better long term. Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20211021155843.1969401-2-mcgrof@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
David S. Miller
|
bdfa75ad70 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Lots of simnple overlapping additions. With a build fix from Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Kai Vehmanen
|
c87761db21 |
component: do not leave master devres group open after bind
In current code, the devres group for aggregate master is left open after call to component_master_add_*(). This leads to problems when the master does further managed allocations on its own. When any participating driver calls component_del(), this leads to immediate release of resources. This came up when investigating a page fault occurring with i915 DRM driver unbind with 5.15-rc1 kernel. The following sequence occurs: i915_pci_remove() -> intel_display_driver_unregister() -> i915_audio_component_cleanup() -> component_del() -> component.c:take_down_master() -> hdac_component_master_unbind() [via master->ops->unbind()] -> devres_release_group(master->parent, NULL) With older kernels this has not caused issues, but with audio driver moving to use managed interfaces for more of its allocations, this no longer works. Devres log shows following to occur: component_master_add_with_match() [ 126.886032] snd_hda_intel 0000:00:1f.3: DEVRES ADD 00000000323ccdc5 devm_component_match_release (24 bytes) [ 126.886045] snd_hda_intel 0000:00:1f.3: DEVRES ADD 00000000865cdb29 grp< (0 bytes) [ 126.886049] snd_hda_intel 0000:00:1f.3: DEVRES ADD 000000001b480725 grp< (0 bytes) audio driver completes its PCI probe() [ 126.892238] snd_hda_intel 0000:00:1f.3: DEVRES ADD 000000001b480725 pcim_iomap_release (48 bytes) component_del() called() at DRM/i915 unbind() [ 137.579422] i915 0000:00:02.0: DEVRES REL 00000000ef44c293 grp< (0 bytes) [ 137.579445] snd_hda_intel 0000:00:1f.3: DEVRES REL 00000000865cdb29 grp< (0 bytes) [ 137.579458] snd_hda_intel 0000:00:1f.3: DEVRES REL 000000001b480725 pcim_iomap_release (48 bytes) So the "devres_release_group(master->parent, NULL)" ends up freeing the pcim_iomap allocation. Upon next runtime resume, the audio driver will cause a page fault as the iomap alloc was released without the driver knowing about it. Fix this issue by using the "struct master" pointer as identifier for the devres group, and by closing the devres group after the master->ops->bind() call is done. This allows devres allocations done by the driver acting as master to be isolated from the binding state of the aggregate driver. This modifies the logic originally introduced in commit |
||
Andy Shevchenko
|
a164ff53cb |
driver core: Provide device_match_acpi_handle() helper
We have a couple of users of this helper, make it available for them. The prototype for the helper is specifically crafted in order to be easily used with bus_find_device() call. That's why its location is in the driver core rather than ACPI. Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20211014134756.39092-1-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Greg Kroah-Hartman
|
b5bc8ac25a |
Merge 5.15-rc6 into driver-core-next
We need the driver-core fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Linus Torvalds
|
cf52ad5ff1 |
Driver core fixes for 5.15-rc6
Here are some small driver core fixes for 5.15-rc6, all of which have been in linux-next for a while with no reported issues. They include: - kernfs negative dentry bugfix - simple pm bus fixes to resolve reported issues Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYWvzFg8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ymfLQCfSCP698AAvoCgG0fOfLakFkw80h0AoKIVm3lk t0GUdnplU18CjnO5M1Zj =+dh9 -----END PGP SIGNATURE----- Merge tag 'driver-core-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here are some small driver core fixes for 5.15-rc6, all of which have been in linux-next for a while with no reported issues. They include: - kernfs negative dentry bugfix - simple pm bus fixes to resolve reported issues" * tag 'driver-core-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: drivers: bus: Delete CONFIG_SIMPLE_PM_BUS drivers: bus: simple-pm-bus: Add support for probing simple bus only devices driver core: Reject pointless SYNC_STATE_ONLY device links kernfs: don't create a negative dentry if inactive node exists |
||
Jonathan Cameron
|
c5e22feffd |
topology: Represent clusters of CPUs within a die
Both ACPI and DT provide the ability to describe additional layers of topology between that of individual cores and higher level constructs such as the level at which the last level cache is shared. In ACPI this can be represented in PPTT as a Processor Hierarchy Node Structure [1] that is the parent of the CPU cores and in turn has a parent Processor Hierarchy Nodes Structure representing a higher level of topology. For example Kunpeng 920 has 6 or 8 clusters in each NUMA node, and each cluster has 4 cpus. All clusters share L3 cache data, but each cluster has local L3 tag. On the other hand, each clusters will share some internal system bus. +-----------------------------------+ +---------+ | +------+ +------+ +--------------------------+ | | | CPU0 | | cpu1 | | +-----------+ | | | +------+ +------+ | | | | | | +----+ L3 | | | | +------+ +------+ cluster | | tag | | | | | CPU2 | | CPU3 | | | | | | | +------+ +------+ | +-----------+ | | | | | | +-----------------------------------+ | | +-----------------------------------+ | | | +------+ +------+ +--------------------------+ | | | | | | | +-----------+ | | | +------+ +------+ | | | | | | | | L3 | | | | +------+ +------+ +----+ tag | | | | | | | | | | | | | | +------+ +------+ | +-----------+ | | | | | | +-----------------------------------+ | L3 | | data | +-----------------------------------+ | | | +------+ +------+ | +-----------+ | | | | | | | | | | | | | +------+ +------+ +----+ L3 | | | | | | tag | | | | +------+ +------+ | | | | | | | | | | | +-----------+ | | | +------+ +------+ +--------------------------+ | +-----------------------------------| | | +-----------------------------------| | | | +------+ +------+ +--------------------------+ | | | | | | | +-----------+ | | | +------+ +------+ | | | | | | +----+ L3 | | | | +------+ +------+ | | tag | | | | | | | | | | | | | | +------+ +------+ | +-----------+ | | | | | | +-----------------------------------+ | | +-----------------------------------+ | | | +------+ +------+ +--------------------------+ | | | | | | | +-----------+ | | | +------+ +------+ | | | | | | | | L3 | | | | +------+ +------+ +---+ tag | | | | | | | | | | | | | | +------+ +------+ | +-----------+ | | | | | | +-----------------------------------+ | | +-----------------------------------+ | | | +------+ +------+ +--------------------------+ | | | | | | | +-----------+ | | | +------+ +------+ | | | | | | | | L3 | | | | +------+ +------+ +--+ tag | | | | | | | | | | | | | | +------+ +------+ | +-----------+ | | | | +---------+ +-----------------------------------+ That means spreading tasks among clusters will bring more bandwidth while packing tasks within one cluster will lead to smaller cache synchronization latency. So both kernel and userspace will have a chance to leverage this topology to deploy tasks accordingly to achieve either smaller cache latency within one cluster or an even distribution of load among clusters for higher throughput. This patch exposes cluster topology to both kernel and userspace. Libraried like hwloc will know cluster by cluster_cpus and related sysfs attributes. PoC of HWLOC support at [2]. Note this patch only handle the ACPI case. Special consideration is needed for SMT processors, where it is necessary to move 2 levels up the hierarchy from the leaf nodes (thus skipping the processor core level). Note that arm64 / ACPI does not provide any means of identifying a die level in the topology but that may be unrelate to the cluster level. [1] ACPI Specification 6.3 - section 5.2.29.1 processor hierarchy node structure (Type 0) [2] https://github.com/hisilicon/hwloc/tree/linux-cluster Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Tian Tao <tiantao6@hisilicon.com> Signed-off-by: Barry Song <song.bao.hua@hisilicon.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210924085104.44806-2-21cnbao@gmail.com |
||
Jakub Kicinski
|
e15f5972b8 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
tools/testing/selftests/net/ioam6.sh |
||
Yang Yingliang
|
55e6d80378
|
regmap: Fix possible double-free in regcache_rbtree_exit()
In regcache_rbtree_insert_to_block(), when 'present' realloc failed,
the 'blk' which is supposed to assign to 'rbnode->block' will be freed,
so 'rbnode->block' points a freed memory, in the error handling path of
regcache_rbtree_init(), 'rbnode->block' will be freed again in
regcache_rbtree_exit(), KASAN will report double-free as follows:
BUG: KASAN: double-free or invalid-free in kfree+0xce/0x390
Call Trace:
slab_free_freelist_hook+0x10d/0x240
kfree+0xce/0x390
regcache_rbtree_exit+0x15d/0x1a0
regcache_rbtree_init+0x224/0x2c0
regcache_init+0x88d/0x1310
__regmap_init+0x3151/0x4a80
__devm_regmap_init+0x7d/0x100
madera_spi_probe+0x10f/0x333 [madera_spi]
spi_probe+0x183/0x210
really_probe+0x285/0xc30
To fix this, moving up the assignment of rbnode->block to immediately after
the reallocation has succeeded so that the data structure stays valid even
if the second reallocation fails.
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes:
|
||
Linus Torvalds
|
fa58787605 |
linux-kselftest-kunit-fixes-5.15-rc6
This KUnit fixes update for Linux 5.15-rc6 consists of: - Fixes to address the structleak plugin causing the stack frame size to grow immensely when used with KUnit. Fixes include adding a new makefile to disable structleak and using it from KUnit iio, device property, thunderbolt, and bitfield tests to disable it. - KUnit framework reference count leak in kfree_at_end - KUnit tool fix to resolve conflict between --json and --raw_output and generate correct test output in either case. - kernel-doc warnings due to mismatched arg names -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmFkqr8ACgkQCwJExA0N QxyziBAAn+02Iq8Kd1p9/+ajr78iQKmjIpSHH51R6wPwOBJg0RgusyuheJ/pccl1 OOt7sAIdoBWlbZjkwi9SOoFabdvtAO6i7SWpw1R0AdWT5iG4ewe17ODIJiGQ6l9E 4fNayU+XFunXSqxxEsiSJYO5ztoyDDNOqWJ1a43r+9j6ApsHPH3aKwtx4hFutaJA TAUvSOJOMv93SWTHdQh1ihHMVBkhvdnnpSvsRzJFzkEmI5rmL+p7HcbuM0Zk/gyD 9obygEx8QsJ4pg1AJs8qAj1YbV599qbVhBYYr3EewWscyWME4RrIiXPGImcYKN3/ 99S5pOVTI+wp12r5c8WOdXta+Qe+1RvT9wvTPS+msGQakQhIq0eNePtO1AwIoIor S+36JbJdW1rOepg1WssL55F6+re//GYdluSUGwmwYCL3eUEIMpfRqeamg7BKQH3j gfukFJvLzzdLiyeYRg2f/G3Vxwa18shlIJgz+6ADCmemura3waeelTG0zFWiNsd5 XYXXhrxV8eq+Cj5ee/iBddKIgn2+QjFtDmZtsLtiZts3aV7zlugYm2bAYCN1Yxsu ZDvymnZUozoOCRFvBe60Eo4DSTDTnmAaRRnuX5L/x1ybpAUvwZQxzyNDIuvfZKQ9 JB8LrxAJDz7G4jj/9OY5IeacoXNKqDVZYK+Kb9L7hAS08ITk6vU= =08Ra -----END PGP SIGNATURE----- Merge tag 'linux-kselftest-kunit-fixes-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kunit fixes from Shuah Khan: - Fixes to address the structleak plugin causing the stack frame size to grow immensely when used with KUnit. Fixes include adding a new makefile to disable structleak and using it from KUnit iio, device property, thunderbolt, and bitfield tests to disable it. - KUnit framework reference count leak in kfree_at_end - KUnit tool fix to resolve conflict between --json and --raw_output and generate correct test output in either case. - kernel-doc warnings due to mismatched arg names * tag 'linux-kselftest-kunit-fixes-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: fix kernel-doc warnings due to mismatched arg names bitfield: build kunit tests without structleak plugin thunderbolt: build kunit tests without structleak plugin device property: build kunit tests without structleak plugin iio/test-format: build kunit tests without structleak plugin gcc-plugins/structleak: add makefile var for disabling structleak kunit: fix reference count leak in kfree_at_end kunit: tool: better handling of quasi-bool args (--json, --raw_output) |
||
Jakub Kicinski
|
9fe1155233 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Jakub Kicinski
|
433baf0719 |
device property: move mac addr helpers to eth.c
Move the mac address helpers out, eth.c already contains a bunch of similar helpers. Suggested-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Brendan Higgins
|
6a1e2d93d5 |
device property: build kunit tests without structleak plugin
The structleak plugin causes the stack frame size to grow immensely when used with KUnit: ../drivers/base/test/property-entry-test.c:492:1: warning: the frame size of 2832 bytes is larger than 2048 bytes [-Wframe-larger-than=] ../drivers/base/test/property-entry-test.c:322:1: warning: the frame size of 2080 bytes is larger than 2048 bytes [-Wframe-larger-than=] ../drivers/base/test/property-entry-test.c:250:1: warning: the frame size of 4976 bytes is larger than 2048 bytes [-Wframe-larger-than=] ../drivers/base/test/property-entry-test.c:115:1: warning: the frame size of 3280 bytes is larger than 2048 bytes [-Wframe-larger-than=] Turn it off in this file. Signed-off-by: Brendan Higgins <brendanhiggins@google.com> Suggested-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> |
||
Saravana Kannan
|
f729a592ad |
driver core: Reject pointless SYNC_STATE_ONLY device links
SYNC_STATE_ONLY device links intentionally allow cycles because cyclic
sync_state() dependencies are valid and necessary.
However a SYNC_STATE_ONLY device link where the consumer and the supplier
are the same device is pointless because the device link would be deleted
as soon as the device probes (because it's also the consumer) and won't
affect when the sync_state() callback is called. It's a waste of CPU cycles
and memory to create this device link. So reject any attempts to create
such a device link.
Fixes:
|
||
Luis Chamberlain
|
0f8d7ccc2e |
firmware_loader: add a sanity check for firmware_request_builtin()
Right now firmware_request_builtin() is used internally only and so we have control over the callers. But if we want to expose that API more broadly we should ensure the firmware pointer is valid. This doesn't fix any known issue, it just prepares us to later expose this API to other users. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20210917182226.3532898-4-mcgrof@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Luis Chamberlain
|
7c4fd90741 |
firmware_loader: split built-in firmware call
There are two ways the firmware_loader can use the built-in firmware: with or without the pre-allocated buffer. We already have one explicit use case for each of these, and so split them up so that it is clear what the intention is on the caller side. This also paves the way so that eventually other callers outside of the firmware loader can uses these if and when needed. While at it, adopt the firmware prefix for the routine names. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20210917182226.3532898-3-mcgrof@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Luis Chamberlain
|
f7a07f7b96 |
firmware_loader: fix pre-allocated buf built-in firmware use
The firmware_loader can be used with a pre-allocated buffer through the use of the API calls: o request_firmware_into_buf() o request_partial_firmware_into_buf() If the firmware was built-in and present, our current check for if the built-in firmware fits into the pre-allocated buffer does not return any errors, and we proceed to tell the caller that everything worked fine. It's a lie and no firmware would end up being copied into the pre-allocated buffer. So if the caller trust the result it may end up writing a bunch of 0's to a device! Fix this by making the function that checks for the pre-allocated buffer return non-void. Since the typical use case is when no pre-allocated buffer is provided make this return successfully for that case. If the built-in firmware does *not* fit into the pre-allocated buffer size return a failure as we should have been doing before. I'm not aware of users of the built-in firmware using the API calls with a pre-allocated buffer, as such I doubt this fixes any real life issue. But you never know... perhaps some oddball private tree might use it. In so far as upstream is concerned this just fixes our code for correctness. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20210917182226.3532898-2-mcgrof@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Mianhan Liu
|
30b7ecf731 |
drivers/base/component.c: remove superfluous header files from component.c
component.c hasn't use any macro or function declared in linux/kref.h. Thus, these files can be removed from component.c safely without affecting the compilation of the drivers/base/ module Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn> Link: https://lore.kernel.org/r/20210928193849.28717-1-liumh1@shanghaitech.edu.cn Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Mianhan Liu
|
b39214911a |
drivers/base/arch_topology.c: remove superfluous header
arch_topology.c hasn't use any macro or function declared in linux/percpu.h, linux/smp.h and linux/string.h. Thus, these files can be removed from arch_topology.c safely without affecting the compilation of the drivers/base/ module Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn> Link: https://lore.kernel.org/r/20210928193138.24192-1-liumh1@shanghaitech.edu.cn Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Max Gurtovoy
|
d460d7f7bb |
driver core: use NUMA_NO_NODE during device_initialize
Don't use (-1) constant for setting initial device node. Instead, use the generic NUMA_NO_NODE definition to indicate that "no node id specified". Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Link: https://lore.kernel.org/r/20211004133453.18881-1-mgurtovoy@nvidia.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Yang Yingliang
|
df0a181494 |
driver core: Fix possible memory leak in device_link_add()
I got memory leak as follows:
unreferenced object 0xffff88801f0b2200 (size 64):
comm "i2c-lis2hh12-21", pid 5455, jiffies 4294944606 (age 15.224s)
hex dump (first 32 bytes):
72 65 67 75 6c 61 74 6f 72 3a 72 65 67 75 6c 61 regulator:regula
74 6f 72 2e 30 2d 2d 69 32 63 3a 31 2d 30 30 31 tor.0--i2c:1-001
backtrace:
[<00000000bf5b0c3b>] __kmalloc_track_caller+0x19f/0x3a0
[<0000000050da42d9>] kvasprintf+0xb5/0x150
[<000000004bbbed13>] kvasprintf_const+0x60/0x190
[<00000000cdac7480>] kobject_set_name_vargs+0x56/0x150
[<00000000bf83f8e8>] dev_set_name+0xc0/0x100
[<00000000cc1cf7e3>] device_link_add+0x6b4/0x17c0
[<000000009db9faed>] _regulator_get+0x297/0x680
[<00000000845e7f2b>] _devm_regulator_get+0x5b/0xe0
[<000000003958ee25>] st_sensors_power_enable+0x71/0x1b0 [st_sensors]
[<000000005f450f52>] st_accel_i2c_probe+0xd9/0x150 [st_accel_i2c]
[<00000000b5f2ab33>] i2c_device_probe+0x4d8/0xbe0
[<0000000070fb977b>] really_probe+0x299/0xc30
[<0000000088e226ce>] __driver_probe_device+0x357/0x500
[<00000000c21dda32>] driver_probe_device+0x4e/0x140
[<000000004e650441>] __device_attach_driver+0x257/0x340
[<00000000cf1891b8>] bus_for_each_drv+0x166/0x1e0
When device_register() returns an error, the name allocated in dev_set_name()
will be leaked, the put_device() should be used instead of kfree() to give up
the device reference, then the name will be freed in kobject_cleanup() and the
references of consumer and supplier will be decreased in device_link_release_fn().
Fixes:
|
||
Greg Kroah-Hartman
|
bb76c82358 |
Merge 5.15-rc4 into driver-core-next
We need the driver core fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Linus Torvalds
|
84928ce3bb |
Driver core fixes for 5.15-rc4
Here are some driver core and kernfs fixes for reported issues for 5.15-rc4. These fixes include: - kernfs positive dentry bugfix - debugfs_create_file_size error path fix - cpumask sysfs file bugfix to preserve the user/kernel abi (has been reported multiple times.) - devlink fixes for mdiobus devices as reported by the subsystem maintainers. Also included in here are some devlink debugging changes to make it easier for people to report problems when asked. They have already helped with the mdiobus and other subsystems reporting issues. All of these have been linux-next for a while with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYVmC1A8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ykfCQCgtQ0kdPoPdUG6mPCn45rrbJQLBY4AnRL5JyhO zB60l6C2EHHnLnRxMnQq =gygB -----END PGP SIGNATURE----- Merge tag 'driver-core-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here are some driver core and kernfs fixes for reported issues for 5.15-rc4. These fixes include: - kernfs positive dentry bugfix - debugfs_create_file_size error path fix - cpumask sysfs file bugfix to preserve the user/kernel abi (has been reported multiple times.) - devlink fixes for mdiobus devices as reported by the subsystem maintainers. Also included in here are some devlink debugging changes to make it easier for people to report problems when asked. They have already helped with the mdiobus and other subsystems reporting issues. All of these have been linux-next for a while with no reported issues" * tag 'driver-core-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: kernfs: also call kernfs_set_rev() for positive dentry driver core: Add debug logs when fwnode links are added/deleted driver core: Create __fwnode_link_del() helper function driver core: Set deferred probe reason when deferred by driver core net: mdiobus: Set FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD for mdiobus parents driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD driver core: fw_devlink: Improve handling of cyclic dependencies cpumask: Omit terminating null byte in cpumap_print_{list,bitmask}_to_buf debugfs: debugfs_create_file_size(): use IS_ERR to check for error |
||
Saravana Kannan
|
ebd6823af3 |
driver core: Add debug logs when fwnode links are added/deleted
This will help with debugging fw_devlink issues. Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Saravana Kannan <saravanak@google.com> Link: https://lore.kernel.org/r/20210915172808.620546-4-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Saravana Kannan
|
76f130810b |
driver core: Create __fwnode_link_del() helper function
The same code is repeated in multiple locations. Create a helper function for it. Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Saravana Kannan <saravanak@google.com> Link: https://lore.kernel.org/r/20210915172808.620546-3-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Saravana Kannan
|
68223eeec7 |
driver core: Set deferred probe reason when deferred by driver core
When the driver core defers the probe of a device, set the deferred probe reason so that it's easier to debug. The deferred probe reason is available in debugfs under devices_deferred. Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Saravana Kannan <saravanak@google.com> Link: https://lore.kernel.org/r/20210915172808.620546-2-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Linus Torvalds
|
47d7e65d64 |
Device properties framework fix for 5.15-rc3
Fix software node refcount imbalance on device removal (Laurentiu Tudor). -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmFOBoUSHHJqd0Byand5 c29ja2kubmV0AAoJEILEb/54YlRxcgQP/RJGlD5rm2W3LSTi0MpdHm88PWTH/6JL XYqGDJ42DOS6+pm7KiCoI+XZ7plXtT9FL3kPwgj4+RSHfZQV7r5DVOFwXUa5lpkG IDeU8H/DXil/9VZSioxCN7HSE1mUsMj1eZM4odf9eZBiccLxdn050k7by4ZVt0Dx 3OnBJ8Tv+4jp27XuuEiOFku2m2QA0myK0Fg7tHQ8D6UoBgfxawVMkFsDrNFOBEsx xKmGXml7VFsEbX6+FeirJCRLTGln2EOLbdHcwMbCzaOzgEp9MvuLm96PlLANyohk rZoAoasXmjMMgHk2tOC5nkEtxINVKWqM18HQqyzBLtx9fn/YP94HXXC00SzsZAwJ NQ8vpwtbHzHI8QydS8lJ9Sp8PLQGcHRrbut5Is1oXF3f8uNViyE1IHmuPh/AW0Mg UjQ/cqZ1yWDoYYaJ2Ar2G6uwFyfMmgPhoNOlUmW1niZqEm3iMA1VRd2EbSXiAbpu ExLzNh7dljDoVgP6Vzhqlj6oyV4JYZCHIuMjBixmheVb282pqanQaxsk877rTQiR KAk9d/V7GluXvNnkVQA1j8k8ONqEx4M2dxgn6tIFMGhT7eQeq/tjlljl7x3C1OW5 iTQPmXzEr1mLPJ8HwdNLDFDbNZ1HEsLUS+tb6nz+1uwvyvhnJqEX2ewbx5nRUdrY XBMVa46lxeh8 =PFOV -----END PGP SIGNATURE----- Merge tag 'devprop-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull device properties framework fix from Rafael Wysocki: "Fix software node refcount imbalance on device removal (Laurentiu Tudor)" * tag 'devprop-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: software node: balance refcount for managed software nodes |
||
Saravana Kannan
|
5501765a02 |
driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD
If a parent device is also a supplier to a child device, fw_devlink=on by
design delays the probe() of the child device until the probe() of the
parent finishes successfully.
However, some drivers of such parent devices (where parent is also a
supplier) expect the child device to finish probing successfully as soon as
they are added using device_add() and before the probe() of the parent
device has completed successfully. One example of such a case is discussed
in the link mentioned below.
Add a flag to make fw_devlink=on not enforce these supplier-consumer
relationships, so these drivers can continue working.
Link: https://lore.kernel.org/netdev/CAGETcx_uj0V4DChME-gy5HGKTYnxLBX=TH2rag29f_p=UcG+Tg@mail.gmail.com/
Fixes:
|
||
Douglas Anderson
|
7065f92255 |
driver core: Clarify that dev_err_probe() is OK even w/out -EPROBE_DEFER
There is some debate about whether it's deemed acceptable to call dev_err_probe() if you know that the error code can never be -EPROBE_DEFER. Clarify in the function comments that this is OK. Specifically this makes us able to transform code like this: ret = do_something_that_cant_defer(); if (ret < 0) { dev_err(dev, "The foo failed to bar (%pe)\n", ERR_PTR(ret)); return ret; } to code like this: ret = do_something_that_cant_defer(); if (ret < 0) return dev_err_probe(dev, ret, "The foo failed to bar\n"); It is also possible that in the future folks might want a CONFIG option to strip out all probe error strings to save space (keeping non-probe errors) with the argument that probe errors rarely happen after bringup. Having probe errors reported with a consistent function would allow that. Cc: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20210916161931.1.I32bea713bd6c6fb419a24da73686145742b6c117@changeid Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Saravana Kannan
|
2de9d8e0d2 |
driver core: fw_devlink: Improve handling of cyclic dependencies
When we have a dependency of the form:
Device-A -> Device-C
Device-B
Device-C -> Device-B
Where,
* Indentation denotes "child of" parent in previous line.
* X -> Y denotes X is consumer of Y based on firmware (Eg: DT).
We have cyclic dependency: device-A -> device-C -> device-B -> device-A
fw_devlink current treats device-C -> device-B dependency as an invalid
dependency and doesn't enforce it but leaves the rest of the
dependencies as is.
While the current behavior is necessary, it is not sufficient if the
false dependency in this example is actually device-A -> device-C. When
this is the case, device-C will correctly probe defer waiting for
device-B to be added, but device-A will be incorrectly probe deferred by
fw_devlink waiting on device-C to probe successfully. Due to this, none
of the devices in the cycle will end up probing.
To fix this, we need to go relax all the dependencies in the cycle like
we already do in the other instances where fw_devlink detects cycles.
A real world example of this was reported[1] and analyzed[2].
[1] - https://lore.kernel.org/lkml/0a2c4106-7f48-2bb5-048e-8c001a7c3fda@samsung.com/
[2] - https://lore.kernel.org/lkml/CAGETcx8peaew90SWiux=TyvuGgvTQOmO4BFALz7aj0Za5QdNFQ@mail.gmail.com/
Fixes:
|
||
Linus Torvalds
|
c6460daea2 |
xen: branch for v5.15-rc2
-----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCYUR8DAAKCRCAXGG7T9hj vjclAP0ekisZSabOsmzM0iz8rpuYj113/kQozokFeFD5eQlEiAD/b5LytLosic/1 5XKa5fbOyimAyRBwblWhpLLhrW6uiAg= =WoAB -----END PGP SIGNATURE----- Merge tag 'for-linus-5.15b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - The first hunk of a Xen swiotlb fixup series fixing multiple minor issues and doing some small cleanups - Some further Xen related fixes avoiding WARN() splats when running as Xen guests or dom0 - A Kconfig fix allowing the pvcalls frontend to be built as a module * tag 'for-linus-5.15b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: swiotlb-xen: drop DEFAULT_NSLABS swiotlb-xen: arrange to have buffer info logged swiotlb-xen: drop leftover __ref swiotlb-xen: limit init retries swiotlb-xen: suppress certain init retries swiotlb-xen: maintain slab count properly swiotlb-xen: fix late init retry swiotlb-xen: avoid double free xen/pvcalls: backend can be a module xen: fix usage of pmd_populate in mremap for pv guests xen: reset legacy rtc flag for PV domU PM: base: power: don't try to use non-existing RTC for storing data xen/balloon: use a kernel thread instead a workqueue |
||
Laurentiu Tudor
|
5aeb05b27f |
software node: balance refcount for managed software nodes
software_node_notify(), on KOBJ_REMOVE drops the refcount twice on managed
software nodes, thus leading to underflow errors. Balance the refcount by
bumping it in the device_create_managed_software_node() function.
The error [1] was encountered after adding a .shutdown() op to our
fsl-mc-bus driver.
[1]
pc : refcount_warn_saturate+0xf8/0x150
lr : refcount_warn_saturate+0xf8/0x150
sp : ffff80001009b920
x29: ffff80001009b920 x28: ffff1a2420318000 x27: 0000000000000000
x26: ffffccac15e7a038 x25: 0000000000000008 x24: ffffccac168e0030
x23: ffff1a2428a82000 x22: 0000000000080000 x21: ffff1a24287b5000
x20: 0000000000000001 x19: ffff1a24261f4400 x18: ffffffffffffffff
x17: 6f72645f726f7272 x16: 0000000000000000 x15: ffff80009009b607
x14: 0000000000000000 x13: ffffccac16602670 x12: 0000000000000a17
x11: 000000000000035d x10: ffffccac16602670 x9 : ffffccac16602670
x8 : 00000000ffffefff x7 : ffffccac1665a670 x6 : ffffccac1665a670
x5 : 0000000000000000 x4 : 0000000000000000 x3 : 00000000ffffffff
x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff1a2420318000
Call trace:
refcount_warn_saturate+0xf8/0x150
kobject_put+0x10c/0x120
software_node_notify+0xd8/0x140
device_platform_notify+0x4c/0xb4
device_del+0x188/0x424
fsl_mc_device_remove+0x2c/0x4c
rebofind sp.c__fsl_mc_device_remove+0x14/0x2c
device_for_each_child+0x5c/0xac
dprc_remove+0x9c/0xc0
fsl_mc_driver_remove+0x28/0x64
__device_release_driver+0x188/0x22c
device_release_driver+0x30/0x50
bus_remove_device+0x128/0x134
device_del+0x16c/0x424
fsl_mc_bus_remove+0x8c/0x114
fsl_mc_bus_shutdown+0x14/0x20
platform_shutdown+0x28/0x40
device_shutdown+0x15c/0x330
__do_sys_reboot+0x218/0x2a0
__arm64_sys_reboot+0x28/0x34
invoke_syscall+0x48/0x114
el0_svc_common+0x40/0xdc
do_el0_svc+0x2c/0x94
el0_svc+0x2c/0x54
el0t_64_sync_handler+0xa8/0x12c
el0t_64_sync+0x198/0x19c
---[ end trace 32eb1c71c7d86821 ]---
Fixes:
|
||
Linus Torvalds
|
77e02cf57b |
memblock: introduce saner 'memblock_free_ptr()' interface
The boot-time allocation interface for memblock is a mess, with
'memblock_alloc()' returning a virtual pointer, but then you are
supposed to free it with 'memblock_free()' that takes a _physical_
address.
Not only is that all kinds of strange and illogical, but it actually
causes bugs, when people then use it like a normal allocation function,
and it fails spectacularly on a NULL pointer:
https://lore.kernel.org/all/20210912140820.GD25450@xsang-OptiPlex-9020/
or just random memory corruption if the debug checks don't catch it:
https://lore.kernel.org/all/61ab2d0c-3313-aaab-514c-e15b7aa054a0@suse.cz/
I really don't want to apply patches that treat the symptoms, when the
fundamental cause is this horribly confusing interface.
I started out looking at just automating a sane replacement sequence,
but because of this mix or virtual and physical addresses, and because
people have used the "__pa()" macro that can take either a regular
kernel pointer, or just the raw "unsigned long" address, it's all quite
messy.
So this just introduces a new saner interface for freeing a virtual
address that was allocated using 'memblock_alloc()', and that was kept
as a regular kernel pointer. And then it converts a couple of users
that are obvious and easy to test, including the 'xbc_nodes' case in
lib/bootconfig.c that caused problems.
Reported-by: kernel test robot <oliver.sang@intel.com>
Fixes:
|
||
Cai Huoqing
|
86854b4379 |
driver core: platform: Make use of the helper macro SET_RUNTIME_PM_OPS()
Use the helper macro SET_RUNTIME_PM_OPS() instead of the verbose operators ".runtime_suspend/.runtime_resume", because the SET_RUNTIME_PM_OPS() is a nice helper macro that could be brought in to make code a little clearer, a little more concise. Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Link: https://lore.kernel.org/r/20210828090219.1177-1-caihuoqing@baidu.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Juergen Gross
|
0560204b36 |
PM: base: power: don't try to use non-existing RTC for storing data
If there is no legacy RTC device, don't try to use it for storing trace data across suspend/resume. Cc: <stable@vger.kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Rafael J. Wysocki <rafael@kernel.org> Link: https://lore.kernel.org/r/20210903084937.19392-2-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com> |
||
Rafael J. Wysocki
|
be2d24336f |
Merge branches 'pm-cpufreq', 'pm-sleep' and 'pm-em'
* pm-cpufreq: cpufreq: intel_pstate: hybrid: Rework HWP calibration ACPI: CPPC: Introduce cppc_get_nominal_perf() * pm-sleep: PM: sleep: core: Avoid setting power.must_resume to false PM: sleep: wakeirq: drop useless parameter from dev_pm_attach_wake_irq() * pm-em: Documentation: power: include kernel-doc in Energy Model doc PM: EM: fix kernel-doc comments |
||
Linus Torvalds
|
30f3490978 |
More power management updates for 5.15-rc1
- Add new cpufreq driver for the MediaTek MT6779 platform called mediatek-hw along with corresponding DT bindings (Hector.Yuan). - Add DCVS interrupt support to the qcom-cpufreq-hw driver (Thara Gopinath). - Make the qcom-cpufreq-hw driver set the dvfs_possible_from_any_cpu policy flag (Taniya Das). - Blocklist more Qualcomm platforms in cpufreq-dt-platdev (Bjorn Andersson). - Make the vexpress cpufreq driver set the CPUFREQ_IS_COOLING_DEV flag (Viresh Kumar). - Add new cpufreq driver callback to allow drivers to register with the Energy Model in a consistent way and make several drivers use it (Viresh Kumar). - Change the remaining users of the .ready() cpufreq driver callback to move the code from it elsewhere and drop it from the cpufreq core (Viresh Kumar). - Revert recent intel_pstate change adding HWP guaranteed performance change notification support to it that led to problems, because the notification in question is triggered prematurely on some systems (Rafael Wysocki). - Convert the OPP DT bindings to DT schema and clean them up while at it (Rob Herring). -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmE41VgSHHJqd0Byand5 c29ja2kubmV0AAoJEILEb/54YlRxCN4P+gMjMrrZmuU6gZsbbvpDlaBhCd2Xq3TD xR/DMDi7znkh3TUX3uwL+xnr+k0krIH0jBIeUQUE7NeNIoT6wgbjJ4Ty5rFq76qB AODmmZ4vO7lmnupSyqUQbHfYohDmyICSKiStf8UOEj1o+jSWNmrgUYUv0tDtDUH+ Cn0vByah8gJAnoZX8Y8BM1jmRc3YoNHWpvtTQIhIBPkVZ//+NOKvDZvwUUPZFb+M 1PzMSfX7WsIDiUrUHpdvtZsoBniaMk0WS1EqVBRvEprqUXad1eHF19yuhtLxeUPH 8xh/7o8kYzjqVJvs7blTT8DztxRDScWHeGKSVdwoEJupbCwc5R3qfGaD6PWyhI1x 9R5Swsp64nLptTCwH7ZmgdJbC9IqN3cz1Nadd5v2Q2wr21KvZnj7zI2ijkPKGnZo kqYQHghqnDkGPFVjdls/RKUXGCQIXZFQb+FeuyCvpVlz9Ol8+DnTYtgFjjc6VcU1 kApIqsE8V8GQEyzmm/OIAf6xtsA+mEUUh3Qds16KNCwBCRglC8I6v5IWnBc8PEJz a+wtwjx+tyKSSlAMEvcDNtrWVN+3JNrwCFG+Q+QjMrwLiAgACrtJZ6e6PmLj9sZv FPbZM8rzbzN7Zqd7XVNY37KHjqNs7zzDScAnATTUCKThp8ijDgtmG3ZhExmOPayc 74aQLham4TBO =WSpr -----END PGP SIGNATURE----- Merge tag 'pm-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more power management updates from Rafael Wysocki: "These are mostly ARM cpufreq driver updates, including one new MediaTek driver that has just passed all of the reviews, with the addition of a revert of a recent intel_pstate commit, some core cpufreq changes and a DT-related update of the operating performance points (OPP) support code. Specifics: - Add new cpufreq driver for the MediaTek MT6779 platform called mediatek-hw along with corresponding DT bindings (Hector.Yuan). - Add DCVS interrupt support to the qcom-cpufreq-hw driver (Thara Gopinath). - Make the qcom-cpufreq-hw driver set the dvfs_possible_from_any_cpu policy flag (Taniya Das). - Blocklist more Qualcomm platforms in cpufreq-dt-platdev (Bjorn Andersson). - Make the vexpress cpufreq driver set the CPUFREQ_IS_COOLING_DEV flag (Viresh Kumar). - Add new cpufreq driver callback to allow drivers to register with the Energy Model in a consistent way and make several drivers use it (Viresh Kumar). - Change the remaining users of the .ready() cpufreq driver callback to move the code from it elsewhere and drop it from the cpufreq core (Viresh Kumar). - Revert recent intel_pstate change adding HWP guaranteed performance change notification support to it that led to problems, because the notification in question is triggered prematurely on some systems (Rafael Wysocki). - Convert the OPP DT bindings to DT schema and clean them up while at it (Rob Herring)" * tag 'pm-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (23 commits) Revert "cpufreq: intel_pstate: Process HWP Guaranteed change notification" cpufreq: mediatek-hw: Add support for CPUFREQ HW cpufreq: Add of_perf_domain_get_sharing_cpumask dt-bindings: cpufreq: add bindings for MediaTek cpufreq HW cpufreq: Remove ready() callback cpufreq: sh: Remove sh_cpufreq_cpu_ready() cpufreq: acpi: Remove acpi_cpufreq_cpu_ready() cpufreq: qcom-hw: Set dvfs_possible_from_any_cpu cpufreq driver flag cpufreq: blocklist more Qualcomm platforms in cpufreq-dt-platdev cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support cpufreq: scmi: Use .register_em() to register with energy model cpufreq: vexpress: Use .register_em() to register with energy model cpufreq: scpi: Use .register_em() to register with energy model dt-bindings: opp: Convert to DT schema dt-bindings: Clean-up OPP binding node names in examples ARM: dts: omap: Drop references to opp.txt cpufreq: qcom-cpufreq-hw: Use .register_em() to register with energy model cpufreq: omap: Use .register_em() to register with energy model cpufreq: mediatek: Use .register_em() to register with energy model cpufreq: imx6q: Use .register_em() to register with energy model ... |
||
Linus Torvalds
|
2d338201d5 |
Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
"147 patches, based on
|
||
David Hildenbrand
|
3fcebf9020 |
mm/memory_hotplug: improved dynamic memory group aware "auto-movable" online policy
Currently, the "auto-movable" online policy does not allow for hotplugged KERNEL (ZONE_NORMAL) memory to increase the amount of MOVABLE memory we can have, primarily, because there is no coordiantion across memory devices and we don't want to create zone-imbalances accidentially when unplugging memory. However, within a single memory device it's different. Let's allow for KERNEL memory within a dynamic memory group to allow for more MOVABLE within the same memory group. The only thing we have to take care of is that the managing driver avoids zone imbalances by unplugging MOVABLE memory first, otherwise there can be corner cases where unplug of memory could result in (accidential) zone imbalances. virtio-mem is the only user of dynamic memory groups and recently added support for prioritizing unplug of ZONE_MOVABLE over ZONE_NORMAL, so we don't need a new toggle to enable it for dynamic memory groups. We limit this handling to dynamic memory groups, because: * We want to keep the runtime overhead for collecting stats when onlining a single memory block small. We tend to have only a handful of dynamic memory groups, but we can have quite some static memory groups (e.g., 256 DIMMs). * It doesn't make too much sense for static memory groups, as we try onlining all applicable memory blocks either completely to ZONE_MOVABLE or not. In ordinary operation, we won't have a mixture of zones within a static memory group. When adding memory to a dynamic memory group, we'll first online memory to ZONE_MOVABLE as long as early KERNEL memory allows for it. Then, we'll online the next unit(s) to ZONE_NORMAL, until we can online the next unit(s) to ZONE_MOVABLE. For a simple virtio-mem device with a MOVABLE:KERNEL ratio of 3:1, it will result in a layout like: [M][M][M][M][M][M][M][M][N][M][M][M][N][M][M][M]... ^ movable memory due to early kernel memory ^ allows for more movable memory ... ^-----^ ... here ^ allows for more movable memory ... ^-----^ ... here While the created layout is sub-optimal when it comes to contiguous zones, it gives us the maximum flexibility when dynamically growing/shrinking a device; we can grow small VMs really big in small steps, and still shrink reliably to e.g., 1/4 of the maximum VM size in this example, removing full memory blocks along with meta data more reliably. Mark dynamic memory groups in the xarray such that we can efficiently iterate over them when collecting stats. In usual setups, we have one virtio-mem device per NUMA node, and usually only a small number of NUMA nodes. Note: for now, there seems to be no compelling reason to make this behavior configurable. Link: https://lkml.kernel.org/r/20210806124715.17090-10-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hui Zhu <teawater@gmail.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Len Brown <lenb@kernel.org> Cc: Marek Kedzierski <mkedzier@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
David Hildenbrand
|
445fcf7c72 |
mm/memory_hotplug: memory group aware "auto-movable" online policy
Use memory groups to improve our "auto-movable" onlining policy: 1. For static memory groups (e.g., a DIMM), online a memory block MOVABLE only if all other memory blocks in the group are either MOVABLE or could be onlined MOVABLE. A DIMM will either be MOVABLE or not, not a mixture. 2. For dynamic memory groups (e.g., a virtio-mem device), online a memory block MOVABLE only if all other memory blocks inside the current unit are either MOVABLE or could be onlined MOVABLE. For a virtio-mem device with a device block size with 512 MiB, all 128 MiB memory blocks wihin a 512 MiB unit will either be MOVABLE or not, not a mixture. We have to pass the memory group to zone_for_pfn_range() to take the memory group into account. Note: for now, there seems to be no compelling reason to make this behavior configurable. Link: https://lkml.kernel.org/r/20210806124715.17090-9-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hui Zhu <teawater@gmail.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Len Brown <lenb@kernel.org> Cc: Marek Kedzierski <mkedzier@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
David Hildenbrand
|
836809ec75 |
mm/memory_hotplug: track present pages in memory groups
Let's track all present pages in each memory group. Especially, track memory present in ZONE_MOVABLE and memory present in one of the kernel zones (which really only is ZONE_NORMAL right now as memory groups only apply to hotplugged memory) separately within a memory group, to prepare for making smart auto-online decision for individual memory blocks within a memory group based on group statistics. Link: https://lkml.kernel.org/r/20210806124715.17090-5-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hui Zhu <teawater@gmail.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Len Brown <lenb@kernel.org> Cc: Marek Kedzierski <mkedzier@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
David Hildenbrand
|
028fc57a1c |
drivers/base/memory: introduce "memory groups" to logically group memory blocks
In our "auto-movable" memory onlining policy, we want to make decisions across memory blocks of a single memory device. Examples of memory devices include ACPI memory devices (in the simplest case a single DIMM) and virtio-mem. For now, we don't have a connection between a single memory block device and the real memory device. Each memory device consists of 1..X memory block devices. Let's logically group memory blocks belonging to the same memory device in "memory groups". Memory groups can span multiple physical ranges and a memory group itself does not contain any information regarding physical ranges, only properties (e.g., "max_pages") necessary for improved memory onlining. Introduce two memory group types: 1) Static memory group: E.g., a single ACPI memory device, consisting of 1..X memory resources. A memory group consists of 1..Y memory blocks. The whole group is added/removed in one go. If any part cannot get offlined, the whole group cannot be removed. 2) Dynamic memory group: E.g., a single virtio-mem device. Memory is dynamically added/removed in a fixed granularity, called a "unit", consisting of 1..X memory blocks. A unit is added/removed in one go. If any part of a unit cannot get offlined, the whole unit cannot be removed. In case of 1) we usually want either all memory managed by ZONE_MOVABLE or none. In case of 2) we usually want to have as many units as possible managed by ZONE_MOVABLE. We want a single unit to be of the same type. For now, memory groups are an internal concept that is not exposed to user space; we might want to change that in the future, though. add_memory() users can specify a mgid instead of a nid when passing the MHP_NID_IS_MGID flag. Link: https://lkml.kernel.org/r/20210806124715.17090-4-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hui Zhu <teawater@gmail.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Len Brown <lenb@kernel.org> Cc: Marek Kedzierski <mkedzier@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
David Hildenbrand
|
4b09700244 |
mm: track present early pages per zone
Patch series "mm/memory_hotplug: "auto-movable" online policy and memory groups", v3. I. Goal The goal of this series is improving in-kernel auto-online support. It tackles the fundamental problems that: 1) We can create zone imbalances when onlining all memory blindly to ZONE_MOVABLE, in the worst case crashing the system. We have to know upfront how much memory we are going to hotplug such that we can safely enable auto-onlining of all hotplugged memory to ZONE_MOVABLE via "online_movable". This is far from practical and only applicable in limited setups -- like inside VMs under the RHV/oVirt hypervisor which will never hotplug more than 3 times the boot memory (and the limitation is only in place due to the Linux limitation). 2) We see more setups that implement dynamic VM resizing, hot(un)plugging memory to resize VM memory. In these setups, we might hotplug a lot of memory, but it might happen in various small steps in both directions (e.g., 2 GiB -> 8 GiB -> 4 GiB -> 16 GiB ...). virtio-mem is the primary driver of this upstream right now, performing such dynamic resizing NUMA-aware via multiple virtio-mem devices. Onlining all hotplugged memory to ZONE_NORMAL means we basically have no hotunplug guarantees. Onlining all to ZONE_MOVABLE means we can easily run into zone imbalances when growing a VM. We want a mixture, and we want as much memory as reasonable/configured in ZONE_MOVABLE. Details regarding zone imbalances can be found at [1]. 3) Memory devices consist of 1..X memory block devices, however, the kernel doesn't really track the relationship. Consequently, also user space has no idea. We want to make per-device decisions. As one example, for memory hotunplug it doesn't make sense to use a mixture of zones within a single DIMM: we want all MOVABLE if possible, otherwise all !MOVABLE, because any !MOVABLE part will easily block the whole DIMM from getting hotunplugged. As another example, virtio-mem operates on individual units that span 1..X memory blocks. Similar to a DIMM, we want a unit to either be all MOVABLE or !MOVABLE. A "unit" can be thought of like a DIMM, however, all units of a virtio-mem device logically belong together and are managed (added/removed) by a single driver. We want as much memory of a virtio-mem device to be MOVABLE as possible. 4) We want memory onlining to be done right from the kernel while adding memory, not triggered by user space via udev rules; for example, this is reqired for fast memory hotplug for drivers that add individual memory blocks, like virito-mem. We want a way to configure a policy in the kernel and avoid implementing advanced policies in user space. The auto-onlining support we have in the kernel is not sufficient. All we have is a) online everything MOVABLE (online_movable) b) online everything !MOVABLE (online_kernel) c) keep zones contiguous (online). This series allows configuring c) to mean instead "online movable if possible according to the coniguration, driven by a maximum MOVABLE:KERNEL ratio" -- a new onlining policy. II. Approach This series does 3 things: 1) Introduces the "auto-movable" online policy that initially operates on individual memory blocks only. It uses a maximum MOVABLE:KERNEL ratio to make a decision whether a memory block will be onlined to ZONE_MOVABLE or not. However, in the basic form, hotplugged KERNEL memory does not allow for more MOVABLE memory (details in the patches). CMA memory is treated like MOVABLE memory. 2) Introduces static (e.g., DIMM) and dynamic (e.g., virtio-mem) memory groups and uses group information to make decisions in the "auto-movable" online policy across memory blocks of a single memory device (modeled as memory group). More details can be found in patch #3 or in the DIMM example below. 3) Maximizes ZONE_MOVABLE memory within dynamic memory groups, by allowing ZONE_NORMAL memory within a dynamic memory group to allow for more ZONE_MOVABLE memory within the same memory group. The target use case is dynamic VM resizing using virtio-mem. See the virtio-mem example below. I remember that the basic idea of using a ratio to implement a policy in the kernel was once mentioned by Vitaly Kuznetsov, but I might be wrong (I lost the pointer to that discussion). For me, the main use case is using it along with virtio-mem (and DIMMs / ppc64 dlpar where necessary) for dynamic resizing of VMs, increasing the amount of memory we can hotunplug reliably again if we might eventually hotplug a lot of memory to a VM. III. Target Usage The target usage will be: 1) Linux boots with "mhp_default_online_type=offline" 2) User space (e.g., systemd unit) configures memory onlining (according to a config file and system properties), for example: * Setting memory_hotplug.online_policy=auto-movable * Setting memory_hotplug.auto_movable_ratio=301 * Setting memory_hotplug.auto_movable_numa_aware=true 3) User space enabled auto onlining via "echo online > /sys/devices/system/memory/auto_online_blocks" 4) User space triggers manual onlining of all already-offline memory blocks (go over offline memory blocks and set them to "online") IV. Example For DIMMs, hotplugging 4 GiB DIMMs to a 4 GiB VM with a configured ratio of 301% results in the following layout: Memory block 0-15: DMA32 (early) Memory block 32-47: Normal (early) Memory block 48-79: Movable (DIMM 0) Memory block 80-111: Movable (DIMM 1) Memory block 112-143: Movable (DIMM 2) Memory block 144-275: Normal (DIMM 3) Memory block 176-207: Normal (DIMM 4) ... all Normal (-> hotplugged Normal memory does not allow for more Movable memory) For virtio-mem, using a simple, single virtio-mem device with a 4 GiB VM will result in the following layout: Memory block 0-15: DMA32 (early) Memory block 32-47: Normal (early) Memory block 48-143: Movable (virtio-mem, first 12 GiB) Memory block 144: Normal (virtio-mem, next 128 MiB) Memory block 145-147: Movable (virtio-mem, next 384 MiB) Memory block 148: Normal (virtio-mem, next 128 MiB) Memory block 149-151: Movable (virtio-mem, next 384 MiB) ... Normal/Movable mixture as above (-> hotplugged Normal memory allows for more Movable memory within the same device) Which gives us maximum flexibility when dynamically growing/shrinking a VM in smaller steps. V. Doc Update I'll update the memory-hotplug.rst documentation, once the overhaul [1] is usptream. Until then, details can be found in patch #2. VI. Future Work 1) Use memory groups for ppc64 dlpar 2) Being able to specify a portion of (early) kernel memory that will be excluded from the ratio. Like "128 MiB globally/per node" are excluded. This might be helpful when starting VMs with extremely small memory footprint (e.g., 128 MiB) and hotplugging memory later -- not wanting the first hotplugged units getting onlined to ZONE_MOVABLE. One alternative would be a trigger to not consider ZONE_DMA memory in the ratio. We'll have to see if this is really rrequired. 3) Indicate to user space that MOVABLE might be a bad idea -- especially relevant when memory ballooning without support for balloon compaction is active. This patch (of 9): For implementing a new memory onlining policy, which determines when to online memory blocks to ZONE_MOVABLE semi-automatically, we need the number of present early (boot) pages -- present pages excluding hotplugged pages. Let's track these pages per zone. Pass a page instead of the zone to adjust_present_page_count(), similar as adjust_managed_page_count() and derive the zone from the page. It's worth noting that a memory block to be offlined/onlined is either completely "early" or "not early". add_memory() and friends can only add complete memory blocks and we only online/offline complete (individual) memory blocks. Link: https://lkml.kernel.org/r/20210806124715.17090-1-david@redhat.com Link: https://lkml.kernel.org/r/20210806124715.17090-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Marek Kedzierski <mkedzier@redhat.com> Cc: Hui Zhu <teawater@gmail.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mike Rapoport <rppt@kernel.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Len Brown <lenb@kernel.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Mike Rapoport
|
859a85ddf9 |
mm: remove pfn_valid_within() and CONFIG_HOLES_IN_ZONE
Patch series "mm: remove pfn_valid_within() and CONFIG_HOLES_IN_ZONE". After recent updates to freeing unused parts of the memory map, no architecture can have holes in the memory map within a pageblock. This makes pfn_valid_within() check and CONFIG_HOLES_IN_ZONE configuration option redundant. The first patch removes them both in a mechanical way and the second patch simplifies memory_hotplug::test_pages_in_a_zone() that had pfn_valid_within() surrounded by more logic than simple if. This patch (of 2): After recent changes in freeing of the unused parts of the memory map and rework of pfn_valid() in arm and arm64 there are no architectures that can have holes in the memory map within a pageblock and so nothing can enable CONFIG_HOLES_IN_ZONE which guards non trivial implementation of pfn_valid_within(). With that, pfn_valid_within() is always hardwired to 1 and can be completely removed. Remove calls to pfn_valid_within() and CONFIG_HOLES_IN_ZONE. Link: https://lkml.kernel.org/r/20210713080035.7464-1-rppt@kernel.org Link: https://lkml.kernel.org/r/20210713080035.7464-2-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Rafael J. Wysocki
|
eabf9e616e |
Merge branch 'pm-cpufreq'
* pm-cpufreq: Revert "cpufreq: intel_pstate: Process HWP Guaranteed change notification" cpufreq: mediatek-hw: Add support for CPUFREQ HW cpufreq: Add of_perf_domain_get_sharing_cpumask dt-bindings: cpufreq: add bindings for MediaTek cpufreq HW cpufreq: Remove ready() callback cpufreq: sh: Remove sh_cpufreq_cpu_ready() cpufreq: acpi: Remove acpi_cpufreq_cpu_ready() cpufreq: qcom-hw: Set dvfs_possible_from_any_cpu cpufreq driver flag cpufreq: blocklist more Qualcomm platforms in cpufreq-dt-platdev cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support cpufreq: scmi: Use .register_em() to register with energy model cpufreq: vexpress: Use .register_em() to register with energy model cpufreq: scpi: Use .register_em() to register with energy model cpufreq: qcom-cpufreq-hw: Use .register_em() to register with energy model cpufreq: omap: Use .register_em() to register with energy model cpufreq: mediatek: Use .register_em() to register with energy model cpufreq: imx6q: Use .register_em() to register with energy model cpufreq: dt: Use .register_em() to register with energy model cpufreq: Add callback to register with energy model cpufreq: vexpress: Set CPUFREQ_IS_COOLING_DEV flag |
||
Prasad Sodagudi
|
4a9344cd0a |
PM: sleep: core: Avoid setting power.must_resume to false
There are variables(power.may_skip_resume and dev->power.must_resume)
and DPM_FLAG_MAY_SKIP_RESUME flags to control the resume of devices after
a system wide suspend transition.
Setting the DPM_FLAG_MAY_SKIP_RESUME flag means that the driver allows
its "noirq" and "early" resume callbacks to be skipped if the device
can be left in suspend after a system-wide transition into the working
state. PM core determines that the driver's "noirq" and "early" resume
callbacks should be skipped or not with dev_pm_skip_resume() function by
checking power.may_skip_resume variable.
power.must_resume variable is getting set to false in __device_suspend()
function without checking device's DPM_FLAG_MAY_SKIP_RESUME settings.
In problematic scenario, where all the devices in the suspend_late
stage are successful and some device can fail to suspend in
suspend_noirq phase. So some devices successfully suspended in suspend_late
stage are not getting chance to execute __device_suspend_noirq()
to set dev->power.must_resume variable to true and not getting
resumed in early_resume phase.
Add a check for device's DPM_FLAG_MAY_SKIP_RESUME flag before
setting power.must_resume variable in __device_suspend function.
Fixes:
|
||
Sergey Shtylyov
|
d216bfb4d7 |
PM: sleep: wakeirq: drop useless parameter from dev_pm_attach_wake_irq()
This function has the 'irq' parameter which isn't ever used, so drop it. Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
||
Linus Torvalds
|
3de18c865f |
Merge branch 'stable/for-linus-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb
Pull swiotlb updates from Konrad Rzeszutek Wilk: "A new feature called restricted DMA pools. It allows SWIOTLB to utilize per-device (or per-platform) allocated memory pools instead of using the global one. The first big user of this is ARM Confidential Computing where the memory for DMA operations can be set per platform" * 'stable/for-linus-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb: (23 commits) swiotlb: use depends on for DMA_RESTRICTED_POOL of: restricted dma: Don't fail device probe on rmem init failure of: Move of_dma_set_restricted_buffer() into device.c powerpc/svm: Don't issue ultracalls if !mem_encrypt_active() s390/pv: fix the forcing of the swiotlb swiotlb: Free tbl memory in swiotlb_exit() swiotlb: Emit diagnostic in swiotlb_exit() swiotlb: Convert io_default_tlb_mem to static allocation of: Return success from of_dma_set_restricted_buffer() when !OF_ADDRESS swiotlb: add overflow checks to swiotlb_bounce swiotlb: fix implicit debugfs declarations of: Add plumbing for restricted DMA pool dt-bindings: of: Add restricted DMA pool swiotlb: Add restricted DMA pool initialization swiotlb: Add restricted DMA alloc/free support swiotlb: Refactor swiotlb_tbl_unmap_single swiotlb: Move alloc_size to swiotlb_find_slots swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing swiotlb: Update is_swiotlb_active to add a struct device argument swiotlb: Update is_swiotlb_buffer to add a struct device argument ... |
||
Linus Torvalds
|
14726903c8 |
Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton: "173 patches. Subsystems affected by this series: ia64, ocfs2, block, and mm (debug, pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap, bootmem, sparsemem, vmalloc, kasan, pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, compaction, mempolicy, memblock, oom-kill, migration, ksm, percpu, vmstat, and madvise)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (173 commits) mm/madvise: add MADV_WILLNEED to process_madvise() mm/vmstat: remove unneeded return value mm/vmstat: simplify the array size calculation mm/vmstat: correct some wrong comments mm/percpu,c: remove obsolete comments of pcpu_chunk_populated() selftests: vm: add COW time test for KSM pages selftests: vm: add KSM merging time test mm: KSM: fix data type selftests: vm: add KSM merging across nodes test selftests: vm: add KSM zero page merging test selftests: vm: add KSM unmerge test selftests: vm: add KSM merge test mm/migrate: correct kernel-doc notation mm: wire up syscall process_mrelease mm: introduce process_mrelease system call memblock: make memblock_find_in_range method private mm/mempolicy.c: use in_task() in mempolicy_slab_node() mm/mempolicy: unify the create() func for bind/interleave/prefer-many policies mm/mempolicy: advertise new MPOL_PREFERRED_MANY mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY ... |
||
Mike Rapoport
|
a7259df767 |
memblock: make memblock_find_in_range method private
There are a lot of uses of memblock_find_in_range() along with memblock_reserve() from the times memblock allocation APIs did not exist. memblock_find_in_range() is the very core of memblock allocations, so any future changes to its internal behaviour would mandate updates of all the users outside memblock. Replace the calls to memblock_find_in_range() with an equivalent calls to memblock_phys_alloc() and memblock_phys_alloc_range() and make memblock_find_in_range() private method of memblock. This simplifies the callers, ensures that (unlikely) errors in memblock_reserve() are handled and improves maintainability of memblock_find_in_range(). Link: https://lkml.kernel.org/r/20210816122622.30279-1-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Acked-by: Kirill A. Shutemov <kirill.shtuemov@linux.intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [ACPI] Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Acked-by: Nick Kossifidis <mick@ics.forth.gr> [riscv] Tested-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Ohhoon Kwon
|
fc1f5e980a |
mm: sparse: pass section_nr to find_memory_block
With CONFIG_SPARSEMEM_EXTREME enabled, __section_nr() which converts mem_section to section_nr could be costly since it iterates all section roots to check if the given mem_section is in its range. On the other hand, __nr_to_section() which converts section_nr to mem_section can be done in O(1). Let's pass section_nr instead of mem_section ptr to find_memory_block() in order to reduce needless iterations. Link: https://lkml.kernel.org/r/20210707150212.855-3-ohoono.kwon@samsung.com Signed-off-by: Ohhoon Kwon <ohoono.kwon@samsung.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Linus Torvalds
|
a9c9a6f741 |
SCSI misc on 20210902
This series consists of the usual driver updates (ufs, qla2xxx, target, smartpqi, lpfc, mpt3sas). The core change causing the most churn was replacing the command request field request with a macro, allowing us to offset map to it and remove the redundant field; the same was also done for the tag field. The most impactful change is the final removal of scsi_ioctl, which has been deprecated for over a decade. Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com> -----BEGIN PGP SIGNATURE----- iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCYTD/TiYcamFtZXMuYm90 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishdUkAQCjb3Ux 4K9438mMelHlzM4er1S1IJ0WNnvObaVMNO9LBwD+JUz+rHsrKvuEX9j3g3C3u6JH hC3BUEW8f2LLnujWanQ= =lC5o -----END PGP SIGNATURE----- Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI updates from James Bottomley: "This series consists of the usual driver updates (ufs, qla2xxx, target, smartpqi, lpfc, mpt3sas). The core change causing the most churn was replacing the command request field request with a macro, allowing us to offset map to it and remove the redundant field; the same was also done for the tag field. The most impactful change is the final removal of scsi_ioctl, which has been deprecated for over a decade" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (293 commits) scsi: ufs: Fix ufshcd_request_sense_async() for Samsung KLUFG8RHDA-B2D1 scsi: ufs: ufs-exynos: Fix static checker warning scsi: mpt3sas: Use the proper SCSI midlayer interfaces for PI scsi: lpfc: Use the proper SCSI midlayer interfaces for PI scsi: lpfc: Copyright updates for 14.0.0.1 patches scsi: lpfc: Update lpfc version to 14.0.0.1 scsi: lpfc: Add bsg support for retrieving adapter cmf data scsi: lpfc: Add cmf_info sysfs entry scsi: lpfc: Add debugfs support for cm framework buffers scsi: lpfc: Add support for maintaining the cm statistics buffer scsi: lpfc: Add rx monitoring statistics scsi: lpfc: Add support for the CM framework scsi: lpfc: Add cmfsync WQE support scsi: lpfc: Add support for cm enablement buffer scsi: lpfc: Add cm statistics buffer support scsi: lpfc: Add EDC ELS support scsi: lpfc: Expand FPIN and RDF receive logging scsi: lpfc: Add MIB feature enablement support scsi: lpfc: Add SET_HOST_DATA mbox cmd to pass date/time info to firmware scsi: fc: Add EDC ELS definition ... |
||
Linus Torvalds
|
75d6e7d9ce |
Nothing changed in the clk framework core this time around. We did get
some updates to the basic clk types to use determine_rate for the divider type and add a power of two fractional divider flag though. Otherwise, this is a collection of clk driver updates. More than half the diffstat is in the Qualcomm clk driver where we add a bunch of data to describe clks on various SoCs and fix bugs. The other big new thing in here is the Mediatek MT8192 clk driver. That's been under review for a while and it's nice to see that it's finally upstream. Beyond that it's the usual set of minor fixes and tweaks to clk drivers. There are some non-clk driver bits in here which have all been acked by the respective maintainers. New Drivers: - Support video, gpu, display clks on qcom sc7280 SoCs - GCC clks on qcom MSM8953, SM4250/6115, and SM6350 SoCs - Multimedia clks (MMCC) on qcom MSM8994/MSM8992 - RPMh clks on qcom SM6350 SoCs - Support for Mediatek MT8192 SoCs - Add display (DU and DSI) clocks on Renesas R-Car V3U - Add I2C, DMAC, USB, sound (SSIF-2), GPIO, CANFD, and ADC clocks and resets on Renesas RZ/G2L Updates: - Support the SD/OE pin on IDT VersaClock 5 and 6 clock generators - Add power of two flag to fractional divider clk type - Migrate some clk drivers to clk_divider_ops.determine_rate - Migrate to clk_parent_data in gcc-sdm660 - Fix CLKOUT clocks on i.MX8MM and i.MX8MN by using imx_clk_hw_mux2 - Switch from .round_rate to .determine_rate in clk-divider-gate - Fix clock tree update for TF-A controlled clocks for all i.MX8M - Add missing M7 core clock for i.MX8MN - YAML conversion of rk3399 clock controller binding - Removal of GRF dependency for the rk3328/rk3036 pll types - Drop CLK_IS_CRITICAL flag from Tegra fuse clk - Make CLK_R9A06G032 Kconfig symbol invisible - Convert various DT bindings to YAML -----BEGIN PGP SIGNATURE----- iQJFBAABCAAvFiEE9L57QeeUxqYDyoaDrQKIl8bklSUFAmExEooRHHNib3lkQGtl cm5lbC5vcmcACgkQrQKIl8bklSXXBhAAvhHm4fcm3fRjNdfImd+jDEl8XSvg+w43 adSnmVxbYM6ZVNOiJ4CJWHbj0hOY/PJnsQYWbV0xXvXW+zXva6p495MMHHOGSi2o lMgZVMvj5UAwu304ZC9Xfn31dwo8XdGrltp4JqIcI2NEBMh1/PlZW22esT+jDiWN 3SWFD3M7lu88xTREyiEu11FY3z/KiGzbGlqYcbivx1X0sHVnBRbl4qcqZway+BmQ 95Ma4YWwhvDGYc+ypKH2EPxs/LikHXj05nMooigy65DOQ5wrM4L1eWkwmVUf6h+e t4x7sAVysLnkihzdH5r2pw6CcAIom76v8w0+maSfk+jINUu1LeGVuat1eXSesFTu 49o+uTKRghkUe/Qh6r+7lbo8AZXQq+wUsLTYRuaWT/mSb+svAtJaUWAru8tJnMlH oK6OehcQwz4nGhH0HnBK1jCVdtgckxPBw8F/GYN9rYhsccIe0XmFjX1rzMM3s8De PLl6QO7Xzd+xb/FwAU8+S1WpKFdPU6ILTUnI2Ma3Mn/gfjZEZHvWAdTjo4oZGEsw +N4n924ArptbeSLRrlNUtqx4BVDL5yo54xS5gefNpmD5yezO7aoUtN0aGcBq+01p Qw0N5hKtcdsNYLBEFSvBGcZZmErMZbPwMXHWiUwNymXBDzJKgj5d+ks+1vJ3iCNW R5r9hvATJPQ= =Rrqg -----END PGP SIGNATURE----- Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk updates from Stephen Boyd: "Nothing changed in the clk framework core this time around. We did get some updates to the basic clk types to use determine_rate for the divider type and add a power of two fractional divider flag though. Otherwise, this is a collection of clk driver updates. More than half the diffstat is in the Qualcomm clk driver where we add a bunch of data to describe clks on various SoCs and fix bugs. The other big new thing in here is the Mediatek MT8192 clk driver. That's been under review for a while and it's nice to see that it's finally upstream. Beyond that it's the usual set of minor fixes and tweaks to clk drivers. There are some non-clk driver bits in here which have all been acked by the respective maintainers. New Drivers: - Support video, gpu, display clks on qcom sc7280 SoCs - GCC clks on qcom MSM8953, SM4250/6115, and SM6350 SoCs - Multimedia clks (MMCC) on qcom MSM8994/MSM8992 - RPMh clks on qcom SM6350 SoCs - Support for Mediatek MT8192 SoCs - Add display (DU and DSI) clocks on Renesas R-Car V3U - Add I2C, DMAC, USB, sound (SSIF-2), GPIO, CANFD, and ADC clocks and resets on Renesas RZ/G2L Updates: - Support the SD/OE pin on IDT VersaClock 5 and 6 clock generators - Add power of two flag to fractional divider clk type - Migrate some clk drivers to clk_divider_ops.determine_rate - Migrate to clk_parent_data in gcc-sdm660 - Fix CLKOUT clocks on i.MX8MM and i.MX8MN by using imx_clk_hw_mux2 - Switch from .round_rate to .determine_rate in clk-divider-gate - Fix clock tree update for TF-A controlled clocks for all i.MX8M - Add missing M7 core clock for i.MX8MN - YAML conversion of rk3399 clock controller binding - Removal of GRF dependency for the rk3328/rk3036 pll types - Drop CLK_IS_CRITICAL flag from Tegra fuse clk - Make CLK_R9A06G032 Kconfig symbol invisible - Convert various DT bindings to YAML" * tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (128 commits) dt-bindings: clock: samsung: fix header path in example clk: tegra: fix old-style declaration clk: qcom: Add SM6350 GCC driver MAINTAINERS: clock: include S3C and S5P in Samsung SoC clock entry dt-bindings: clock: samsung: convert S5Pv210 AudSS to dtschema dt-bindings: clock: samsung: convert Exynos AudSS to dtschema dt-bindings: clock: samsung: convert Exynos4 to dtschema dt-bindings: clock: samsung: convert Exynos3250 to dtschema dt-bindings: clock: samsung: convert Exynos542x to dtschema dt-bindings: clock: samsung: add bindings for Exynos external clock dt-bindings: clock: samsung: convert Exynos5250 to dtschema clk: vc5: Add properties for configuring SD/OE behavior clk: vc5: Use dev_err_probe dt-bindings: clk: vc5: Add properties for configuring the SD/OE pin dt-bindings: clock: brcm,iproc-clocks: fix armpll properties clk: zynqmp: Fix kernel-doc format clk: at91: clk-generated: Limit the requested rate to our range clk: ralink: avoid to set 'CLK_IS_CRITICAL' flag for gates clk: zynqmp: Fix a memory leak clk: zynqmp: Check the return type ... |