Commit Graph

4993 Commits

Author SHA1 Message Date
Nico Pache
586c402882 kunit: software node: adhear to KUNIT formatting standard
Change CONFIG_KUNIT_DRIVER_PE_TEST to CONFIG_DRIVER_PE_KUNIT_TEST inorder
to adhear to the KUNIT *_KUNIT_TEST config name format.

Fixes: aa811e3cec (software node: introduce CONFIG_KUNIT_DRIVER_PE_TEST)
Signed-off-by: Nico Pache <npache@redhat.com>
Link: https://lore.kernel.org/r/ef06f65f4a622cf83cce5ba2ba5a060d2aa2e1b9.1618388989.git.npache@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-15 08:56:27 +02:00
Greg Kroah-Hartman
a00fcbc115 Linux 5.12-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmBzdS0eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGDdAIAIpKH/tAHhH7s7QH
 m5ewgE8foP7M5Ue9fp3+JmbtaYSzhCAMcKhqGtat/zk5PvA9AoYCDXrTetfYtBHh
 LUOmhL9hcKItNobfkYBok6BiFjGUEL3HMqz5w+MUsMwnXIc4RXqfJmsQ932z9Kxf
 yDwe6ehIzJVrQLI/C0mTamYRHu2aiZ1VWzhKuT493rLeg0R2odCCIClPN+/QvCwb
 8/sk6l1c8eOUYYMUzKFZifaZGb12qDjRt4pZmk51aMTzg0WCpElJG+7Uqr4QQhZP
 p6xeNuUQq6WwxtlDkmo79Uzkrurb5tN2/hZ1RcJhs3EdHfpR0MjIyH3Znnb31gnu
 39VjHhg=
 =4KP/
 -----END PGP SIGNATURE-----

Merge tag 'v5.12-rc7' into driver-core-next

We need the driver core fix in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-14 19:53:39 +02:00
Dan Carpenter
4ce535ec00 node: fix device cleanups in error handling code
We can't use kfree() to free device managed resources so the kfree(dev)
is against the rules.

It's easier to write this code if we open code the device_register() as
a device_initialize() and device_add().  That way if dev_set_name() set
name fails we can call put_device() and it will clean up correctly.

Fixes: acc02a109b ("node: Add memory-side caching attributes")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/YHA0JUra+F64+NpB@mwanda
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-10 11:10:21 +02:00
Greg Kroah-Hartman
c2f3f755f5 Revert "driver core: platform: Make platform_get_irq_optional() optional"
This reverts commit ed7027fdf4 as it
causes runtime issues:
	https://lore.kernel.org/r/20210406192514.GA34677@roeck-us.net

Link: https://lore.kernel.org/r/20210406192514.GA34677@roeck-us.net
Reported-by: Guenter Roeck <linux@roeck-us.net>
Cc: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-07 11:47:56 +02:00
Andy Shevchenko
e588fead04 software node: Introduce SOFTWARE_NODE_REFERENCE() helper macro
This is useful to assign software node reference with arguments
in a common way. Moreover, we have already couple of users that
may be converted. And by the fact, one of them is moved right here
to use the helper.

Tested-by: Daniel Scally <djrscally@gmail.com>
Reviewed-by: Daniel Scally <djrscally@gmail.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20210329151207.36619-5-andriy.shevchenko@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-05 13:18:22 +02:00
Andy Shevchenko
4a32e384e8 software node: Imply kobj_to_swnode() to be no-op
Since we don't use structure field layout randomization
the manual shuffling can affect some macros, in particular
kobj_to_swnode(), which becomes a no-op when kobj member
is the first one in the struct swnode.

Bloat-o-meter statistics for swnode.o:

  add/remove: 0/0 grow/shrink: 2/10 up/down: 9/-100 (-91)
  Total: Before=7217, After=7126, chg -1.26%

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20210329151207.36619-4-andriy.shevchenko@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-05 13:18:22 +02:00
Andy Shevchenko
73c9342656 software node: Deduplicate code in fwnode_create_software_node()
Deduplicate conditional and assignment in fwnode_create_software_node(),
i.e. parent is checked in two out of three cases and parent software node
is assigned by to_swnode() call.

Reviewed-by: Daniel Scally <djrscally@gmail.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20210329151207.36619-3-andriy.shevchenko@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-05 13:18:22 +02:00
Andy Shevchenko
06ad93c328 software node: Introduce software_node_alloc()/software_node_free()
Introduce software_node_alloc() and software_node_free() helpers.
This will help with code readability and maintenance.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20210329151207.36619-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-05 13:18:22 +02:00
Andy Shevchenko
3f6b6536a7 software node: Free resources explicitly when swnode_register() fails
Currently we have a slightly twisted logic in swnode_register().
It frees resources that it doesn't allocate on error path and
in once case it relies on the ->release() implementation.

Untwist the logic by freeing resources explicitly when swnode_register()
fails. Currently it happens only in fwnode_create_software_node().

Tested-by: Daniel Scally <djrscally@gmail.com>
Reviewed-by: Daniel Scally <djrscally@gmail.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20210329151207.36619-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-05 13:18:21 +02:00
Ahmad Fatoum
72a91f192d driver core: add helper for deferred probe reason setting
We now have three places within the same file doing the same operation
of freeing this pointer and setting it anew. A helper makes this
arguably easier to read, so add one.

Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Link: https://lore.kernel.org/r/20210323153714.25120-2-a.fatoum@pengutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-05 13:17:01 +02:00
Saravana Kannan
d46f3e3ed5 driver core: Improve fw_devlink & deferred_probe_timeout interaction
deferred_probe_timeout kernel commandline parameter allows probing of
consumer devices if the supplier devices don't have any drivers.

fw_devlink=on will indefintely block probe() calls on a device if all
its suppliers haven't probed successfully. This completely skips calls
to driver_deferred_probe_check_state() since that's only called when a
.probe() function calls framework APIs. So fw_devlink=on breaks
deferred_probe_timeout.

deferred_probe_timeout in its current state also ignores a lot of
information that's now available to the kernel. It assumes all suppliers
that haven't probed when the timer expires (or when initcalls are done
on a static kernel) will never probe and fails any calls to acquire
resources from these unprobed suppliers.

However, this assumption by deferred_probe_timeout isn't true under many
conditions. For example:
- If the consumer happens to be before the supplier in the deferred
  probe list.
- If the supplier itself is waiting on its supplier to probe.

This patch fixes both these issues by relaxing device links between
devices only if the supplier doesn't have any driver that could match
with (NOT bound to) the supplier device. This way, we only fail attempts
to acquire resources from suppliers that truly don't have any driver vs
suppliers that just happen to not have probed yet.

Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20210402040342.2944858-3-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-05 09:17:56 +02:00
Saravana Kannan
eed6e41813 driver core: Fix locking bug in deferred_probe_timeout_work_func()
list_for_each_entry_safe() is only useful if we are deleting nodes in a
linked list within the loop. It doesn't protect against other threads
adding/deleting nodes to the list in parallel. We need to grab
deferred_probe_mutex when traversing the deferred_probe_pending_list.

Cc: stable@vger.kernel.org
Fixes: 25b4e70dcc ("driver core: allow stopping deferred probe after init")
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20210402040342.2944858-2-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-05 09:14:18 +02:00
Greg Kroah-Hartman
b20e829390 Merge 5.12-rc6 into driver-core-next
We need the driver core fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-05 08:51:37 +02:00
Linus Torvalds
f5664825fc Driver core fix for 5.12-rc6
Here is a single driver core fix for a reported problem with differed
 probing.  It has been in linux-next for a while with no reported
 problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYGhGpg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymgSACeJYD1EfjFSCB0gEmdjU261rWOcT0AoLTRkJW1
 vywQNi5XNrbbmsWpxLsF
 =wG2r
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fix from Greg KH:
 "Here is a single driver core fix for a reported problem with differed
  probing. It has been in linux-next for a while with no reported
  problems"

* tag 'driver-core-5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  driver core: clear deferred probe reason on probe retry
2021-04-03 10:14:47 -07:00
Andy Shevchenko
ed7027fdf4 driver core: platform: Make platform_get_irq_optional() optional
Currently the platform_get_irq_optional() returns an error code even
if IRQ resource sumply has not been found. It prevents caller to be
error code agnostic in their error handling.

Now:
	ret = platform_get_irq_optional(...);
	if (ret != -ENXIO)
		return ret; // respect deferred probe
	if (ret > 0)
		...we get an IRQ...

After proposed change:
	ret = platform_get_irq_optional(...);
	if (ret < 0)
		return ret;
	if (ret > 0)
		...we get an IRQ...

Reported-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20210331144526.19439-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-02 17:06:34 +02:00
Andy Shevchenko
318c3e00f1 driver core: Replace printf() specifier and drop unneeded casting
The size_t type has very well established specifier, i.e. "%zu",
use it directly instead of casting to unsigned long with "%lu".

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20210401171042.60612-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-02 17:02:45 +02:00
Andy Shevchenko
d7aa44f5a1 driver core: Cast to (void *) with __force for __percpu pointer
Sparse is not happy:

  drivers/base/devres.c:1230:9: warning: cast removes address space '__percpu' of expression

Use __force attribute to make it happy.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20210401171030.60527-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-02 17:02:39 +02:00
Andy Shevchenko
c99f4ebc68 driver core: platform: Make clear error code used for missed IRQ
We have few code paths where same error code is assigned and
returned for missed IRQ. Unify that under single error path.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20210331145937.35980-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-02 17:02:22 +02:00
Pierre-Louis Bossart
cc71079023 devcoredump: fix kernel-doc warning
remove make W=1 warnings

drivers/base/devcoredump.c:208: warning:
Function parameter or member 'data' not described in
'devcd_free_sgtable'

drivers/base/devcoredump.c:208: warning:
Excess function parameter 'table' description in 'devcd_free_sgtable'

drivers/base/devcoredump.c:225: warning:
expecting prototype for devcd_read_from_table(). Prototype was for
devcd_read_from_sgtable() instead

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://lore.kernel.org/r/20210331232614.304591-8-pierre-louis.bossart@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-02 16:40:08 +02:00
Pierre-Louis Bossart
3c652132ce platform-msi: fix kernel-doc warnings
remove make W=1 warnings

drivers/base/platform-msi.c:336: warning:
Function parameter or member 'is_tree' not described in
'__platform_msi_create_device_domain'

drivers/base/platform-msi.c:336: warning:
expecting prototype for platform_msi_create_device_domain(). Prototype
was for __platform_msi_create_device_domain() instead

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://lore.kernel.org/r/20210331232614.304591-7-pierre-louis.bossart@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-02 16:40:08 +02:00
Pierre-Louis Bossart
f4651a7dd6 driver core: attribute_container: remove kernel-doc warnings
Remove make W=1 warnings

drivers/base/attribute_container.c:471: warning: Function parameter or
member 'cont' not described in
'attribute_container_add_class_device_adapter'

drivers/base/attribute_container.c:471: warning: Function parameter or
member 'dev' not described in
'attribute_container_add_class_device_adapter'

drivers/base/attribute_container.c:471: warning: Function parameter or
member 'classdev' not described in
'attribute_container_add_class_device_adapter'

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://lore.kernel.org/r/20210331232614.304591-3-pierre-louis.bossart@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-02 16:40:07 +02:00
Pierre-Louis Bossart
37c52f7403 driver core: remove kernel-doc warnings
remove make W=1 warning:

drivers/base/core.c:1670: warning: Function parameter or member
'flags' not described in 'fw_devlink_create_devlink'

drivers/base/core.c:1670: warning: Function parameter or member 'con'
not described in 'fw_devlink_create_devlink'

drivers/base/core.c:1670: warning: Function parameter or member
'sup_handle' not described in 'fw_devlink_create_devlink'

drivers/base/core.c:1670: warning: Function parameter or member
'flags' not described in 'fw_devlink_create_devlink'

drivers/base/core.c:1763: warning: Function parameter or member 'dev'
not described in '__fw_devlink_link_to_consumers'

drivers/base/core.c:1844: warning: Function parameter or member 'dev'
not described in '__fw_devlink_link_to_suppliers'

drivers/base/core.c:1844: warning: Function parameter or member
'fwnode' not described in '__fw_devlink_link_to_suppliers'

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://lore.kernel.org/r/20210331232614.304591-2-pierre-louis.bossart@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-02 16:40:07 +02:00
Adrian Hunter
9dfacc54a8 PM: runtime: Fix race getting/putting suppliers at probe
pm_runtime_put_suppliers() must not decrement rpm_active unless the
consumer is suspended. That is because, otherwise, it could suspend
suppliers for an active consumer.

That can happen as follows:

 static int driver_probe_device(struct device_driver *drv, struct device *dev)
 {
	int ret = 0;

	if (!device_is_registered(dev))
		return -ENODEV;

	dev->can_match = true;
	pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
		 drv->bus->name, __func__, dev_name(dev), drv->name);

	pm_runtime_get_suppliers(dev);
	if (dev->parent)
		pm_runtime_get_sync(dev->parent);

 At this point, dev can runtime suspend so rpm_put_suppliers() can run,
 rpm_active becomes 1 (the lowest value).

	pm_runtime_barrier(dev);
	if (initcall_debug)
		ret = really_probe_debug(dev, drv);
	else
		ret = really_probe(dev, drv);

 Probe callback can have runtime resumed dev, and then runtime put
 so dev is awaiting autosuspend, but rpm_active is 2.

	pm_request_idle(dev);

	if (dev->parent)
		pm_runtime_put(dev->parent);

	pm_runtime_put_suppliers(dev);

 Now pm_runtime_put_suppliers() will put the supplier
 i.e. rpm_active 2 -> 1, but consumer can still be active.

	return ret;
 }

Fix by checking the runtime status. For any status other than
RPM_SUSPENDED, rpm_active can be considered to be "owned" by
rpm_[get/put]_suppliers() and pm_runtime_put_suppliers() need do nothing.

Reported-by: Asutosh Das <asutoshd@codeaurora.org>
Fixes: 4c06c4e6cf ("driver core: Fix possible supplier PM-usage counter imbalance")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: 5.1+ <stable@vger.kernel.org> # 5.1+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-03-29 19:27:09 +02:00
Adrian Hunter
c0c33442f7 PM: runtime: Fix ordering in pm_runtime_get_suppliers()
rpm_active indicates how many times the supplier usage_count has been
incremented. Consequently it must be updated after pm_runtime_get_sync() of
the supplier, not before.

Fixes: 4c06c4e6cf ("driver core: Fix possible supplier PM-usage counter imbalance")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: 5.1+ <stable@vger.kernel.org> # 5.1+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-03-29 19:27:09 +02:00
Jia-Ju Bai
d225ef6fda base: dd: fix error return code of driver_sysfs_add()
When device_create_file() fails and returns a non-zero value,
no error return code of driver_sysfs_add() is assigned.
To fix this bug, ret is assigned with the return value of
device_create_file(), and then ret is checked.

Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Link: https://lore.kernel.org/r/20210324023405.12465-1-baijiaju1990@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-28 14:55:39 +02:00
Yogesh Lal
e611f8cd87 driver core: Use unbound workqueue for deferred probes
Deferred probe usually runs only on pinned kworkers, which might take
longer time if a device contains multiple sub-devices. One such case
is of sound card on mobile devices, where we have good number of
mixers and controls per mixer.

We observed boot up improvement - deferred probes take ~600ms when bound
to little core kworker and ~200ms when deferred probe is queued on
unbound wq. This is due to scheduler moving the worker running deferred
probe work to big CPUs. Without this change, we see the worker is running
on LITTLE CPU due to affinity.

Since kworker runs deferred probe of several devices, the locality may
not be important. Also, init thread executing driver initcalls, can
potentially migrate as it has cpu affinity set to all cpus.In addition
to this, async probes use unbounded workqueue. So, using unbounded wq for
deferred probes looks to be similar to these w.r.t. scheduling behavior.

Signed-off-by: Yogesh Lal <ylal@codeaurora.org>
Link: https://lore.kernel.org/r/1616583698-6398-1-git-send-email-ylal@codeaurora.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-28 14:55:39 +02:00
Ahmad Fatoum
f0acf637d6 driver core: clear deferred probe reason on probe retry
When retrying a deferred probe, any old defer reason string should be
discarded. Otherwise, if the probe is deferred again at a different spot,
but without setting a message, the now incorrect probe reason will remain.

This was observed with the i.MX I2C driver, which ultimately failed
to probe due to lack of the GPIO driver. The probe defer for GPIO
doesn't record a message, but a previous probe defer to clock_get did.
This had the effect that /sys/kernel/debug/devices_deferred listed
a misleading probe deferral reason.

Cc: stable <stable@vger.kernel.org>
Fixes: d090b70ede ("driver core: add deferring probe reason to devices_deferred property")
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Link: https://lore.kernel.org/r/20210319110459.19966-1-a.fatoum@pengutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-23 15:13:43 +01:00
Arnd Bergmann
53f95c5534 devcoredump: avoid -Wempty-body warnings
Cleaning out the last -Wempty-body warnings found some interesting
cases with empty macros, along with harmless warnings like this one:

drivers/base/devcoredump.c: In function 'dev_coredumpm':
drivers/base/devcoredump.c:297:56: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
  297 |                 /* nothing - symlink will be missing */;
      |                                                        ^
drivers/base/devcoredump.c:301:56: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
  301 |                 /* nothing - symlink will be missing */;
      |                                                        ^

Randy tried addressing this one before, and there were multiple
other ideas in that thread.

Add a runtime warning and code comment here.

Link: https://lore.kernel.org/lkml/20200418184111.13401-8-rdunlap@infradead.org/
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20210322114258.3420937-1-arnd@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-23 15:04:59 +01:00
Andy Shevchenko
7f2fac70b7 device property: Add test cases for fwnode_property_count_*() APIs
Add test cases for fwnode_property_count_*() APIs.

While at it, modify the arrays of integers to be size of non-power-of-2
for better test coverage and decreasing stack usage.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20210212162539.86850-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-23 15:04:06 +01:00
Andy Shevchenko
0b8bf06f67 device property: Sync descriptions of swnode array and group APIs
After a few updates against swnode APIs the kernel documentation, i.e.
for swnode group registration and unregistration deviates from the one
for swnode array. In general, the same rules are applied to both.
Hence, synchronize descriptions of swnode array and group APIs

Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20210308103644.81960-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-23 15:04:01 +01:00
Saravana Kannan
ea718c6990 Revert "Revert "driver core: Set fw_devlink=on by default""
This reverts commit 3e4c982f1c.

Since all reported issues due to fw_devlink=on should be addressed by
this series, revert the revert. fw_devlink=on Take II.

Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20210302211133.2244281-4-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-23 14:58:11 +01:00
Saravana Kannan
b6f617df4f driver core: Update device link status properly for device_bind_driver()
Device link status was not getting updated correctly when
device_bind_driver() is called on a device. This causes a warning[1].
Fix this by updating device links that can be updated and dropping
device links that can't be updated to a sensible state.

[1] - https://lore.kernel.org/lkml/56f7d032-ba5a-a8c7-23de-2969d98c527e@nvidia.com/

Tested-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20210302211133.2244281-3-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-23 14:58:11 +01:00
Saravana Kannan
f2db85b64f driver core: Avoid pointless deferred probe attempts
There's no point in adding a device to the deferred probe list if we
know for sure that it doesn't have a matching driver. So, check if a
device can match with a driver before adding it to the deferred probe
list.

Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20210302211133.2244281-2-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-23 14:58:10 +01:00
Rasmus Villemoes
01085e24ff devtmpfs: actually reclaim some init memory
Currently gcc seems to inline devtmpfs_setup() into devtmpfsd(), so
its memory footprint isn't reclaimed as intended. Mark it noinline to
make sure it gets put in .init.text.

While here, setup_done can also be put in .init.data: After complete()
releases the internal spinlock, the completion object is never touched
again by that thread, and the waiting thread doesn't proceed until it
observes ->done while holding that spinlock.

This is now the same pattern as for kthreadd_done in init/main.c:
complete() is done in a __ref function, while the corresponding
wait_for_completion() is in an __init function.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Link: https://lore.kernel.org/r/20210312103027.2701413-2-linux@rasmusvillemoes.dk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-23 14:57:35 +01:00
Rasmus Villemoes
38f087de89 devtmpfs: fix placement of complete() call
Calling complete() from within the __init function is wrong -
theoretically, the init process could proceed all the way to freeing
the init mem before the devtmpfsd thread gets to execute the return
instruction in devtmpfs_setup().

In practice, it seems to be harmless as gcc inlines devtmpfs_setup()
into devtmpfsd(). So the calls of the __init functions init_chdir()
etc. actually happen from devtmpfs_setup(), but the __ref on that one
silences modpost (it's all right, because those calls happen before
the complete()). But it does make the __init annotation of the setup
function moot, which we'll fix in a subsequent patch.

Fixes: bcbacc4909 ("devtmpfs: refactor devtmpfsd()")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Link: https://lore.kernel.org/r/20210312103027.2701413-1-linux@rasmusvillemoes.dk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-23 14:57:35 +01:00
Colin Ian King
6b72cf1282 drivers/base/cpu: remove redundant assignment of variable retval
The variable retval is being initialized with a value that is never read
and it is being updated later with a new value.  Clean this up by
initializing retval to -ENOMEM and remove the assignment to retval
on the !dev failure path.

Kudos to Rafael for the improved fix suggestion.

Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210218202837.516231-1-colin.king@canonical.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-23 14:56:50 +01:00
Greg Kroah-Hartman
2942df6751 driver core: dd: remove deferred_devices variable
No need to save the debugfs dentry for the "devices_deferred" debugfs
file (gotta love the juxtaposition), if we need to remove it we can look
it up from debugfs itself.

Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20210216142400.3759099-2-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-23 10:49:04 +01:00
Greg Kroah-Hartman
c654cea59d driver core: component: remove dentry pointer in "struct master"
There is no need to keep around a pointer to a dentry when all it is
used for is to remove the debugfs file when tearing things down.  As the
name is simple, have debugfs look up the dentry when removing things,
keeping the logic much simpler.

Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20210216142400.3759099-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-23 10:49:02 +01:00
Dave Jiang
bbf44abeea driver core: auxiliary bus: Remove unneeded module bits
Remove module bits in the auxiliary bus code since the auxiliary bus
cannot be built as a module and the relevant code is not needed.

Cc: Dave Ertman <david.m.ertman@intel.com>
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/161307488980.1896017.15627190714413338196.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-23 10:47:55 +01:00
Rafael J. Wysocki
5244f5e2d8 PM: runtime: Defer suspending suppliers
Because the PM-runtime status of the device is not updated in
__rpm_callback(), attempts to suspend the suppliers of the given
device triggered by the rpm_put_suppliers() call in there may
cause a supplier to be suspended completely before the status of
the consumer is updated to RPM_SUSPENDED, which is confusing.

To avoid that (1) modify __rpm_callback() to only decrease the
PM-runtime usage counter of each supplier and (2) make rpm_suspend()
try to suspend the suppliers after changing the consumer's status to
RPM_SUSPENDED, in analogy with the device's parent.

Link: https://lore.kernel.org/linux-pm/CAPDyKFqm06KDw_p8WXsM4dijDbho4bb6T4k50UqqvR1_COsp8g@mail.gmail.com/
Fixes: 21d5c57b37 ("PM / runtime: Use device links")
Reported-by: elaine.zhang <zhangqing@rock-chips.com>
Diagnosed-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-03-22 15:21:38 +01:00
Rafael J. Wysocki
0cab893f40 Revert "PM: runtime: Update device status before letting suppliers suspend"
Revert commit 44cc89f764 ("PM: runtime: Update device status
before letting suppliers suspend") that introduced a race condition
into __rpm_callback() which allowed a concurrent rpm_resume() to
run and resume the device prematurely after its status had been
changed to RPM_SUSPENDED by __rpm_callback().

Fixes: 44cc89f764 ("PM: runtime: Update device status before letting suppliers suspend")
Link: https://lore.kernel.org/linux-pm/24dfb6fc-5d54-6ee2-9195-26428b7ecf8a@intel.com/
Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: 4.10+ <stable@vger.kernel.org> # 4.10+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-03-19 16:35:47 +01:00
Heikki Krogerus
2a92c90f2e software node: Fix device_add_software_node()
The function device_add_software_node() was meant to
register the node supplied to it, but only if that node
wasn't already registered. Right now the function attempts
to always register the node. That will cause a failure with
nodes that are already registered.

Fixing that by incrementing the reference count of the nodes
that have already been registered, and only registering the
new nodes. Also, clarifying the behaviour in the function
documentation.

Fixes: e68d0119e3 ("software node: Introduce device_add_software_node()")
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-03-10 15:25:02 +01:00
Heikki Krogerus
8891123f9c software node: Fix node registration
Software node can not be registered before its parent.

Fixes: 80488a6b1d ("software node: Add support for static node descriptors")
Cc: 5.10+ <stable@vger.kernel.org> # 5.10+
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-03-10 15:19:58 +01:00
Rafael J. Wysocki
44cc89f764 PM: runtime: Update device status before letting suppliers suspend
Because the PM-runtime status of the device is not updated in
__rpm_callback(), attempts to suspend the suppliers of the given
device triggered by rpm_put_suppliers() called by it may fail.

Fix this by making __rpm_callback() update the device's status to
RPM_SUSPENDED before calling rpm_put_suppliers() if the current
status of the device is RPM_SUSPENDING and the callback just invoked
by it has returned 0 (success).

While at it, modify the code in __rpm_callback() to always check
the device's PM-runtime status under its PM lock.

Link: https://lore.kernel.org/linux-pm/CAPDyKFqm06KDw_p8WXsM4dijDbho4bb6T4k50UqqvR1_COsp8g@mail.gmail.com/
Fixes: 21d5c57b37 ("PM / runtime: Use device links")
Reported-by: Elaine Zhang <zhangqing@rock-chips.com>
Diagnosed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Elaine Zhang <zhangiqng@rock-chips.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Cc: 4.10+ <stable@vger.kernel.org> # 4.10+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-03-01 17:40:38 +01:00
Linus Torvalds
8b83369ddc RISC-V Patches for the 5.12 Merge Window
I have a handful of new RISC-V related patches for this merge window:
 
 * A check to ensure drivers are properly using uaccess.  This isn't
   manifesting with any of the drivers I'm currently using, but may catch
   errors in new drivers.
 * Some preliminary support for the FU740, along with the HiFive
   Unleashed it will appear on.
 * NUMA support for RISC-V, which involves making the arm64 code generic.
 * Support for kasan on the vmalloc region.
 * A handful of new drivers for the Kendryte K210, along with the DT
   plumbing required to boot on a handful of K210-based boards.
 * Support for allocating ASIDs.
 * Preliminary support for kernels larger than 128MiB.
 * Various other improvements to our KASAN support, including the
   utilization of huge pages when allocating the KASAN regions.
 
 We may have already found a bug with the KASAN_VMALLOC code, but it's
 passing my tests.  There's a fix in the works, but that will probably
 miss the merge window.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmA4hXATHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYifryD/0SfXGOfj93Cxq7I7AYhhzCN7lJ5jvv
 iEQScTlPqU9nfvYodo4EDq0fp+5LIPpTL/XBHtqVjzv0FqRNa28Ea0K7kO8HuXc4
 BaUd0m/DqyB4Gfgm4qjc5bDneQ1ZYxVXprYERWNQ5Fj+tdWhaQGOW64N/TVodjjj
 NgJtTqbIAcjJqjUtttM8TZN5U1TgwLo+KCqw3iYW12lV1YKBBuvrwvSdD6jnFdIQ
 AzG/wRGZhxLoFxgBB/NEsZxDoSd6ztiwxLhS9lX4okZVsryyIdOE70Q/MflfiTlU
 xE+AdxQXTMUiiqYSmHeDD6PDb57GT/K3hnjI1yP+lIZpbInsi29JKow1qjyYjfHl
 9cSSKYCIXHL7jKU6pgt34G1O5N5+fgqHQhNbfKvlrQ2UPlfs/tWdKHpFIP/z9Jlr
 0vCAou7NSEB9zZGqzO63uBLXoN8yfL8FT3uRnnRvoRpfpex5dQX2QqPLQ7327D7N
 GUG31nd1PHTJPdxJ1cI4SO24PqPpWDWY9uaea+0jv7ivGClVadZPco/S3ZKloguT
 lazYUvyA4oRrSAyln785Rd8vg4CinqTxMtIyZbRMbNkgzVQARi9a8rjvu4n9qms2
 2wlXDFi8nR8B4ih5n79dSiiLM9ay9GJDxMcf9VxIxSAYZV2fJALnpK6gV2fzRBUe
 +k/uv8BIsFmlwQ==
 =CutX
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-5.12-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:
 "A handful of new RISC-V related patches for this merge window:

   - A check to ensure drivers are properly using uaccess. This isn't
     manifesting with any of the drivers I'm currently using, but may
     catch errors in new drivers.

   - Some preliminary support for the FU740, along with the HiFive
     Unleashed it will appear on.

   - NUMA support for RISC-V, which involves making the arm64 code
     generic.

   - Support for kasan on the vmalloc region.

   - A handful of new drivers for the Kendryte K210, along with the DT
     plumbing required to boot on a handful of K210-based boards.

   - Support for allocating ASIDs.

   - Preliminary support for kernels larger than 128MiB.

   - Various other improvements to our KASAN support, including the
     utilization of huge pages when allocating the KASAN regions.

  We may have already found a bug with the KASAN_VMALLOC code, but it's
  passing my tests. There's a fix in the works, but that will probably
  miss the merge window.

* tag 'riscv-for-linus-5.12-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (75 commits)
  riscv: Improve kasan population by using hugepages when possible
  riscv: Improve kasan population function
  riscv: Use KASAN_SHADOW_INIT define for kasan memory initialization
  riscv: Improve kasan definitions
  riscv: Get rid of MAX_EARLY_MAPPING_SIZE
  soc: canaan: Sort the Makefile alphabetically
  riscv: Disable KSAN_SANITIZE for vDSO
  riscv: Remove unnecessary declaration
  riscv: Add Canaan Kendryte K210 SD card defconfig
  riscv: Update Canaan Kendryte K210 defconfig
  riscv: Add Kendryte KD233 board device tree
  riscv: Add SiPeed MAIXDUINO board device tree
  riscv: Add SiPeed MAIX GO board device tree
  riscv: Add SiPeed MAIX DOCK board device tree
  riscv: Add SiPeed MAIX BiT board device tree
  riscv: Update Canaan Kendryte K210 device tree
  dt-bindings: add resets property to dw-apb-timer
  dt-bindings: fix sifive gpio properties
  dt-bindings: update sifive uart compatible string
  dt-bindings: update sifive clint compatible string
  ...
2021-02-26 10:28:35 -08:00
David Hildenbrand
e9a2e48e87 drivers/base/memory: don't store phys_device in memory blocks
No need to store the value for each and every memory block, as we can
easily query the value at runtime.  Reshuffle the members to optimize the
memory layout.  Also, let's clarify what the interface once was used for
and why it's legacy nowadays.

"phys_device" was used on s390x in older versions of lsmem[2]/chmem[3],
back when they were still part of s390x-tools.  They were later replaced
by the variants in linux-utils.  For example, RHEL6 and RHEL7 contain
lsmem/chmem from s390-utils.  RHEL8 switched to versions from util-linux
on s390x [4].

"phys_device" was added with sysfs support for memory hotplug in commit
3947be1969 ("[PATCH] memory hotplug: sysfs and add/remove functions") in
2005.  It always returned 0.

s390x started returning something != 0 on some setups (if sclp.rzm is set
by HW) in 2010 via commit 57b552ba0b ("memory hotplug/s390: set
phys_device").

For s390x, it allowed for identifying which memory block devices belong to
the same storage increment (RZM).  Only if all memory block devices
comprising a single storage increment were offline, the memory could
actually be removed in the hypervisor.

Since commit e5d709bb5f ("s390/memory hotplug: provide
memory_block_size_bytes() function") in 2013 a memory block device spans
at least one storage increment - which is why the interface isn't really
helpful/used anymore (except by old lsmem/chmem tools).

There were once RFC patches to make use of "phys_device" in ACPI context;
however, the underlying problem could be solved using different interfaces
[1].

[1] https://patchwork.kernel.org/patch/2163871/
[2] https://github.com/ibm-s390-tools/s390-tools/blob/v2.1.0/zconf/lsmem
[3] https://github.com/ibm-s390-tools/s390-tools/blob/v2.1.0/zconf/chmem
[4] https://bugzilla.redhat.com/show_bug.cgi?id=1504134

Link: https://lkml.kernel.org/r/20210201181347.13262-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: Tom Rix <trix@redhat.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26 09:41:00 -08:00
Anshuman Khandual
1adf8b468f mm/memory_hotplug: rename all existing 'memhp' into 'mhp'
This renames all 'memhp' instances to 'mhp' except for memhp_default_state
for being a kernel command line option.  This is just a clean up and
should not cause a functional change.  Let's make it consistent rater than
mixing the two prefixes.  In preparation for more users of the 'mhp'
terminology.

Link: https://lkml.kernel.org/r/1611554093-27316-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26 09:41:00 -08:00
Linus Torvalds
4c48faba5b Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
 "A few small subsystems and some of MM.

  172 patches.

  Subsystems affected by this patch series: hexagon, scripts, ntfs,
  ocfs2, vfs, and mm (slab-generic, slab, slub, debug, pagecache, swap,
  memcg, pagemap, mprotect, mremap, page-reporting, vmalloc, kasan,
  pagealloc, memory-failure, hugetlb, vmscan, z3fold, compaction,
  mempolicy, oom-kill, hugetlbfs, and migration)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (172 commits)
  mm/migrate: remove unneeded semicolons
  hugetlbfs: remove unneeded return value of hugetlb_vmtruncate()
  hugetlbfs: fix some comment typos
  hugetlbfs: correct some obsolete comments about inode i_mutex
  hugetlbfs: make hugepage size conversion more readable
  hugetlbfs: remove meaningless variable avoid_reserve
  hugetlbfs: correct obsolete function name in hugetlbfs_read_iter()
  hugetlbfs: use helper macro default_hstate in init_hugetlbfs_fs
  hugetlbfs: remove useless BUG_ON(!inode) in hugetlbfs_setattr()
  hugetlbfs: remove special hugetlbfs_set_page_dirty()
  mm/hugetlb: change hugetlb_reserve_pages() to type bool
  mm, oom: fix a comment in dump_task()
  mm/mempolicy: use helper range_in_vma() in queue_pages_test_walk()
  numa balancing: migrate on fault among multiple bound nodes
  mm, compaction: make fast_isolate_freepages() stay within zone
  mm/compaction: fix misbehaviors of fast_find_migrateblock()
  mm/compaction: correct deferral logic for proactive compaction
  mm/compaction: remove duplicated VM_BUG_ON_PAGE !PageLocked
  mm/compaction: remove rcu_read_lock during page compaction
  z3fold: simplify the zhdr initialization code in init_z3fold_page()
  ...
2021-02-24 16:20:38 -08:00
Shakeel Butt
b603894248 mm: memcg: add swapcache stat for memcg v2
This patch adds swapcache stat for the cgroup v2.  The swapcache
represents the memory that is accounted against both the memory and the
swap limit of the cgroup.  The main motivation behind exposing the
swapcache stat is for enabling users to gracefully migrate from cgroup
v1's memsw counter to cgroup v2's memory and swap counters.

Cgroup v1's memsw limit allows users to limit the memory+swap usage of a
workload but without control on the exact proportion of memory and swap.
Cgroup v2 provides separate limits for memory and swap which enables more
control on the exact usage of memory and swap individually for the
workload.

With some little subtleties, the v1's memsw limit can be switched with the
sum of the v2's memory and swap limits.  However the alternative for memsw
usage is not yet available in cgroup v2.  Exposing per-cgroup swapcache
stat enables that alternative.  Adding the memory usage and swap usage and
subtracting the swapcache will approximate the memsw usage.  This will
help in the transparent migration of the workloads depending on memsw
usage and limit to v2' memory and swap counters.

The reasons these applications are still interested in this approximate
memsw usage are: (1) these applications are not really interested in two
separate memory and swap usage metrics.  A single usage metric is more
simple to use and reason about for them.

(2) The memsw usage metric hides the underlying system's swap setup from
the applications.  Applications with multiple instances running in a
datacenter with heterogeneous systems (some have swap and some don't) will
keep seeing a consistent view of their usage.

[akpm@linux-foundation.org: fix CONFIG_SWAP=n build]

Link: https://lkml.kernel.org/r/20210108155813.2914586-3-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24 13:38:29 -08:00
Muchun Song
380780e718 mm: memcontrol: convert NR_FILE_PMDMAPPED account to pages
Currently we use struct per_cpu_nodestat to cache the vmstat counters,
which leads to inaccurate statistics especially THP vmstat counters.  In
the systems with hundreds of processors it can be GBs of memory.  For
example, for a 96 CPUs system, the threshold is the maximum number of 125.
And the per cpu counters can cache 23.4375 GB in total.

The THP page is already a form of batched addition (it will add 512 worth
of memory in one go) so skipping the batching seems like sensible.
Although every THP stats update overflows the per-cpu counter, resorting
to atomic global updates.  But it can make the statistics more accuracy
for the THP vmstat counters.

So we convert the NR_FILE_PMDMAPPED account to pages.  This patch is
consistent with 8f182270df ("mm/swap.c: flush lru pvecs on compound page
arrival").  Doing this also can make the unit of vmstat counters more
unified.  Finally, the unit of the vmstat counters are pages, kB and
bytes.  The B/KB suffix can tell us that the unit is bytes or kB.  The
rest which is without suffix are pages.

Link: https://lkml.kernel.org/r/20201228164110.2838-7-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: NeilBrown <neilb@suse.de>
Cc: Pankaj Gupta <pankaj.gupta@cloud.ionos.com>
Cc: Rafael. J. Wysocki <rafael@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24 13:38:29 -08:00