Commit Graph

1105379 Commits

Author SHA1 Message Date
Colin Ian King
2ecd0edd13 ceph: remove redundant variable ino
Variable ino is being assigned a value that is never read. The variable
and assignment are redundant, remove it.

Cleans up clang scan build warning:
warning: Although the value stored to 'ino' is used in the enclosing
expression, the value is never actually read from 'ino'
[deadcode.DeadStores]

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:14 +02:00
Xiubo Li
a74379543d ceph: try to queue a writeback if revoking fails
If the pagecaches writeback just finished and the i_wrbuffer_ref
reaches zero it will try to trigger ceph_check_caps(). But if just
before ceph_check_caps() the i_wrbuffer_ref could be increased
again by mmap/cache write, then the Fwb revoke will fail.

We need to try to queue a writeback in this case instead of
triggering the writeback by BDI's delayed work per 5 seconds.

URL: https://tracker.ceph.com/issues/46904
URL: https://tracker.ceph.com/issues/55377
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:14 +02:00
Luís Henriques
55ab552080 ceph: fix statfs for subdir mounts
When doing a mount using as base a directory that has 'max_bytes' quotas
statfs uses that value as the total; if a subdirectory is used instead,
the same 'max_bytes' too in statfs, unless there is another quota set.

Unfortunately, if this subdirectory only has the 'max_files' quota set,
then statfs uses the filesystem total.  Fix this by making sure we only
lookup realms that contain the 'max_bytes' quota.

Cc: Ryan Taylor <rptaylor@uvic.ca>
URL: https://tracker.ceph.com/issues/55090
Signed-off-by: Luís Henriques <lhenriques@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:14 +02:00
Xiubo Li
825978fd6a ceph: fix possible deadlock when holding Fwb to get inline_data
1, mount with wsync.
2, create a file with O_RDWR, and the request was sent to mds.0:

   ceph_atomic_open()-->
     ceph_mdsc_do_request(openc)
     finish_open(file, dentry, ceph_open)-->
       ceph_open()-->
         ceph_init_file()-->
           ceph_init_file_info()-->
             ceph_uninline_data()-->
             {
               ...
               if (inline_version == 1 || /* initial version, no data */
                   inline_version == CEPH_INLINE_NONE)
                     goto out_unlock;
               ...
             }

The inline_version will be 1, which is the initial version for the
new create file. And here the ci->i_inline_version will keep with 1,
it's buggy.

3, buffer write to the file immediately:

   ceph_write_iter()-->
     ceph_get_caps(file, need=Fw, want=Fb, ...);
     generic_perform_write()-->
       a_ops->write_begin()-->
         ceph_write_begin()-->
           netfs_write_begin()-->
             netfs_begin_read()-->
               netfs_rreq_submit_slice()-->
                 netfs_read_from_server()-->
                   rreq->netfs_ops->issue_read()-->
                     ceph_netfs_issue_read()-->
                     {
                       ...
                       if (ci->i_inline_version != CEPH_INLINE_NONE &&
                           ceph_netfs_issue_op_inline(subreq))
                         return;
                       ...
                     }
     ceph_put_cap_refs(ci, Fwb);

The ceph_netfs_issue_op_inline() will send a getattr(Fsr) request to
mds.1.

4, then the mds.1 will request the rd lock for CInode::filelock from
the auth mds.0, the mds.0 will do the CInode::filelock state transation
from excl --> sync, but it need to revoke the Fxwb caps back from the
clients.

While the kernel client has aleady held the Fwb caps and waiting for
the getattr(Fsr).

It's deadlock!

URL: https://tracker.ceph.com/issues/55377
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:14 +02:00
Xiubo Li
3459bd0c55 ceph: redirty the page for writepage on failure
When run out of memories we should redirty the page before failing
the writepage. Or we will hit BUG_ON(folio_get_private(folio)) in
ceph_dirty_folio().

URL: https://tracker.ceph.com/issues/55421
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:14 +02:00
Xiubo Li
5eed80fba6 ceph: try to choose the auth MDS if possible for getattr
If any 'x' caps is issued we can just choose the auth MDS instead
of the random replica MDSes. Because only when the Locker is in
LOCK_EXEC state will the loner client could get the 'x' caps. And
if we send the getattr requests to any replica MDS it must auth pin
and tries to rdlock from the auth MDS, and then the auth MDS need
to do the Locker state transition to LOCK_SYNC. And after that the
lock state will change back.

This cost much when doing the Locker state transition and usually
will need to revoke caps from clients.

URL: https://tracker.ceph.com/issues/55240
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:14 +02:00
Xiubo Li
f7a2d0688a ceph: disable updating the atime since cephfs won't maintain it
Since CephFS makes no attempt to maintain atime, we shouldn't
try to update it in mmap and generic read cases and ignore updating
it in direct and sync read cases.

And even we update it in mmap and generic read cases we will drop
it and won't sync it to MDS. And we are seeing the atime will be
updated and then dropped to the floor again and again.

URL: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/VSJM7T4CS5TDRFF6XFPIYMHP75K73PZ6/
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Acked-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:14 +02:00
Xiubo Li
1b2ba3c561 ceph: flush the mdlog for filesystem sync
Before waiting for a request's safe reply, we will send the mdlog flush
request to the relevant MDS. And this will also flush the mdlog for all
the other unsafe requests in the same session, so we can record the last
session and no need to flush mdlog again in the next loop. But there
still have cases that it may send the mdlog flush requst twice or more,
but that should be not often.

Rename wait_unsafe_requests() to
flush_mdlog_and_wait_mdsc_unsafe_requests() to make it more
descriptive.

[xiubli: fold in MDS request refcount leak fix from Jeff]

URL: https://tracker.ceph.com/issues/55284
URL: https://tracker.ceph.com/issues/55411
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:14 +02:00
Xiubo Li
ae06706330 ceph: rename unsafe_request_wait()
Rename it to flush_mdlog_and_wait_inode_unsafe_requests() to make
it more descriptive.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:14 +02:00
Guo Zhengkui
d9d58f0402 libceph: use swap() macro instead of taking tmp variable
Fix the following coccicheck warning:
net/ceph/crush/mapper.c:1077:8-9: WARNING opportunity for swap()

by using swap() for the swapping of variable values and drop
the tmp variable that is not needed any more.

Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:13 +02:00
Xiubo Li
261998c300 ceph: fix statx AT_STATX_DONT_SYNC vs AT_STATX_FORCE_SYNC check
From the posix and the initial statx supporting commit comments,
the AT_STATX_DONT_SYNC is a lightweight stat and the
AT_STATX_FORCE_SYNC is a heaverweight one. And also checked all
the other current usage about these two flags they are all doing
the same, that is only when the AT_STATX_FORCE_SYNC is not set
and the AT_STATX_DONT_SYNC is set will they skip sync retriving
the attributes from storage.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:13 +02:00
Xiubo Li
68e5ec2ec9 ceph: no need to invalidate the fscache twice
Fixes: 400e1286c0 ("ceph: conversion to new fscache API")
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:13 +02:00
Jakob Koschel
3ffa9d6f99 ceph: replace usage of found with dedicated list iterator variable
To move the list iterator variable into the list_for_each_entry_*()
macro in the future it should be avoided to use the list iterator
variable after the loop body.

To *never* use the list iterator variable after the loop it was
concluded to use a separate iterator variable instead of a
found boolean.

This removes the need to use a found variable and simply checking if
the variable was set, can determine if the break/goto was hit.

Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/
Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:13 +02:00
Jakob Koschel
57a5df0e86 ceph: use dedicated list iterator variable
To move the list iterator variable into the list_for_each_entry_*()
macro in the future it should be avoided to use the list iterator
variable after the loop body.

To *never* use the list iterator variable after the loop it was
concluded to use a separate iterator variable.

Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/
Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:13 +02:00
Xiubo Li
7ffe4fcea7 ceph: update the dlease for the hashed dentry when removing
The MDS will always refresh the dentry lease when removing the files
or directories. And if the dentry is still hashed, we can update
the dentry lease and no need to do the lookup from the MDS later.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:13 +02:00
Xiubo Li
546a5d6122 ceph: stop retrying the request when exceeding 256 times
The type of 'r_attempts' in kernel 'ceph_mds_request' is 'int',
while in 'ceph_mds_request_head' the type of 'num_retry' is '__u8'.
So in case the request retries exceeding 256 times, the MDS will
receive a incorrect retry seq.

In this case it's ususally a bug in MDS and continue retrying the
request makes no sense. For now let's limit it to 256. In future
this could be fixed in ceph code, so avoid using the hardcode here.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:13 +02:00
Xiubo Li
1980b1bf17 ceph: stop forwarding the request when exceeding 256 times
The type of 'num_fwd' in ceph 'MClientRequestForward' is 'int32_t',
while in 'ceph_mds_request_head' the type is '__u8'. So in case
the request bounces between MDSes exceeding 256 times, the client
will get stuck.

In this case it's ususally a bug in MDS and continue bouncing the
request makes no sense.

URL: https://tracker.ceph.com/issues/55130
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Luís Henriques <lhenriques@suse.de>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:13 +02:00
Xiubo Li
6c1dc50284 ceph: remove unused CEPH_MDS_LEASE_RELEASE related code
The ceph_mdsc_lease_release() has been removed by commit 8aa152c778
(ceph: remove ceph_mdsc_lease_release). ceph_mdsc_lease_send_msg will
never be called with CEPH_MDS_LEASE_RELEASE.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:13 +02:00
Jakob Koschel
3302ffd44c rbd: replace usage of found with dedicated list iterator variable
To move the list iterator variable into the list_for_each_entry_*()
macro in the future it should be avoided to use the list iterator
variable after the loop body.

To *never* use the list iterator variable after the loop it was
concluded to use a separate iterator variable instead of a
found boolean.

This removes the need to use a found variable and simply checking if
the variable was set, can determine if the break/goto was hit.

Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/
Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:13 +02:00
Venky Shankar
d7a2dc5230 ceph: allow ceph.dir.rctime xattr to be updatable
`rctime' has been a pain point in cephfs due to its buggy
nature - inconsistent values reported and those sorts.
Fixing rctime is non-trivial needing an overall redesign
of the entire nested statistics infrastructure.

As a workaround, PR

     http://github.com/ceph/ceph/pull/37938

allows this extended attribute to be manually set. This allows
users to "fixup" inconsistent rctime values. While this sounds
messy, its probably the wisest approach allowing users/scripts
to workaround buggy rctime values.

The above PR enables Ceph MDS to allow manually setting
rctime extended attribute with the corresponding user-land
changes. We may as well allow the same to be done via kclient
for parity.

Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:13 +02:00
Linus Torvalds
64e34b50d7 linux-kselftest-kunit-5.19-rc1
This KUnit update for Linux 5.19-rc1 consists of several fixes, cleanups,
 and enhancements to tests and framework:
 
 - introduces _NULL and _NOT_NULL macros to pointer error checks
 
 - reworks kunit_resource allocation policy to fix memory leaks when
   caller doesn't specify free() function to be used when allocating
   memory using kunit_add_resource() and kunit_alloc_resource() funcs.
 
 - adds ability to specify suite-level init and exit functions
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmKLw4QACgkQCwJExA0N
 Qxz9wRAA3PonJESDAFF2sXTDzQurEXdWoJHqNvO0JCObku8SDODEI7nozXOD0MBC
 ASAXiX3HuNI0yESF27xECqu3xbe8KsYOtCN8vco/sYUroVGmzgAt/atsvrSUv2Oh
 sEQbjrTMwkMUjL5ECvjR2dArd6bQew7PPBkl3HqOpyysL3b/EAMEAY0DmDXrrrwB
 +oNvXGVAR1Tczg4ahcSSwDdZl1C41kREj5f8S/4+kohMdIjCUPWOAYnaWHpVdAOJ
 C+LWkPSJ5IpgjU2urDX2kNfg32UxIJpFI009ovytBmwCbd+GEs24u7gtgtksPM2s
 YypoPEqC40gxkbY99omojtADiDdZlKqlIipCTWYe/CpzgBD+WQ4PVqMGM4ZprP9w
 Hrc6ulVmd8hZ4F9QQ3oN6W9L6pBCgdXtPPCsQtGoUTbw7r79BP67PjJ6Ko+usn3s
 Jy0FR5LvzYBjykoJzKSIaJ8ONaX34DB6w5rB+q5mBGwPKPHWo3eAZVZDPEMVo3Z7
 D9TW5UliGBt2y5YJZbPbSnhdJPMPHSK5ef9hIy0wYjVJFafirdgrQhgbWbVxalRT
 eZz1edcs1sdU7GAzfMA/v+NqAAA3bFIUVr2b+GTc+4zzWhq+cwI2SNikgyhETv/f
 xKq8Xek8EkOIdaa2lu9chTPT4sG7A6991EkRqfc7rL1IptkPiS8=
 =DzVQ
 -----END PGP SIGNATURE-----

Merge tag 'linux-kselftest-kunit-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull KUnit updates from Shuah Khan:
 "Several fixes, cleanups, and enhancements to tests and framework:

   - introduce _NULL and _NOT_NULL macros to pointer error checks

   - rework kunit_resource allocation policy to fix memory leaks when
     caller doesn't specify free() function to be used when allocating
     memory using kunit_add_resource() and kunit_alloc_resource() funcs.

   - add ability to specify suite-level init and exit functions"

* tag 'linux-kselftest-kunit-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (41 commits)
  kunit: tool: Use qemu-system-i386 for i386 runs
  kunit: fix executor OOM error handling logic on non-UML
  kunit: tool: update riscv QEMU config with new serial dependency
  kcsan: test: use new suite_{init,exit} support
  kunit: tool: Add list of all valid test configs on UML
  kunit: take `kunit_assert` as `const`
  kunit: tool: misc cleanups
  kunit: tool: minor cosmetic cleanups in kunit_parser.py
  kunit: tool: make parser stop overwriting status of suites w/ no_tests
  kunit: tool: remove dead parse_crash_in_log() logic
  kunit: tool: print clearer error message when there's no TAP output
  kunit: tool: stop using a shell to run kernel under QEMU
  kunit: tool: update test counts summary line format
  kunit: bail out of test filtering logic quicker if OOM
  lib/Kconfig.debug: change KUnit tests to default to KUNIT_ALL_TESTS
  kunit: Rework kunit_resource allocation policy
  kunit: fix debugfs code to use enum kunit_status, not bool
  kfence: test: use new suite_{init/exit} support, add .kunitconfig
  kunit: add ability to specify suite-level init and exit functions
  kunit: rename print_subtest_{start,end} for clarity (s/subtest/suite)
  ...
2022-05-25 11:32:53 -07:00
Linus Torvalds
1c6d2ead87 linux-kselftest-next-5.19-rc1
This Kselftest update for Linux 5.19-rc1 consists of several fixes,
 cleanups, and enhancements to tests:
 
 - adds mips support for kprobe args string and syntax tests
 - updates to resctrl test to use kselftest framework
 - fixes, cleanups, and enhancements to tests
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmKL2WUACgkQCwJExA0N
 Qxw3ERAAmmZ3p0ULMeUNiNjsL7zFL0x7LvgutfnXeBmy99QvZif0Zn44c1/Kv3Qc
 Jb/cUNA76u4Z9Z2L3ITeoZbk/s0Vdc4gXtP22O5CnBbQLMv8ptpMCRBPLmrYVmdm
 YIep2hEuqL9IuraRLsvbAWwJm6r7xTeR5IaLf3B7HtY2IjHAdEuX4wklnaap1gt/
 nk1BYMb0i8WAPlpl7leCL+2ZbFzI2tcpap9gTcFVNnsPTI4cS3+88mHopG0pw8cv
 uszj63xOFDhV6sOxtpJDhepBOKbjHMNgxc+wm/4zWdLPVA71dF46CEdLqKZ2gDVt
 rNocmDZUlLv18N2jHm8S6YWjZuDbv3L3FCkA1d8grsDC0khKSkY8wHd9PBgLkZ+a
 gSO7TNU2o8dLj4i+MMgs2x+/iKAftyBWAP5XQURdZHiTJ+BrJTxN9Ph4kwHpzXai
 2Cauk9xIwDJanRI7ArOBcjSnhXY64opHUOBt1h7NIzm/114Qlr9Aj8TKZIAYyqMw
 BgjgOY30nQzSFkG7ASOdQckmUxa5gV6VVFmDt5+p13bisypz8iFWefOL9MvN8sdE
 z7PZChAgi0oYicGOUk1jBlXZfaWb8kmtk8KV2vFMcF4GnvMzlAWaBN7v3vP34afG
 IqQXZHI6//XiedDtLUn9XBdnT0NPFSbhmRYDHwGYGZ6oM4k4pRs=
 =MKRc
 -----END PGP SIGNATURE-----

Merge tag 'linux-kselftest-next-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull Kselftest updates from Shuah Khan:
 "Several fixes, cleanups, and enhancements to tests:

   - add mips support for kprobe args string and syntax tests

   - updates to resctrl test to use kselftest framework

   - fixes, cleanups, and enhancements to tests"

* tag 'linux-kselftest-next-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  kselftests/ir : Improve readability of modprobe error message
  selftests/resctrl: Fix null pointer dereference on open failed
  selftests/resctrl: Add missing SPDX license to Makefile
  selftests/resctrl: Update README about using kselftest framework to build/run resctrl_tests
  selftests/resctrl: Make resctrl_tests run using kselftest framework
  selftests/resctrl: Fix resctrl_tests' return code to work with selftest framework
  selftests/resctrl: Change the default limited time to 120 seconds
  selftests/resctrl: Kill child process before parent process terminates if SIGTERM is received
  selftests/resctrl: Print a message if the result of MBM&CMT tests is failed on Intel CPU
  selftests/resctrl: Extend CPU vendor detection
  selftests/x86/corrupt_xstate_header: Use provided __cpuid_count() macro
  selftests/x86/amx: Use provided __cpuid_count() macro
  selftests/vm/pkeys: Use provided __cpuid_count() macro
  selftests: Provide local define of __cpuid_count()
  selftests/damon: add damon to selftests root Makefile
  selftests/binderfs: Improve message to provide more info
  selftests: mqueue: drop duplicate min definition
  selftests/ftrace: add mips support for kprobe args syntax tests
  selftests/ftrace: add mips support for kprobe args string tests
2022-05-25 11:30:21 -07:00
Linus Torvalds
88a618920e It was a moderately busy cycle for documentation; highlights include:
- After a long period of inactivity, the Japanese translations are seeing
    some much-needed maintenance and updating.
 
  - Reworked IOMMU documentation
 
  - Some new documentation for static-analysis tools
 
  - A new overall structure for the memory-management documentation.  This
    is an LSFMM outcome that, it is hoped, will help encourage developers to
    fill in the many gaps.  Optimism is eternal...but hopefully it will
    work.
 
  - More Chinese translations.
 
 Plus the usual typo fixes, updates, etc.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmKLqZQPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5YdgQH/2/9+EgQDes93f/+iKtbO23EV67392dwrmXS
 kYg8lR4948/Q3jzgMloUo6hNOoxXeV/sqmdHu0LjUhFN+BGsp9fFjd/jp0XhWcqA
 nnc9foGbpmeFPxHeAg2aqV84eeasLoO5lUUm2rNoPBLd6HFV+IYC5R4VZ+w42StB
 5bYEOYwHXMvQZXkivZDse82YmvQK3/2rRGTUoFhME/Aap6rFgWJJ+XQcSKA7WmwW
 OpJqq+FOsjsxHe6IFVy6onzlqgGJM8zM2bLtqedid6yaE3uACcHMb/OyAjp0rdKF
 BQvaG+d3f7DugABqM6Y1oU75iBtJWWYgGeAm36JtX+3mz2uR/f0=
 =3UoR
 -----END PGP SIGNATURE-----

Merge tag 'docs-5.19' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "It was a moderately busy cycle for documentation; highlights include:

   - After a long period of inactivity, the Japanese translations are
     seeing some much-needed maintenance and updating.

   - Reworked IOMMU documentation

   - Some new documentation for static-analysis tools

   - A new overall structure for the memory-management documentation.
     This is an LSFMM outcome that, it is hoped, will help encourage
     developers to fill in the many gaps. Optimism is eternal...but
     hopefully it will work.

   - More Chinese translations.

  Plus the usual typo fixes, updates, etc"

* tag 'docs-5.19' of git://git.lwn.net/linux: (70 commits)
  docs: pdfdocs: Add space for chapter counts >= 100 in TOC
  docs/zh_CN: Add dev-tools/gdb-kernel-debugging.rst Chinese translation
  input: Docs: correct ntrig.rst typo
  input: Docs: correct atarikbd.rst typos
  MAINTAINERS: Become the docs/zh_CN maintainer
  docs/zh_CN: fix devicetree usage-model translation
  mm,doc: Add new documentation structure
  Documentation: drop more IDE boot options and ide-cd.rst
  Documentation/process: use scripts/get_maintainer.pl on patches
  MAINTAINERS: Add entry for DOCUMENTATION/JAPANESE
  docs/trans/ja_JP/howto: Don't mention specific kernel versions
  docs/ja_JP/SubmittingPatches: Request summaries for commit references
  docs/ja_JP/SubmittingPatches: Add Suggested-by as a standard signature
  docs/ja_JP/SubmittingPatches: Randy has moved
  docs/ja_JP/SubmittingPatches: Suggest the use of scripts/get_maintainer.pl
  docs/ja_JP/SubmittingPatches: Update GregKH links
  Documentation/sysctl: document max_rcu_stall_to_panic
  Documentation: add missing angle bracket in cgroup-v2 doc
  Documentation: dev-tools: use literal block instead of code-block
  docs/zh_CN: add vm numa translation
  ...
2022-05-25 11:17:41 -07:00
Yufen Yu
908ea65416 f2fs: add f2fs_init_write_merge_io function
Almost all other initialization of variables in f2fs_fill_super are
extraced to a single function. Also do it for write_io[], which can
make code more clean.

This patch just refactors the code, theres no functional change.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Chao Yu <chao@kernel.org>
[Jaegeuk Kim: clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-05-25 10:48:56 -07:00
Kefeng Wang
f403f22f8c mm: kfence: use PAGE_ALIGNED helper
Use PAGE_ALIGNED macro instead of IS_ALIGNED and passing PAGE_SIZE.

Link: https://lkml.kernel.org/r/20220520021833.121405-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Muchun Song <songmuchun@bytedance.com>
Cc: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-25 10:47:49 -07:00
Patrick Wang
0598739900 selftests: vm: add the "settings" file with timeout variable
The default "timeout" for one kselftest is 45 seconds, while some cases in
run_vmtests.sh require more time.  This will cause testing timeout like:

  not ok 4 selftests: vm: run_vmtests.sh # TIMEOUT 45 seconds

Therefore, add the "settings" file with timeout variable so users can set
the "timeout" value.

Link: https://lkml.kernel.org/r/20220521083825.319654-4-patrick.wang.shcn@gmail.com
Signed-off-by: Patrick Wang <patrick.wang.shcn@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-25 10:47:49 -07:00
Patrick Wang
ccd2a1201d selftests: vm: add "test_hmm.sh" to TEST_FILES
The "test_hmm.sh" file used by run_vmtests.sh dose not be installed into
INSTALL_PATH.  Thus run_vmtests.sh can not call it in INSTALL_PATH:

  ---------------------------
  running ./test_hmm.sh smoke
  ---------------------------
  ./run_vmtests.sh: line 74: ./test_hmm.sh: No such file or directory
  [FAIL]
  -----------------------

Add "test_hmm.sh" to TEST_FILES so that it will be installed.

Link: https://lkml.kernel.org/r/20220521083825.319654-3-patrick.wang.shcn@gmail.com
Signed-off-by: Patrick Wang <patrick.wang.shcn@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-25 10:47:49 -07:00
Patrick Wang
9aa1af954d selftests: vm: check numa_available() before operating "merge_across_nodes" in ksm_tests
Patch series "selftests: vm: a few fixup patches".

This series contains three fixup patches for vm selftests.  They are
independent.  Please see the patches.


This patch (of 3):

Currently, ksm_tests operates "merge_across_nodes" with NUMA either
enabled or disabled.  In a system with NUMA disabled, these operations
will fail and output a misleading report given "merge_across_nodes" does
not exist in sysfs:

  ----------------------------
  running ./ksm_tests -M -p 10
  ----------------------------
  f /sys/kernel/mm/ksm/merge_across_nodes
  fopen: No such file or directory
  Cannot save default tunables
  [FAIL]
  ----------------------

So check numa_available() before those operations to skip them if NUMA is
disabled.

Link: https://lkml.kernel.org/r/20220521083825.319654-1-patrick.wang.shcn@gmail.com
Link: https://lkml.kernel.org/r/20220521083825.319654-2-patrick.wang.shcn@gmail.com
Signed-off-by: Patrick Wang <patrick.wang.shcn@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-25 10:47:49 -07:00
Muhammad Usama Anjum
3d3921ed27 selftests: vm: add migration to the .gitignore
Add newly added migration test object to .gitignore file.

Link: https://lkml.kernel.org/r/20220521094313.166505-1-usama.anjum@collabora.com
Fixes: 0c2d087284 ("mm: add selftests for migration entries")
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-25 10:47:49 -07:00
Julia Lawall
75c96ccea2 selftests/vm/pkeys: fix typo in comment
Spelling mistake (triple letters) in comment.  Detected with the help of
Coccinelle.

Link: https://lkml.kernel.org/r/20220521111145.81697-80-Julia.Lawall@inria.fr
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-25 10:47:48 -07:00
Julia Lawall
3413b2c872 ksm: fix typo in comment
Spelling mistake (triple letters) in comment.  Detected with the help of
Coccinelle.

Link: https://lkml.kernel.org/r/20220521111145.81697-94-Julia.Lawall@inria.fr
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-25 10:47:48 -07:00
Suren Baghdasaryan
33776141b8 selftests: vm: add process_mrelease tests
Introduce process_mrelease syscall sanity tests which include tests
which expect to fail:

- process_mrelease with invalid pidfd and flags inputs
- process_mrelease on a live process with no pending signals

and valid process_mrelease usage which is expected to succeed.  Because
process_mrelease has to be used against a process with a pending SIGKILL,
it's possible that the process exits before process_mrelease gets called. 
In such cases we retry the test with a victim that allocates twice more
memory up to 1GB.  This would require the victim process to spend more
time during exit and process_mrelease has a better chance of catching the
process before it exits and succeeding.

On success the test reports the amount of memory the child had to allocate
for reaping to succeed.  Sample output:

$ mrelease_test
Success reaping a child with 1MB of memory allocations

On failure the test reports the failure. Sample outputs:

$ mrelease_test
All process_mrelease attempts failed!

$ mrelease_test
process_mrelease: Invalid argument

Link: https://lkml.kernel.org/r/20220518204316.13131-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-25 10:47:48 -07:00
Johannes Weiner
3f1509c57b Revert "mm/vmscan: never demote for memcg reclaim"
This reverts commit 3a235693d3.

Its premise was that cgroup reclaim cares about freeing memory inside the
cgroup, and demotion just moves them around within the cgroup limit. 
Hence, pages from toptier nodes should be reclaimed directly.

However, with NUMA balancing now doing tier promotions, demotion is part
of the page aging process.  Global reclaim demotes the coldest toptier
pages to secondary memory, where their life continues and from which they
have a chance to get promoted back.  Essentially, tiered memory systems
have an LRU order that spans multiple nodes.

When cgroup reclaims pages coming off the toptier directly, there can be
colder pages on lower tier nodes that were demoted by global reclaim. 
This is an aging inversion, not unlike if cgroups were to reclaim directly
from the active lists while there are inactive pages.

Proactive reclaim is another factor.  The goal of that it is to offload
colder pages from expensive RAM to cheaper storage.  When lower tier
memory is available as an intermediate layer, we want offloading to take
advantage of it instead of bypassing to storage.

Revert the patch so that cgroups respect the LRU order spanning the memory
hierarchy.

Of note is a specific undercommit scenario, where all cgroup limits in the
system add up to <= available toptier memory.  In that case, shuffling
pages out to lower tiers first to reclaim them from there is inefficient. 
This is something could be optimized/short-circuited later on (although
care must be taken not to accidentally recreate the aging inversion). 
Let's ensure correctness first.

Link: https://lkml.kernel.org/r/20220518190911.82400-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-25 10:47:48 -07:00
Jackie Liu
83d7d04f9d mm/kfence: print disabling or re-enabling message
By printing information, we can friendly prompt the status change
information of kfence by dmesg and record by syslog.

Also, set kfence_enabled to false only when needed.

Link: https://lkml.kernel.org/r/20220518073105.3160335-1-liu.yun@linux.dev
Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Co-developed-by: Marco Elver <elver@google.com>
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-25 10:47:48 -07:00
Vasily Averin
e5c3f619a0 include/trace/events/percpu.h: cleanup for "percpu: improve percpu_alloc_percpu event trace"
Fix sparse warning about incorrect gfp_t cast.

Link: https://lkml.kernel.org/r/001979f3-e978-0998-cbed-61a4a2ac87b8@openvz.org
Fixes: f67bed134a ("percpu: improve percpu_alloc_percpu event trace")
Signed-off-by: Vasily Averin <vvs@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-25 10:47:48 -07:00
Vasily Averin
185194f191 include/trace/events/mmflags.h: cleanup for "tracing: incorrect gfp_t conversion"
Redefines __def_gfpflag_names array according to akpm@, willy@ and Joe
Perches recommendations.

Link: https://lkml.kernel.org/r/6f811e19-41c6-f3e8-fca6-23a19a62e313@openvz.org
Fixes: fe573327ff ("tracing: incorrect gfp_t conversion")
Signed-off-by: Vasily Averin <vvs@openvz.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Joe Perches <joe@perches.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-25 10:47:47 -07:00
Zi Yan
88ee134320 mm: fix a potential infinite loop in start_isolate_page_range()
In isolate_single_pageblock() called by start_isolate_page_range(), there
are some pageblock isolation issues causing a potential infinite loop when
isolating a page range.  This is reported by Qian Cai.

1. the pageblock was isolated by just changing pageblock migratetype
   without checking unmovable pages. Calling set_migratetype_isolate() to
   isolate pageblock properly.
2. an off-by-one error caused migrating pages unnecessarily, since the page
   is not crossing pageblock boundary.
3. migrating a compound page across pageblock boundary then splitting the
   free page later has a small race window that the free page might be
   allocated again, so that the code will try again, causing an potential
   infinite loop. Temporarily set the to-be-migrated page's pageblock to
   MIGRATE_ISOLATE to prevent that and bail out early if no free page is
   found after page migration.

An additional fix to split_free_page() aims to avoid crashing in
__free_one_page().  When the free page is split at the specified
split_pfn_offset, free_page_order should check both the first bit of
free_page_pfn and the last bit of split_pfn_offset and use the smaller
one.  For example, if free_page_pfn=0x10000, split_pfn_offset=0xc000,
free_page_order should first be 0x8000 then 0x4000, instead of 0x4000 then
0x8000, which the original algorithm did.

[akpm@linux-foundation.org: suppress min() warning]
Link: https://lkml.kernel.org/r/20220524194756.1698351-1-zi.yan@sent.com
Fixes: b2c9e2fbba ("mm: make alloc_contig_range work at pageblock granularity")
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reported-by: Qian Cai <quic_qiancai@quicinc.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: David Hildenbrand <david@redhat.com>
Cc: Eric Ren <renzhengeek@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-25 10:47:47 -07:00
Muchun Song
bb5ced41a6 MAINTAINERS: add Muchun as co-maintainer for HugeTLB
I have been focusing on mm for the past two years.  e.g.  developing,
fixing bugs, reviewing related to HugeTLB system.  I would like to help
Mike and other people working on HugeTLB by reviewing their work.

When I first introduced the vmemmmap reduction, I forgot to update
MAINTAINERS file.  Let's update it as well.  And rename "HUGETLB
FILESYSTEM" to "HUGETLB SUBSYSTEM" since some files are not only related
to filesystem but also memory management (the name of FILESYSTEM cannot
cover this area).

Link: https://lkml.kernel.org/r/20220521074103.79468-1-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-25 10:47:47 -07:00
Randy Dunlap
6140ae41ef zram: fix Kconfig dependency warning
ZSMALLOC depends on MMU so ZRAM should also depend on MMU since 'select'
does not follow any dependency chains.

Fixes this Kconfig warning:

WARNING: unmet direct dependencies detected for ZSMALLOC
  Depends on [n]: MMU [=n]
  Selected by [y]:
  - ZRAM [=y] && BLK_DEV [=y] && BLOCK [=y] && SYSFS [=y] && (CRYPTO_LZO [=y] || CRYPTO_ZSTD [=m] || CRYPTO_LZ4 [=m] || CRYPTO_LZ4HC [=n] || CRYPTO_842 [=n])

Link: https://lkml.kernel.org/r/20220522204027.22964-1-rdunlap@infradead.org
Fixes: b3fbd58fcb ("mm: Kconfig: simplify zswap configuration")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-25 10:47:47 -07:00
Hugh Dickins
e384200e70 mm/shmem: fix shmem folio swapoff hang
Shmem swapoff makes no progress: the index to indices is not incremented. 
But "ret" is no longer a return value, so use folio_batch_count() instead.

Link: https://lkml.kernel.org/r/c32bee8a-f0aa-245-f94e-24dd271924fa@google.com
Fixes: da08e9b793 ("mm/shmem: convert shmem_swapin_page() to shmem_swapin_folio()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Tested-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-25 10:47:47 -07:00
Christophe JAILLET
7fb6378701 cgroup: fix an error handling path in alloc_pagecache_max_30M()
If the first goto is taken, 'fd' is not opened yet (and is un-initialized).
So a direct return is safer.

Link: https://lkml.kernel.org/r/628312312eb40e0e39463a2c06415fde5295c716.1653229120.git.christophe.jaillet@wanadoo.fr
Fixes: c1a31a2f7a ("cgroup: fix racy check in alloc_pagecache_max_30M() helper function")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Zefan Li <lizefan.x@bytedance.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: David Vernet <void@manifault.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-25 10:47:46 -07:00
Linus Torvalds
537e62c865 printk changes for 5.19
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmKLXH8ACgkQUqAMR0iA
 lPIABhAAtAZRmvg9UjUS8dpmS3plXdg/zJU0AbK9o/m/hGzMfs2bgHxwM7mbGa1O
 VC0Jczj9tfJXESfrBsV0ZpY5H+iGilEkTF86/ME4sS8lmIeSim9dAxF4sTvM1vw/
 IST4llN0IRuNHwrb20GyH44MOG9JwFwEyIgYITwkB8iYK/lo/sP8xkZuC44CmaJf
 28ZZAwICigtyR9lF0psQGLgMc4+laT5l3XF/c9OyqEFbB5khBGxT0RwV0WS4ZcPA
 mTn5kW6WcDbTNKUVUHW1jzmJBq3ci+0ckh6jLNJWc6Olh5jbGU7selVTst96GQKm
 sgWF7uykURls3ZFPzTJSY6E3Gnwrsw75RQYDLtTOSxqB2NlVsBTyZq4jgNtxiR3z
 ovA9souDe4t/BPqkHTHZkVEyaFWZlRwNlzJZIwN2Auy/uFjznWnOQxT2t3BYUZt5
 8qnUt+JBvtSNyLDvoNtQnyCiCyEZdyrHQ+3RsFWIQz6CnA34Xh6oZPxbK24pnfDy
 F5OuIulrpIPfEFufV6ZR30QeB2gLkvCorUfl5pde4QL/Pujxrk6CCikv39QOfL7K
 6+X7hq/Moq8vhzMfWl+LEPS6qpAwNJl69JIaQrp18JHVGeKVagS1e6pOmThSOPv7
 bDucE08oOK8KTnR6ysfKf24JC6HopB7vFYfhSEa8rgssDLtcGso=
 =pN3o
 -----END PGP SIGNATURE-----

Merge tag 'printk-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull printk updates from Petr Mladek:

 - Offload writing printk() messages on consoles to per-console
   kthreads.

   It prevents soft-lockups when an extensive amount of messages is
   printed. It was observed, for example, during boot of large systems
   with a lot of peripherals like disks or network interfaces.

   It prevents live-lockups that were observed, for example, when
   messages about allocation failures were reported and a CPU handled
   consoles instead of reclaiming the memory. It was hard to solve even
   with rate limiting because it would need to take into account the
   amount of messages and the speed of all consoles.

   It is a must to have for real time. Otherwise, any printk() might
   break latency guarantees.

   The per-console kthreads allow to handle each console on its own
   speed. Slow consoles do not longer slow down faster ones. And
   printk() does not longer unpredictably slows down various code paths.

   There are situations when the kthreads are either not available or
   not reliable, for example, early boot, suspend, or panic. In these
   situations, printk() uses the legacy mode and tries to handle
   consoles immediately.

 - Add documentation for the printk index.

* tag 'printk-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  printk, tracing: fix console tracepoint
  printk: remove @console_locked
  printk: extend console_lock for per-console locking
  printk: add kthread console printers
  printk: add functions to prefer direct printing
  printk: add pr_flush()
  printk: move buffer definitions into console_emit_next_record() caller
  printk: refactor and rework printing logic
  printk: add con_printk() macro for console details
  printk: call boot_delay_msec() in printk_delay()
  printk: get caller_id/timestamp after migration disable
  printk: wake waiters for safe and NMI contexts
  printk: wake up all waiters
  printk: add missing memory barrier to wake_up_klogd()
  printk: cpu sync always disable interrupts
  printk: rename cpulock functions
  printk/index: Printk index feature documentation
  MAINTAINERS: Add printk indexing maintainers on mention of printk_index
2022-05-25 10:32:08 -07:00
Linus Torvalds
2e17ce1106 slab changes for 5.19
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEjUuTAak14xi+SF7M4CHKc/GJqRAFAmKLUYoACgkQ4CHKc/GJ
 qRCMFwf/Tm1cf2JLUANrT58rjkrrj15EtKhnJdm5/yvmsWKps7WKPP4jeUHe+NTO
 NovAGt67lG1l6LMLczZkWckOkWlyYjC42CPDLdxRUkk+zQRb3nRA8Nbt6VTNBOfQ
 0wTLOqXgsNXdSPSVUsKGL8kIAHNQTMX+7TjO6s7CXy/5Qag6r1iZX2HZxASOHxLa
 yYzaJ9pJRZBAMGnzV6L6v0J8KPnjYO0fB68S1qYQTbhoRxchtFF+0AIr1JydGgBI
 9RFUowTrSpJkZtcSjabopvZz4JfCRDP+eAxkyw13feji7MG1FMX74HgDdw+HhzTv
 R2/6iA5WcsmzcXopsfMx8lUP/KIfPw==
 =gnSc
 -----END PGP SIGNATURE-----

Merge tag 'slab-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab

Pull slab updates from Vlastimil Babka:

 - Conversion of slub_debug stack traces to stackdepot, allowing more
   useful debugfs-based inspection for e.g. memory leak debugging.
   Allocation and free debugfs info now includes full traces and is
   sorted by the unique trace frequency.

   The stackdepot conversion was already attempted last year but
   reverted by ae14c63a9f. The memory overhead (while not actually
   enabled on boot) has been meanwhile solved by making the large
   stackdepot allocation dynamic. The xfstest issues haven't been
   reproduced on current kernel locally nor in -next, so the slab cache
   layout changes that originally made that bug manifest were probably
   not the root cause.

 - Refactoring of dma-kmalloc caches creation.

 - Trivial cleanups such as removal of unused parameters, fixes and
   clarifications of comments.

 - Hyeonggon Yoo joins as a reviewer.

* tag 'slab-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  MAINTAINERS: add myself as reviewer for slab
  mm/slub: remove unused kmem_cache_order_objects max
  mm: slab: fix comment for __assume_kmalloc_alignment
  mm: slab: fix comment for ARCH_KMALLOC_MINALIGN
  mm/slub: remove unneeded return value of slab_pad_check
  mm/slab_common: move dma-kmalloc caches creation into new_kmalloc_cache()
  mm/slub: remove meaningless node check in ___slab_alloc()
  mm/slub: remove duplicate flag in allocate_slab()
  mm/slub: remove unused parameter in setup_object*()
  mm/slab.c: fix comments
  slab, documentation: add description of debugfs files for SLUB caches
  mm/slub: sort debugfs output by frequency of stack traces
  mm/slub: distinguish and print stack traces in debugfs files
  mm/slub: use stackdepot to save stack trace in objects
  mm/slub: move struct track init out of set_track()
  lib/stackdepot: allow requesting early initialization dynamically
  mm/slub, kunit: Make slub_kunit unaffected by user specified flags
  mm/slab: remove some unused functions
2022-05-25 10:24:04 -07:00
Linus Torvalds
caa2898416 linux/types.h: reinstate "__bitwise__" macro for user space use
Commit c724c866bb ("linux/types.h: remove unnecessary __bitwise__")
was right that there are no users of __bitwise__ in the kernel, but it
turns out there are user space users of it that do expect it.

It is, after all, in the uapi directory, so user space usage is to be
expected.

Instead of reverting the commit completely, let's just clarify the
situation so that it doesn't happen again, and have some in-code
explanations for why that "__bitwise__" still exists.

Reported-by: Jiri Slaby <jirislaby@kernel.org>
Cc: Bjorn Helgaas <helgaas@kernel.org>
Link: https://lore.kernel.org/all/b5c0a68d-8387-4909-beea-f70ab9e6e3d5@kernel.org/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-05-25 10:08:59 -07:00
Sean Young
e5499dd725 media: lirc: revert removal of unused feature flags
Commit b2a90f4fcb ("media: lirc: remove unused lirc features") removed
feature flags which were never implemented, but they are still used by
the lirc daemon went built from source.

Reinstate these symbols in order not to break the lirc build.

Fixes: b2a90f4fcb ("media: lirc: remove unused lirc features")
Link: https://lore.kernel.org/all/a0470450-ecfd-2918-e04a-7b57c1fd7694@kernel.org/
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-05-25 09:51:36 -07:00
Kan Liang
86dca36907 perf/x86/intel: Fix event constraints for ICL
According to the latest event list, the event encoding 0x55
INST_DECODED.DECODERS and 0x56 UOPS_DECODED.DEC0 are only available on
the first 4 counters. Add them into the event constraints table.

Fixes: 6017608936 ("perf/x86/intel: Add Icelake support")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220525133952.1660658-1-kan.liang@linux.intel.com
2022-05-25 15:55:52 +02:00
Juerg Haefliger
108ea7eb3e perf/x86/Kconfig: Fix indentation in the Kconfig file
The convention for indentation seems to be a single tab. Help text is
further indented by an additional two whitespaces. Fix the lines that
violate these rules.

Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20220525133949.53730-1-juerg.haefliger@canonical.com
2022-05-25 15:54:26 +02:00
Linus Walleij
1a23accae8
ARM: ixp4xx: Consolidate Kconfig fixing issue
The IXP4xx Kconfig we ended up with for mach-ixp4xx creates
as kismet warning:

   WARNING: unmet direct dependencies detected for GPIO_IXP4XX
     Depends on [n]: GPIOLIB [=y] && HAS_IOMEM [=y] && ARCH_IXP4XX [=y] && OF [=n]
     Selected by [y]:
     - ARCH_IXP4XX [=y] && <choice>

This is because it is possible to select ARCH_IXP4XX witout
OF while that selects the GPIO driver that now depends on
OF.

Fix this by creating a single ARCH_IXP4XX kconfig that selects
USE_OF.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Imre Kaloz <kaloz@openwrt.org>
Cc: Krzysztof Halasa <khalasa@piap.pl>
Link: https://lore.kernel.org/r/20220522072356.34062-1-linus.walleij@linaro.org'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-05-25 15:54:11 +02:00
Srinivas Pandruvada
4fe4f15523 Documentation: admin-guide: PM: Add Out of Band mode
Update documentation for using the tool to support performance level
change via OOB (Out of Band) interface.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-25 15:48:26 +02:00
sunliming
20eb48885b x86/idt: Remove unused headers
Commit:

  4b9a8dca0e ("x86/idt: Remove the tracing IDT completely")

removed the 'tracing IDT' from arch/x86/kernel/tracepoint.c,
but left related headers included - remove them.

[ mingo: Tweak changelog. ]

Signed-off-by: sunliming <sunliming@kylinos.cn>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20220525012827.93464-1-sunliming@kylinos.cn
2022-05-25 15:45:00 +02:00