Commit Graph

1057014 Commits

Author SHA1 Message Date
Andy Shevchenko
f5d8061484 kernel.h: drop unneeded <linux/kernel.h> inclusion from other headers
Patch series "kernel.h further split", v5.

kernel.h is a set of something which is not related to each other and
often used in non-crossed compilation units, especially when drivers
need only one or two macro definitions from it.

This patch (of 7):

There is no evidence we need kernel.h inclusion in certain headers.
Drop unneeded <linux/kernel.h> inclusion from other headers.

[sfr@canb.auug.org.au: bottom_half.h needs kernel]
  Link: https://lkml.kernel.org/r/20211015202908.1c417ae2@canb.auug.org.au

Link: https://lkml.kernel.org/r/20211013170417.87909-1-andriy.shevchenko@linux.intel.com
Link: https://lkml.kernel.org/r/20211013170417.87909-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-09 10:02:49 -08:00
Stephen Brennan
da4d6b9cf8 proc: allow pid_revalidate() during LOOKUP_RCU
Problem Description:

When running running ~128 parallel instances of

  TZ=/etc/localtime ps -fe >/dev/null

on a 128CPU machine, the %sys utilization reaches 97%, and perf shows
the following code path as being responsible for heavy contention on the
d_lockref spinlock:

      walk_component()
        lookup_fast()
          d_revalidate()
            pid_revalidate() // returns -ECHILD
          unlazy_child()
            lockref_get_not_dead(&nd->path.dentry->d_lockref) <-- contention

The reason is that pid_revalidate() is triggering a drop from RCU to ref
path walk mode.  All concurrent path lookups thus try to grab a
reference to the dentry for /proc/, before re-executing pid_revalidate()
and then stepping into the /proc/$pid directory.  Thus there is huge
spinlock contention.

This patch allows pid_revalidate() to execute in RCU mode, meaning that
the path lookup can successfully enter the /proc/$pid directory while
still in RCU mode.  Later on, the path lookup may still drop into ref
mode, but the contention will be much reduced at this point.

By applying this patch, %sys utilization falls to around 85% under the
same workload, and the number of ps processes executed per unit time
increases by 3x-4x.  Although this particular workload is a bit
contrived, we have seen some large collections of eager monitoring
scripts which produced similarly high %sys time due to contention in the
/proc directory.

As a result this patch, Al noted that several procfs methods which were
only called in ref-walk mode could now be called from RCU mode.  To
ensure that this patch is safe, I audited all the inode get_link and
permission() implementations, as well as dentry d_revalidate()
implementations, in fs/proc.  The purpose here is to ensure that they
either are safe to call in RCU (i.e.  don't sleep) or correctly bail out
of RCU mode if they don't support it.  My analysis shows that all
at-risk procfs methods are safe to call under RCU, and thus this patch
is safe.

Procfs RCU-walk Analysis:

This analysis is up-to-date with 5.15-rc3.  When called under RCU mode,
these functions have arguments as follows:

* get_link() receives a NULL dentry pointer when called in RCU mode.
* permission() receives MAY_NOT_BLOCK in the mode parameter when called
  from RCU.
* d_revalidate() receives LOOKUP_RCU in flags.

For the following functions, either they are trivially RCU safe, or they
explicitly bail at the beginning of the function when they run:

proc_ns_get_link       (bails out)
proc_get_link          (RCU safe)
proc_pid_get_link      (bails out)
map_files_d_revalidate (bails out)
map_misc_d_revalidate  (bails out)
proc_net_d_revalidate  (RCU safe)
proc_sys_revalidate    (bails out, also not under /proc/$pid)
tid_fd_revalidate      (bails out)
proc_sys_permission    (not under /proc/$pid)

The remainder of the functions require a bit more detail:

* proc_fd_permission: RCU safe. All of the body of this function is
  under rcu_read_lock(), except generic_permission() which declares
  itself RCU safe in its documentation string.
* proc_self_get_link uses GFP_ATOMIC in the RCU case, so it is RCU aware
  and otherwise looks safe. The same is true of proc_thread_self_get_link.
* proc_map_files_get_link: calls ns_capable, which calls capable(), and
  thus calls into the audit code (see note #1 below). The remainder is
  just a call to the trivially safe proc_pid_get_link().
* proc_pid_permission: calls ptrace_may_access(), which appears RCU
  safe, although it does call into the "security_ptrace_access_check()"
  hook, which looks safe under smack and selinux. Just the audit code is
  of concern. Also uses get_task_struct() and put_task_struct(), see
  note #2 below.
* proc_tid_comm_permission: Appears safe, though calls put_task_struct
  (see note #2 below).

Note #1:
  Most of the concern of RCU safety has centered around the audit code.
  However, since b17ec22fb3 ("selinux: slow_avc_audit has become
  non-blocking"), it's safe to call this code under RCU. So all of the
  above are safe by my estimation.

Note #2: get_task_struct() and put_task_struct():
  The majority of get_task_struct() is under RCU read lock, and in any
  case it is a simple increment. But put_task_struct() is complex, given
  that it could at some point free the task struct, and this process has
  many steps which I couldn't manually verify. However, several other
  places call put_task_struct() under RCU, so it appears safe to use
  here too (see kernel/hung_task.c:165 or rcu/tree-stall.h:296)

Patch description:

pid_revalidate() drops from RCU into REF lookup mode.  When many threads
are resolving paths within /proc in parallel, this can result in heavy
spinlock contention on d_lockref as each thread tries to grab a
reference to the /proc dentry (and drop it shortly thereafter).

Investigation indicates that it is not necessary to drop RCU in
pid_revalidate(), as no RCU data is modified and the function never
sleeps.  So, remove the LOOKUP_RCU check.

Link: https://lkml.kernel.org/r/20211004175629.292270-2-stephen.s.brennan@oracle.com
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Konrad Wilk <konrad.wilk@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-09 10:02:49 -08:00
David Hildenbrand
ce2814622e virtio-mem: kdump mode to sanitize /proc/vmcore access
Although virtio-mem currently supports reading unplugged memory in the
hypervisor, this will change in the future, indicated to the device via
a new feature flag.

We similarly sanitized /proc/kcore access recently.  [1]

Let's register a vmcore callback, to allow vmcore code to check if a PFN
belonging to a virtio-mem device is either currently plugged and should
be dumped or is currently unplugged and should not be accessed, instead
mapping the shared zeropage or returning zeroes when reading.

This is important when not capturing /proc/vmcore via tools like
"makedumpfile" that can identify logically unplugged virtio-mem memory
via PG_offline in the memmap, but simply by e.g., copying the file.

Distributions that support virtio-mem+kdump have to make sure that the
virtio_mem module will be part of the kdump kernel or the kdump initrd;
dracut was recently [2] extended to include virtio-mem in the generated
initrd.  As long as no special kdump kernels are used, this will
automatically make sure that virtio-mem will be around in the kdump
initrd and sanitize /proc/vmcore access -- with dracut.

With this series, we'll send one virtio-mem state request for every ~2
MiB chunk of virtio-mem memory indicated in the vmcore that we intend to
read/map.

In the future, we might want to allow building virtio-mem for kdump mode
only, even without CONFIG_MEMORY_HOTPLUG and friends: this way, we could
support special stripped-down kdump kernels that have many other config
options disabled; we'll tackle that once required.  Further, we might
want to try sensing bigger blocks (e.g., memory sections) first before
falling back to device blocks on demand.

Tested with Fedora rawhide, which contains a recent kexec-tools version
(considering "System RAM (virtio_mem)" when creating the vmcore header)
and a recent dracut version (including the virtio_mem module in the
kdump initrd).

Link: https://lkml.kernel.org/r/20210526093041.8800-1-david@redhat.com [1]
Link: https://github.com/dracutdevs/dracut/pull/1157 [2]
Link: https://lkml.kernel.org/r/20211005121430.30136-10-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-09 10:02:48 -08:00
David Hildenbrand
ffc763d0c3 virtio-mem: factor out hotplug specifics from virtio_mem_remove() into virtio_mem_deinit_hotplug()
Let's prepare for a new virtio-mem kdump mode in which we don't actually
hot(un)plug any memory but only observe the state of device blocks.

Link: https://lkml.kernel.org/r/20211005121430.30136-9-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-09 10:02:48 -08:00
David Hildenbrand
84e17e684e virtio-mem: factor out hotplug specifics from virtio_mem_probe() into virtio_mem_init_hotplug()
Let's prepare for a new virtio-mem kdump mode in which we don't actually
hot(un)plug any memory but only observe the state of device blocks.

Link: https://lkml.kernel.org/r/20211005121430.30136-8-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-09 10:02:48 -08:00
David Hildenbrand
94300fcf4c virtio-mem: factor out hotplug specifics from virtio_mem_init() into virtio_mem_init_hotplug()
Let's prepare for a new virtio-mem kdump mode in which we don't actually
hot(un)plug any memory but only observe the state of device blocks.

Link: https://lkml.kernel.org/r/20211005121430.30136-7-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-09 10:02:48 -08:00
David Hildenbrand
cc5f2704c9 proc/vmcore: convert oldmem_pfn_is_ram callback to more generic vmcore callbacks
Let's support multiple registered callbacks, making sure that
registering vmcore callbacks cannot fail.  Make the callback return a
bool instead of an int, handling how to deal with errors internally.
Drop unused HAVE_OLDMEM_PFN_IS_RAM.

We soon want to make use of this infrastructure from other drivers:
virtio-mem, registering one callback for each virtio-mem device, to
prevent reading unplugged virtio-mem memory.

Handle it via a generic vmcore_cb structure, prepared for future
extensions: for example, once we support virtio-mem on s390x where the
vmcore is completely constructed in the second kernel, we want to detect
and add plugged virtio-mem memory ranges to the vmcore in order for them
to get dumped properly.

Handle corner cases that are unexpected and shouldn't happen in sane
setups: registering a callback after the vmcore has already been opened
(warn only) and unregistering a callback after the vmcore has already been
opened (warn and essentially read only zeroes from that point on).

Link: https://lkml.kernel.org/r/20211005121430.30136-6-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-09 10:02:48 -08:00
David Hildenbrand
2c9feeaedf proc/vmcore: let pfn_is_ram() return a bool
The callback should deal with errors internally, it doesn't make sense
to expose these via pfn_is_ram().  We'll rework the callbacks next.
Right now we consider errors as if "it's RAM"; no functional change.

Link: https://lkml.kernel.org/r/20211005121430.30136-5-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-09 10:02:48 -08:00
David Hildenbrand
934fadf438 x86/xen: print a warning when HVMOP_get_mem_type fails
HVMOP_get_mem_type is not expected to fail, "This call failing is
indication of something going quite wrong and it would be good to know
about this." [1]

Let's add a pr_warn_once().

Link: https://lkml.kernel.org/r/3b935aa0-6d85-0bcd-100e-15098add3c4c@oracle.com [1]
Link: https://lkml.kernel.org/r/20211005121430.30136-4-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Suggested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Young <dyoung@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-09 10:02:48 -08:00
David Hildenbrand
d452a48949 x86/xen: simplify xen_oldmem_pfn_is_ram()
Let's simplify return handling.

Link: https://lkml.kernel.org/r/20211005121430.30136-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-09 10:02:48 -08:00
David Hildenbrand
434b90f39e x86/xen: update xen_oldmem_pfn_is_ram() documentation
After removing /dev/kmem, sanitizing /proc/kcore and handling /dev/mem,
this series tackles the last sane way how a VM could accidentially
access logically unplugged memory managed by a virtio-mem device:
/proc/vmcore

When dumping memory via "makedumpfile", PG_offline pages, used by
virtio-mem to flag logically unplugged memory, are already properly
excluded; however, especially when accessing/copying /proc/vmcore "the
usual way", we can still end up reading logically unplugged memory part
of a virtio-mem device.

Patch #1-#3 are cleanups.  Patch #4 extends the existing
oldmem_pfn_is_ram mechanism.  Patch #5-#7 are virtio-mem refactorings
for patch #8, which implements the virtio-mem logic to query the state
of device blocks.

Patch #8:
 "Although virtio-mem currently supports reading unplugged memory in the
  hypervisor, this will change in the future, indicated to the device
  via a new feature flag. We similarly sanitized /proc/kcore access
  recently.
  [...]
  Distributions that support virtio-mem+kdump have to make sure that the
  virtio_mem module will be part of the kdump kernel or the kdump
  initrd; dracut was recently [2] extended to include virtio-mem in the
  generated initrd. As long as no special kdump kernels are used, this
  will automatically make sure that virtio-mem will be around in the
  kdump initrd and sanitize /proc/vmcore access -- with dracut"

This is the last remaining bit to support
VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE [3] in the Linux implementation of
virtio-mem.

Note: this is best-effort.  We'll never be able to control what runs
inside the second kernel, really, but we also don't have to care: we
only care about sane setups where we don't want our VM getting zapped
once we touch the wrong memory location while dumping.  While we usually
expect sane setups to use "makedumfile", nothing really speaks against
just copying /proc/vmcore, especially in environments where HWpoisioning
isn't typically expected.  Also, we really don't want to put all our
trust completely on the memmap, so sanitizing also makes sense when just
using "makedumpfile".

[1] https://lkml.kernel.org/r/20210526093041.8800-1-david@redhat.com
[2] https://github.com/dracutdevs/dracut/pull/1157
[3] https://lists.oasis-open.org/archives/virtio-comment/202109/msg00021.html

This patch (of 9):

The callback is only used for the vmcore nowadays.

Link: https://lkml.kernel.org/r/20211005121430.30136-1-david@redhat.com
Link: https://lkml.kernel.org/r/20211005121430.30136-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Boris Ostrovsky <boris.ostrvsky@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-09 10:02:48 -08:00
Florian Weimer
0658a0961b procfs: do not list TID 0 in /proc/<pid>/task
If a task exits concurrently, task_pid_nr_ns may return 0.

[akpm@linux-foundation.org: coding style tweaks]
[adobriyan@gmail.com: test that /proc/*/task doesn't contain "0"]
  Link: https://lkml.kernel.org/r/YV88AnVzHxPafQ9o@localhost.localdomain

Link: https://lkml.kernel.org/r/8735pn5dx7.fsf@oldenburg.str.redhat.com
Signed-off-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-09 10:02:48 -08:00
zhangyiru
83c1fd763b mm,hugetlb: remove mlock ulimit for SHM_HUGETLB
Commit 21a3c273f8 ("mm, hugetlb: add thread name and pid to
SHM_HUGETLB mlock rlimit warning") marked this as deprecated in 2012,
but it is not deleted yet.

Mike says he still sees that message in log files on occasion, so maybe we
should preserve this warning.

Also remove hugetlbfs related user_shm_unlock in ipc/shm.c and remove the
user_shm_unlock after out.

Link: https://lkml.kernel.org/r/20211103105857.25041-1-zhangyiru3@huawei.com
Signed-off-by: zhangyiru <zhangyiru3@huawei.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Liu Zixian <liuzixian4@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: wuxu.wu <wuxu.wu@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-09 10:02:48 -08:00
Johannes Weiner
51b8c1fe25 vfs: keep inodes with page cache off the inode shrinker LRU
Historically (pre-2.5), the inode shrinker used to reclaim only empty
inodes and skip over those that still contained page cache.  This caused
problems on highmem hosts: struct inode could put fill lowmem zones
before the cache was getting reclaimed in the highmem zones.

To address this, the inode shrinker started to strip page cache to
facilitate reclaiming lowmem.  However, this comes with its own set of
problems: the shrinkers may drop actively used page cache just because
the inodes are not currently open or dirty - think working with a large
git tree.  It further doesn't respect cgroup memory protection settings
and can cause priority inversions between containers.

Nowadays, the page cache also holds non-resident info for evicted cache
pages in order to detect refaults.  We've come to rely heavily on this
data inside reclaim for protecting the cache workingset and driving swap
behavior.  We also use it to quantify and report workload health through
psi.  The latter in turn is used for fleet health monitoring, as well as
driving automated memory sizing of workloads and containers, proactive
reclaim and memory offloading schemes.

The consequences of dropping page cache prematurely is that we're seeing
subtle and not-so-subtle failures in all of the above-mentioned
scenarios, with the workload generally entering unexpected thrashing
states while losing the ability to reliably detect it.

To fix this on non-highmem systems at least, going back to rotating
inodes on the LRU isn't feasible.  We've tried (commit a76cf1a474
("mm: don't reclaim inodes with many attached pages")) and failed
(commit 69056ee6a8 ("Revert "mm: don't reclaim inodes with many
attached pages"")).

The issue is mostly that shrinker pools attract pressure based on their
size, and when objects get skipped the shrinkers remember this as
deferred reclaim work.  This accumulates excessive pressure on the
remaining inodes, and we can quickly eat into heavily used ones, or
dirty ones that require IO to reclaim, when there potentially is plenty
of cold, clean cache around still.

Instead, this patch keeps populated inodes off the inode LRU in the
first place - just like an open file or dirty state would.  An otherwise
clean and unused inode then gets queued when the last cache entry
disappears.  This solves the problem without reintroducing the reclaim
issues, and generally is a bit more scalable than having to wade through
potentially hundreds of thousands of busy inodes.

Locking is a bit tricky because the locks protecting the inode state
(i_lock) and the inode LRU (lru_list.lock) don't nest inside the
irq-safe page cache lock (i_pages.xa_lock).  Page cache deletions are
serialized through i_lock, taken before the i_pages lock, to make sure
depopulated inodes are queued reliably.  Additions may race with
deletions, but we'll check again in the shrinker.  If additions race
with the shrinker itself, we're protected by the i_lock: if find_inode()
or iput() win, the shrinker will bail on the elevated i_count or
I_REFERENCED; if the shrinker wins and goes ahead with the inode, it
will set I_FREEING and inhibit further igets(), which will cause the
other side to create a new instance of the inode instead.

Link: https://lkml.kernel.org/r/20210614211904.14420-4-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-09 10:02:48 -08:00
Ming Lei
26af1cd003 nvme: wait until quiesce is done
NVMe uses one atomic flag to check if quiesce is needed. If quiesce is
started, the helper returns immediately. This way is wrong, since we
have to wait until quiesce is done.

Fixes: e70feb8b3e ("blk-mq: support concurrent queue quiesce/unquiesce")
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20211109071144.181581-5-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-09 08:14:27 -07:00
Ming Lei
93542fbfa7 scsi: make sure that request queue queiesce and unquiesce balanced
For fixing queue quiesce race between driver and block layer(elevator
switch, update nr_requests, ...), we need to support concurrent quiesce
and unquiesce, which requires the two call balanced.

It isn't easy to audit that in all scsi drivers, especially the two may
be called from different contexts, so do it in scsi core with one
per-device atomic variable to balance quiesce and unquiesce.

Reported-by: Yi Zhang <yi.zhang@redhat.com>
Fixes: e70feb8b3e ("blk-mq: support concurrent queue quiesce/unquiesce")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20211109071144.181581-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-09 08:14:27 -07:00
Ming Lei
d2b9f12b0f scsi: avoid to quiesce sdev->request_queue two times
For fixing queue quiesce race between driver and block layer(elevator
switch, update nr_requests, ...), we need to support concurrent quiesce
and unquiesce, which requires the two to be balanced.

blk_mq_quiesce_queue() calls blk_mq_quiesce_queue_nowait() for updating
quiesce depth and marking the flag, then scsi_internal_device_block() calls
blk_mq_quiesce_queue_nowait() two times actually.

Fix the double quiesce and keep quiesce and unquiesce balanced.

Reported-by: Yi Zhang <yi.zhang@redhat.com>
Fixes: e70feb8b3e ("blk-mq: support concurrent queue quiesce/unquiesce")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20211109071144.181581-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-09 08:14:27 -07:00
Ming Lei
9ef4d0209c blk-mq: add one API for waiting until quiesce is done
Some drivers(NVMe, SCSI) need to call quiesce and unquiesce in pair, but it
is hard to switch to this style, so these drivers need one atomic flag for
helping to balance quiesce and unquiesce.

When quiesce is in-progress, the driver still needs to wait until
the quiesce is done, so add API of blk_mq_wait_quiesce_done() for
these drivers.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20211109071144.181581-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-09 08:14:27 -07:00
Kishon Vijay Abraham I
eb91224e47 dmaengine: ti: k3-udma: Set r/tchan or rflow to NULL if request fail
udma_get_*() checks if rchan/tchan/rflow is already allocated by checking
if it has a NON NULL value. For the error cases, rchan/tchan/rflow will
have error value and udma_get_*() considers this as already allocated
(PASS) since the error values are NON NULL. This results in NULL pointer
dereference error while de-referencing rchan/tchan/rflow.

Reset the value of rchan/tchan/rflow to NULL if a channel request fails.

CC: stable@vger.kernel.org
Acked-by: Peter Ujfalusi <peter.ujfalusi@gmail.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Link: https://lore.kernel.org/r/20211031032411.27235-3-kishon@ti.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2021-11-09 11:24:06 +05:30
Kishon Vijay Abraham I
5c6c6d60e4 dmaengine: ti: k3-udma: Set bchan to NULL if a channel request fail
bcdma_get_*() checks if bchan is already allocated by checking if it
has a NON NULL value. For the error cases, bchan will have error value
and bcdma_get_*() considers this as already allocated (PASS) since the
error values are NON NULL. This results in NULL pointer dereference
error while de-referencing bchan.

Reset the value of bchan to NULL if a channel request fails.

CC: stable@vger.kernel.org
Acked-by: Peter Ujfalusi <peter.ujfalusi@gmail.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Link: https://lore.kernel.org/r/20211031032411.27235-2-kishon@ti.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2021-11-09 11:24:06 +05:30
Arnd Bergmann
2498363310 dmaengine: stm32-dma: avoid 64-bit division in stm32_dma_get_max_width
Using the % operator on a 64-bit variable is expensive and can
cause a link failure:

arm-linux-gnueabi-ld: drivers/dma/stm32-dma.o: in function `stm32_dma_get_max_width':
stm32-dma.c:(.text+0x170): undefined reference to `__aeabi_uldivmod'
arm-linux-gnueabi-ld: drivers/dma/stm32-dma.o: in function `stm32_dma_set_xfer_param':
stm32-dma.c:(.text+0x1cd4): undefined reference to `__aeabi_uldivmod'

As we know that we just want to check the alignment in
stm32_dma_get_max_width(), there is no need for a full division, and
using a simple mask is a faster replacement.

Same in stm32_dma_set_xfer_param(), change this to only allow burst
transfers if the address is a multiple of the length.
stm32_dma_get_best_burst just after will take buf_len into account to fix
burst in case of misalignment.

Fixes: b20fd5fa31 ("dmaengine: stm32-dma: fix stm32_dma_get_max_width")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Amelie Delaunay <amelie.delaunay@foss.st.com>
Link: https://lore.kernel.org/r/20211103153312.41483-1-amelie.delaunay@foss.st.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2021-11-09 11:20:35 +05:30
Linus Torvalds
d2f38a3c65 - Fix-ups
- Standardise *_exit() and *_remove() return values; ili9320, vgg2432a4
 
  - Bug Fixes
    - Do not override maximum brightness; backlight
    - Propagate errors from get_brightness(); backlight
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdrbJNaO+IJqU8IdIUa+KL4f8d2EFAmGJTUMACgkQUa+KL4f8
 d2FQ+w/+PMsuCZwyf1tcIfUTZJA5y5n8vnsLLzIkkjLlnaNbtD+y85gZzLWa0CgC
 H3nyKrcV/rbaTC5mAM+meKcmDxfbCGYDL7bleIZ2aQd1YtmvGnmspGnNwl8vHTjA
 DN8eagNssW+V1i/uPbKN76RRaOMleIwm/zjA4TZATMs6Q1MddDeJN4g5mIqbcd1i
 k5aE2Jc9ZqPuE18w7I5gOuibGo8If1IroeTJ1V2/npJJhlWAA+nJzH2bGG7I1OQm
 4AjrJ+n8twKhAbaFhi3IYcQhpbGACGo/HZ3SGS3sPjlqo2+/vdrsTqjCkiQREfZ1
 5QmOHk1/X0bFTcO5u6rhldq3qbS3qg7DwRzTb8LmY3bcPVHj5JD8jJcbnHgzXkOz
 UncS0Y5sMImn+BKDVVvsosMH45XnZtngyYEEbgpzMI3JXGf9LxilxCVTZ8rrfogF
 TYbM+i1bzs3Tvl3mKaecSqvsKJgJUVNKFOSb0oGlXd9WaEb3WmTgFi6CfATeokif
 buTdPGt2rD2WcanNFV8A32z3byuvS8C7EXRkprhbUVjkO//rOam7HgR8SBP6rkRo
 IYs2CUAviawfiCypGPUCjBz6JwgcG9uQrdpFBrWEp26dkx+m/MP5OvkuwgjIPNrI
 4Bu12kU1mTRrR9CenJP1Ka/1fKQbQaFsNxiKM3C7SYOnt90XMRs=
 =PbxD
 -----END PGP SIGNATURE-----

Merge tag 'backlight-next-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight

Pull backlight updates from Lee Jones:
 "Fix-ups:
   - Standardise *_exit() and *_remove() return values in ili9320 and
     vgg2432a4

  Bug Fixes:
   - Do not override maximum brightness
   - Propagate errors from get_brightness()"

* tag 'backlight-next-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight:
  video: backlight: ili9320: Make ili9320_remove() return void
  backlight: Propagate errors from get_brightness()
  video: backlight: Drop maximum brightness override for brightness zero
2021-11-08 12:21:28 -08:00
Linus Torvalds
3a9b0a46e1 - Remove Drivers
- Remove support for TI TPS80031/TPS80032 PMICs
 
  - New Device Support
    - Add support for Magnetic Reader to TI AM335x
    - Add support for DA9063_EA to Dialog DA9063
    - Add support for SC2730 PMIC to Spreadtrum SC27xx
    - Add support for MacBookPro16,2 ICL-N UART Intel LPSS PCI
    - Add support for lots of new PMICS in QCom SPMI PMIC
    - Add support for ADC to Diolan DLN2
 
  - New Functionality
    - Add support for Power Off to Rockchip RK817
 
  - Fix-ups
    - Simplify Regmap passing to child devices; hi6421-spmi-pmic
    - SPDX licensing updates; ti_am335x_tscadc
    - Improve error handling; ti_am335x_tscadc
    - Expedite clock search; ti_am335x_tscadc
    - Generic simplifications; ti_am335x_tscadc
    - Use generic macros/defines; ti_am335x_tscadc
    - Remove unused code; ti_am335x_tscadc, cros_ec_dev
    - Convert to GPIOD; wcd934x
    - Add namespacing; ti_am335x_tscadc
    - Restrict compilation to relevant arches; intel_pmt
    - Provide better description/documentation; exynos_lpass
    - Add SPI device ID table; altera-a10sr, motorola-cpcap, sprd-sc27xx-spi
    - Change IRQ handling; qcom-pm8xxx
    - Split out I2C and SPI code; arizona
    - Explicitly include used headers; altera-a10sr
    - Convert sysfs show() function to; sysfs_emit
    - Standardise *_exit() and *_remove() return values; mc13xxx, stmpe, tps65912
    - Trivial (style/spelling/whitespace) fixups; ti_am335x_tscadc, qcom-spmi-pmic,
                                                  max77686-private
    - Device Tree fix-ups; ti,am3359-tscadc, samsung,s2mps11, samsung,s2mpa01,
                           samsung,s5m8767, brcm,misc, brcm,cru, syscon, qcom,tcsr,
 			  xylon,logicvc, max77686, x-powers,ac100, x-powers,axp152,
 			  x-powers,axp209-gpio, syscon, qcom,spmi-pmic
 
  - Bug Fixes
    - Balance refcounting (get/put); ti_am335x_tscadc, mfd-core
    - Fix IRQ trigger type; sec-irq, max77693, max14577
    - Repair off-by-one; altera-sysmgr
    - Add explicit 'select MFD_CORE' to MFD_SIMPLE_MFD_I2C
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdrbJNaO+IJqU8IdIUa+KL4f8d2EFAmGJTIAACgkQUa+KL4f8
 d2FYsRAAhcTUP7PH5gWko1mQnCzh6h3Q7iQ1MHEokZgIvqc/U2Zmxu57cF9f3jOt
 goZdVsU7x6qiMD4SfmInyEp32Emo1pbUTVz6kB3o0G+YACPHOU17xyKuh0FnzQkm
 yu/EbEDYNPbNWx9BTA9wgjSOTzCrKMBSd/p9zPzq9M69ihAf2uE9sn5Hbmso1Pdu
 tSJ7XYqWVwYzZh8OVzQd6lEIDkA+o+/gR4nCgxqAvGiXQq6yVVOCpnNzj4GrAcep
 hkuQVkg14+rmXRbLiZsmc1V+yT13bueKu2fD96gMFpXI8NkR1KZ6QRInI6FtJcl/
 m2LGPUuICpd2IiKRa1XtXFZWcMbZ2JVjJSWArgfHj7YBs9+0KcRsbpfHHirpcf14
 9LFy4TzjX2A1K0vvKhHSTAhh13HFcvWyd0GCrEhLRmapeiLDXohkUHGMVFVedXzE
 tQLCEByjcL+/OCJiQ4Jwk1aaU2cAVEXtvYuciXcBOtHkfaQR/bOYwjRm4Z3AdZyU
 zLYMkw/LWvzAaV3Rh1zP6W47WLFHbeMgTmApFOSxAbRsmun0loasVzXWrkvxZlYF
 p39l4UcSOIK08PzxqF9ZEM/LtUglShbZbg2wf0VSHzomA+oIsxT7fN16vPHLYDYL
 tsQ5fYVN0a3j4ltKFeQl7l2HV/ZzUI/Q6iGmMia5sFbwRN8tlZM=
 =SJ7N
 -----END PGP SIGNATURE-----

Merge tag 'mfd-next-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd

Pull MFD updates from Lee Jones:
 "Removed Drivers:
   - Remove support for TI TPS80031/TPS80032 PMICs

  New Device Support:
   - Add support for Magnetic Reader to TI AM335x
   - Add support for DA9063_EA to Dialog DA9063
   - Add support for SC2730 PMIC to Spreadtrum SC27xx
   - Add support for MacBookPro16,2 ICL-N UART Intel LPSS PCI
   - Add support for lots of new PMICS in QCom SPMI PMIC
   - Add support for ADC to Diolan DLN2

  New Functionality:
   - Add support for Power Off to Rockchip RK817

  Fix-ups:
   - Simplify Regmap passing to child devices in hi6421-spmi-pmic
   - SPDX licensing updates in ti_am335x_tscadc
   - Improve error handling in ti_am335x_tscadc
   - Expedite clock search in ti_am335x_tscadc
   - Generic simplifications in ti_am335x_tscadc
   - Use generic macros/defines in ti_am335x_tscadc
   - Remove unused code in ti_am335x_tscadc, cros_ec_dev
   - Convert to GPIOD in wcd934x
   - Add namespacing in ti_am335x_tscadc
   - Restrict compilation to relevant arches in intel_pmt
   - Provide better description/documentation in exynos_lpass
   - Add SPI device ID table in altera-a10sr, motorola-cpcap,
     sprd-sc27xx-spi
   - Change IRQ handling in qcom-pm8xxx
   - Split out I2C and SPI code in arizona
   - Explicitly include used headers in altera-a10sr
   - Convert sysfs show() function to in sysfs_emit
   - Standardise *_exit() and *_remove() return values in mc13xxx,
     stmpe, tps65912
   - Trivial (style/spelling/whitespace) fixups in ti_am335x_tscadc,
     qcom-spmi-pmic, max77686-private
   - Device Tree fix-ups in ti,am3359-tscadc, samsung,s2mps11,
     samsung,s2mpa01, samsung,s5m8767, brcm,misc, brcm,cru, syscon,
     qcom,tcsr, xylon,logicvc, max77686, x-powers,ac100,
     x-powers,axp152, x-powers,axp209-gpio, syscon, qcom,spmi-pmic

  Bug Fixes:
   - Balance refcounting (get/put) in ti_am335x_tscadc, mfd-core
   - Fix IRQ trigger type in sec-irq, max77693, max14577
   - Repair off-by-one in altera-sysmgr
   - Add explicit 'select MFD_CORE' to MFD_SIMPLE_MFD_I2C"

* tag 'mfd-next-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (95 commits)
  mfd: simple-mfd-i2c: Select MFD_CORE to fix build error
  mfd: tps80031: Remove driver
  mfd: max77686: Correct tab-based alignment of register addresses
  mfd: wcd934x: Replace legacy gpio interface for gpiod
  dt-bindings: mfd: qcom: pm8xxx: Add pm8018 compatible
  mfd: dln2: Add cell for initializing DLN2 ADC
  mfd: qcom-spmi-pmic: Add missing PMICs supported by socinfo
  mfd: qcom-spmi-pmic: Document ten more PMICs in the binding
  mfd: qcom-spmi-pmic: Sort compatibles in the driver
  mfd: qcom-spmi-pmic: Sort the compatibles in the binding
  mfd: janz-cmoio: Replace snprintf in show functions with sysfs_emit
  mfd: altera-a10sr: Include linux/module.h
  mfd: tps65912: Make tps65912_device_exit() return void
  mfd: stmpe: Make stmpe_remove() return void
  mfd: mc13xxx: Make mc13xxx_common_exit() return void
  dt-bindings: mfd: syscon: Add samsung,exynosautov9-sysreg compatible
  mfd: altera-sysmgr: Fix a mistake caused by resource_size conversion
  dt-bindings: gpio: Convert X-Powers AXP209 GPIO binding to a schema
  dt-bindings: mfd: syscon: Add rk3368 QoS register compatible
  mfd: arizona: Split of_match table into I2C and SPI versions
  ...
2021-11-08 12:07:52 -08:00
Linus Torvalds
d20f7a09e5 gpio updates for v5.16
- new driver: gpio-modepin (plus relevant change in zynqmp firmware)
 - add interrupt support to gpio-virtio
 - enable the 'gpio-line-names' property in the DT bindings for gpio-rockchip
 - use the subsystem helpers where applicable in gpio-uniphier instead of
   accessing IRQ structures directly
 - code shrink in gpio-xilinx
 - add interrupt to gpio-mlxbf2 (and include the removal of custom interrupt
   code from the mellanox ethernet driver)
 - support multiple interrupts per bank in gpio-tegra186 (and force one interrupt
   per bank in older models)
 - fix GPIO line IRQ offset calculation in gpio-realtek-otto
 - drop unneeded MODULE_ALIAS expansions in multiple drivers
 - code cleanup in gpio-aggregator
 - minor improvements in gpio-max730x and gpio-mc33880
 - Kconfig cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEFp3rbAvDxGAT0sefEacuoBRx13IFAmGJH+MACgkQEacuoBRx
 13JjGg//YKoAby35hcclft5xJYDwnx2or001TJ+nFLOQ9d24gUI7+UMFVnZ2ifMP
 9F/yO0934pkHNUqcTuIMbeALW6Cg2DglXJFCQSXT6Ng5ETjENOhLHRa4bRdoIEim
 HC+YJg/DS6CpSkMAH09ljZfkTPz3B2DHt+Y4phIDg2PIrt+UUsN/RC685B4jSwds
 pcAwdP1mqBV1da8uRjtfc7UcFSxR6/lvBBWQSo9RAEse5R6Rmm7XIOKBlQ+Exdts
 jAmVqxz7BnfuYug9Y3y3fOiAQYoRvZBaSjkeMJALub3GEd3W30vAKoQhq7gC9ea3
 XvCj7V6C59JVp9aQUXo1cL6VqcesSGMWoRi2vll8/7gluXV2gCoNPVb+9KwxG9Ed
 +xwXBpCsOVQCXl8jgIHBWNfxOS31wafWlK8Ng42Cjov+uQfsccgufTvhQXBumKbM
 oECSOgjg9k3/AMpym3buOgAzqSec5R1iiSWHNPT+P6cSFXa2Bl3A8X0RgF/TwXPC
 Bn9BDfSiYfOcD6y6yx2GPMCrCXNJRWgfJ/2j3Jsb66XtGwtBDhmK7rcqcg9qIws9
 mWWyISiJgO0gfk24hIIw7OxeKCidm3iVskryJAQUfTX+BBcR5nZfiuF3EeDXZv95
 whkSoN7J5fmMYDDJ7wd0pFwbW5fe5C/i/2tjOlNBgdXsNgAv/r4=
 =mKh0
 -----END PGP SIGNATURE-----

Merge tag 'gpio-updates-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio updates from Bartosz Golaszewski:
 "We have a single new driver, new features in others and some cleanups
  all over the place.

  Nothing really stands out and it is all relatively small.

   - new driver: gpio-modepin (plus relevant change in zynqmp firmware)

   - add interrupt support to gpio-virtio

   - enable the 'gpio-line-names' property in the DT bindings for
     gpio-rockchip

   - use the subsystem helpers where applicable in gpio-uniphier instead
     of accessing IRQ structures directly

   - code shrink in gpio-xilinx

   - add interrupt to gpio-mlxbf2 (and include the removal of custom
     interrupt code from the mellanox ethernet driver)

   - support multiple interrupts per bank in gpio-tegra186 (and force
     one interrupt per bank in older models)

   - fix GPIO line IRQ offset calculation in gpio-realtek-otto

   - drop unneeded MODULE_ALIAS expansions in multiple drivers

   - code cleanup in gpio-aggregator

   - minor improvements in gpio-max730x and gpio-mc33880

   - Kconfig cleanups"

* tag 'gpio-updates-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  virtio_gpio: drop packed attribute
  gpio: virtio: Add IRQ support
  gpio: realtek-otto: fix GPIO line IRQ offset
  gpio: clean up Kconfig file
  net: mellanox: mlxbf_gige: Replace non-standard interrupt handling
  gpio: mlxbf2: Introduce IRQ support
  gpio: mc33880: Drop if with an always false condition
  gpio: max730x: Make __max730x_remove() return void
  gpio: aggregator: Wrap access to gpiochip_fwd.tmp[]
  gpio: modepin: Add driver support for modepin GPIO controller
  dt-bindings: gpio: zynqmp: Add binding documentation for modepin
  firmware: zynqmp: Add MMIO read and write support for PS_MODE pin
  gpio: tps65218: drop unneeded MODULE_ALIAS
  gpio: max77620: drop unneeded MODULE_ALIAS
  gpio: xilinx: simplify getting .driver_data
  gpio: tegra186: Support multiple interrupts per bank
  gpio: tegra186: Force one interrupt per bank
  gpio: uniphier: Use helper functions to get private data from IRQ data
  gpio: uniphier: Use helper function to get IRQ hardware number
  dt-bindings: gpio: add gpio-line-names to rockchip,gpio-bank.yaml
2021-11-08 11:55:21 -08:00
Linus Torvalds
dd72945c43 cxl for v5.16
- Fix support for platforms that do not enumerate every ACPI0016 (CXL
   Host Bridge) in the CHBS (ACPI Host Bridge Structure).
 
 - Introduce a common pci_find_dvsec_capability() helper, clean up open
   coded implementations in various drivers.
 
 - Add 'cxl_test' for regression testing CXL subsystem ABIs. 'cxl_test'
   is a module built from tools/testing/cxl/ that mocks up a CXL topology
   to augment the nascent support for emulation of CXL devices in QEMU.
 
 - Convert libnvdimm to use the uuid API.
 
 - Complete the definition of CXL namespace labels in libnvdimm.
 
 - Tunnel libnvdimm label operations from nd_ioctl() back to the CXL
   mailbox driver. Enable 'ndctl {read,write}-labels' for CXL.
 
 - Continue to sort and refactor functionality into distinct driver and
   core-infrastructure buckets. For example, mailbox handling is now a
   generic core capability consumed by the PCI and cxl_test drivers.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCYYRtyQAKCRDfioYZHlFs
 Z7UsAP9DzUN6IWWnYk1R95YXYVxFriRtRsBjujAqTg49EMghawEAoHaA9lxO3Hho
 l25TLYUOmB/zFTlUbe6YQptMJZ5YLwY=
 =im9j
 -----END PGP SIGNATURE-----

Merge tag 'cxl-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull cxl updates from Dan Williams:
 "More preparation and plumbing work in the CXL subsystem.

  From an end user perspective the highlight here is lighting up the CXL
  Persistent Memory related commands (label read / write) with the
  generic ioctl() front-end in LIBNVDIMM.

  Otherwise, the ability to instantiate new persistent and volatile
  memory regions is still on track for v5.17.

  Summary:

   - Fix support for platforms that do not enumerate every ACPI0016 (CXL
     Host Bridge) in the CHBS (ACPI Host Bridge Structure).

   - Introduce a common pci_find_dvsec_capability() helper, clean up
     open coded implementations in various drivers.

   - Add 'cxl_test' for regression testing CXL subsystem ABIs.
     'cxl_test' is a module built from tools/testing/cxl/ that mocks up
     a CXL topology to augment the nascent support for emulation of CXL
     devices in QEMU.

   - Convert libnvdimm to use the uuid API.

   - Complete the definition of CXL namespace labels in libnvdimm.

   - Tunnel libnvdimm label operations from nd_ioctl() back to the CXL
     mailbox driver. Enable 'ndctl {read,write}-labels' for CXL.

   - Continue to sort and refactor functionality into distinct driver
     and core-infrastructure buckets. For example, mailbox handling is
     now a generic core capability consumed by the PCI and cxl_test
     drivers"

* tag 'cxl-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (34 commits)
  ocxl: Use pci core's DVSEC functionality
  cxl/pci: Use pci core's DVSEC functionality
  PCI: Add pci_find_dvsec_capability to find designated VSEC
  cxl/pci: Split cxl_pci_setup_regs()
  cxl/pci: Add @base to cxl_register_map
  cxl/pci: Make more use of cxl_register_map
  cxl/pci: Remove pci request/release regions
  cxl/pci: Fix NULL vs ERR_PTR confusion
  cxl/pci: Remove dev_dbg for unknown register blocks
  cxl/pci: Convert register block identifiers to an enum
  cxl/acpi: Do not fail cxl_acpi_probe() based on a missing CHBS
  cxl/pci: Disambiguate cxl_pci further from cxl_mem
  Documentation/cxl: Add bus internal docs
  cxl/core: Split decoder setup into alloc + add
  tools/testing/cxl: Introduce a mock memory device + driver
  cxl/mbox: Move command definitions to common location
  cxl/bus: Populate the target list at decoder create
  tools/testing/cxl: Introduce a mocked-up CXL port hierarchy
  cxl/pmem: Add support for multiple nvdimm-bridge objects
  cxl/pmem: Translate NVDIMM label commands to CXL label commands
  ...
2021-11-08 11:49:48 -08:00
Linus Torvalds
dab334c98b Merge branch 'i2c/for-mergewindow' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:

 - big refactoring of the PASEMI driver to support the Apple M1

 - huge improvements to the XIIC in terms of locking and SMP safety

 - refactoring and clean ups for the i801 driver

... and the usual bunch of small driver updates

* 'i2c/for-mergewindow' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (43 commits)
  i2c: amd-mp2-plat: ACPI: Use ACPI_COMPANION() directly
  i2c: i801: Add support for Intel Ice Lake PCH-N
  i2c: virtio: update the maintainer to Conghui
  i2c: xlr: Fix a resource leak in the error handling path of 'xlr_i2c_probe()'
  i2c: qup: move to use request_irq by IRQF_NO_AUTOEN flag
  i2c: qup: fix a trivial typo
  i2c: tegra: Ensure that device is suspended before driver is removed
  i2c: i801: Fix incorrect and needless software PEC disabling
  i2c: mediatek: Dump i2c/dma register when a timeout occurs
  i2c: mediatek: Reset the handshake signal between i2c and dma
  i2c: mlxcpld: Allow flexible polling time setting for I2C transactions
  i2c: pasemi: Set enable bit for Apple variant
  i2c: pasemi: Add Apple platform driver
  i2c: pasemi: Refactor _probe to use devm_*
  i2c: pasemi: Allow to configure bus frequency
  i2c: pasemi: Move common reset code to own function
  i2c: pasemi: Split pci driver to its own file
  i2c: pasemi: Split off common probing code
  i2c: pasemi: Remove usage of pci_dev
  i2c: pasemi: Use dev_name instead of port number
  ...
2021-11-08 11:46:10 -08:00
Linus Torvalds
206825f50f Core:
* Remove obsolete macros only used by the old nand_ecclayout struct
 * Don't remove debugfs directory if device is in use
 * MAINTAINERS:
   - Add entry for Qualcomm NAND controller driver
   - Update the devicetree documentation path of hyperbus
 
 MTD devices:
 * block2mtd:
   - Add support for an optional custom MTD label
   - Minor refactor to avoid hard coded constant
 * mtdswap: Remove redundant assignment of pointer eb
 
 CFI:
 * Fixup CFI on ixp4xx
 
 Raw NAND controller drivers:
 * Arasan:
   - Prevent an unsupported configuration
 * Xway, Socrates: plat_nand, Pasemi, Orion, mpc5121, GPIO, Au1550nd, AMS-Delta:
   - Keep the driver compatible with on-die ECC engines
 * cs553x, lpc32xx_slc, ndfc, sharpsl, tmio, txx9ndfmc:
   - Revert the commits: "Fix external use of SW Hamming ECC helper"
   - And let callers use the bare Hamming helpers
 * Fsmc: Fix use of SM ORDER
 * Intel:
   - Fix potential buffer overflow in probe
 * xway, vf610, txx9ndfm, tegra, stm32, plat_nand, oxnas, omap, mtk, hisi504,
   gpmi, gpio, denali, bcm6368, atmel:
   - Make use of the helper function devm_platform_ioremap_resource{,byname}()
 
 Onenand drivers:
 * Samsung: Drop Exynos4 and describe driver in KConfig
 
 Raw NAND chip drivers:
 * Hynix: Add support for H27UCG8T2ETR-BC MLC NAND
 
 SPI NOR core:
 * Add spi-nor device tree binding under SPI NOR maintainers
 
 SPI NOR manufacturer drivers:
 * Enable locking for n25q128a13
 
 SPI NOR controller drivers:
 * Use devm_platform_ioremap_resource_byname()
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEE9HuaYnbmDhq/XIDIJWrqGEe9VoQFAmGIAdEACgkQJWrqGEe9
 VoQHEAgAkFvFxIOyPSwAJfyeEklWmGSh+cm+X9EqWxZ5d6iwFE3FZH4X4XiqdyVY
 +rzy37o6Tp3pQVYDqxmPM8GGoHy8wYPR5h/U1IozbBaNeCJma1HyK4Sjb/cuX1eQ
 23EM1pWT8N9CbRG2S9mz5C9uHcP9mImtyvU4WhTLKOEc9XGbpuj+k1cacSuEEbeF
 rZklOa7tvUiOFIDfr7Kf+DfxbpYabgB2dJgWPEtZmKmefWMiODHgvBJo0MT6MKr7
 kOlIkgq5zYIFRNaxApWLI3JmH9+lsw6MHBe+cCqemV6q2OZlMV/5VGFElEywJt+4
 L3VmlJQpMiCEMo2Cgc1jigHKryk4Xw==
 =h6o3
 -----END PGP SIGNATURE-----

Merge tag 'mtd/for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux

Pull mtd updates from Miquel Raynal:
 "Core:
   - Remove obsolete macros only used by the old nand_ecclayout struct
   - Don't remove debugfs directory if device is in use
   - MAINTAINERS:
      - Add entry for Qualcomm NAND controller driver
      - Update the devicetree documentation path of hyperbus

  MTD devices:
   - block2mtd:
      - Add support for an optional custom MTD label
      - Minor refactor to avoid hard coded constant
   - mtdswap: Remove redundant assignment of pointer eb

  CFI:
   - Fixup CFI on ixp4xx

  Raw NAND controller drivers:
   - Arasan:
      - Prevent an unsupported configuration
   - Xway, Socrates: plat_nand, Pasemi, Orion, mpc5121, GPIO, Au1550nd,
     AMS-Delta:
      - Keep the driver compatible with on-die ECC engines
   - cs553x, lpc32xx_slc, ndfc, sharpsl, tmio, txx9ndfmc:
      - Revert the commits: "Fix external use of SW Hamming ECC helper"
      - And let callers use the bare Hamming helpers
   - Fsmc: Fix use of SM ORDER
   - Intel:
      - Fix potential buffer overflow in probe
   - xway, vf610, txx9ndfm, tegra, stm32, plat_nand, oxnas, omap, mtk,
     hisi504, gpmi, gpio, denali, bcm6368, atmel:
      - Make use of the helper function devm_platform_ioremap_resource{,byname}()

  Onenand drivers:
   - Samsung: Drop Exynos4 and describe driver in KConfig

  Raw NAND chip drivers:
   - Hynix: Add support for H27UCG8T2ETR-BC MLC NAND

  SPI NOR core:
   - Add spi-nor device tree binding under SPI NOR maintainers

  SPI NOR manufacturer drivers:
   - Enable locking for n25q128a13

  SPI NOR controller drivers:
   - Use devm_platform_ioremap_resource_byname()"

* tag 'mtd/for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (50 commits)
  mtd: core: don't remove debugfs directory if device is in use
  MAINTAINERS: Update the devicetree documentation path of hyperbus
  mtd: block2mtd: add support for an optional custom MTD label
  mtd: block2mtd: minor refactor to avoid hard coded constant
  mtd: fixup CFI on ixp4xx
  mtd: rawnand: arasan: Prevent an unsupported configuration
  MAINTAINERS: Add entry for Qualcomm NAND controller driver
  mtd: rawnand: hynix: Add support for H27UCG8T2ETR-BC MLC NAND
  mtd: rawnand: xway: Keep the driver compatible with on-die ECC engines
  mtd: rawnand: socrates: Keep the driver compatible with on-die ECC engines
  mtd: rawnand: plat_nand: Keep the driver compatible with on-die ECC engines
  mtd: rawnand: pasemi: Keep the driver compatible with on-die ECC engines
  mtd: rawnand: orion: Keep the driver compatible with on-die ECC engines
  mtd: rawnand: mpc5121: Keep the driver compatible with on-die ECC engines
  mtd: rawnand: gpio: Keep the driver compatible with on-die ECC engines
  mtd: rawnand: au1550nd: Keep the driver compatible with on-die ECC engines
  mtd: rawnand: ams-delta: Keep the driver compatible with on-die ECC engines
  Revert "mtd: rawnand: cs553x: Fix external use of SW Hamming ECC helper"
  Revert "mtd: rawnand: lpc32xx_slc: Fix external use of SW Hamming ECC helper"
  Revert "mtd: rawnand: ndfc: Fix external use of SW Hamming ECC helper"
  ...
2021-11-08 11:37:39 -08:00
Linus Torvalds
05b8cd3db7 Add 'tools/perf/libbpf/' to ignored files
Commit 6b491a86b7 ("perf build: Install libbpf headers locally when
building") installed copies of the libbpf headers into the build tree,
causing unnecessary noise from 'git status' after a perf tools build.

Add the 'libbpf/' subdirectory to the .gitignore file to silence it all
again.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-08 11:33:35 -08:00
Linus Torvalds
e851dfae43 kgdb patches for 5.16
A single patch this cycle. We replace some open-coded routines to
 classify task states with the scheduler's own function to do this.
 Alongside the obvious benefits of removing funky code and aligning
 more exactly with the scheduler's task classification, this also
 fixes a long standing compiler warning by removing the open-coded
 routines that generated the warning.
 
 Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEELzVBU1D3lWq6cKzwfOMlXTn3iKEFAmGJA90ACgkQfOMlXTn3
 iKFJcw/9FLy94FS+y/F8xpTU89d2j92f8q1mxS9g7ToDzDOiIPLyNazbX+4PsXVQ
 FGgLpTqzNZX3+D5eAnOA/BNwWXtsvdxpsNnkY5ZCsVY5kZ0zsBYHe1O5CM2TbcMg
 bOvJRQVI/FjydlrwqxIz9gAD7FmT/QvyecIbHZm/zFiCxdQwZy3rFwREd5ENsjoG
 wumCCCH8Gh/afi9Pu3ZKHoZggNy/gmtSP3h3wmyoQneVFIJ4Vw5J61GFCvMPD+pN
 wuAXWpuzWaND5IPTr4aZMKHNSaxqADQoEpNWxkgRh0cNL4NGBKsdLZMcqTTiyWww
 TJSDtQKqocQB99eouwzQoA8SBsZwRvKRf/33QUXrWCAjl5YRK+9fSd8+dEf9Zd0o
 A3sh99ecmHXknY6K2uO7NFjUPLSA/QeMGBzNx9lt7RoL+14tjqZkrAvWXooZzBY3
 j39gwI1kSplmmCSoXwoW3AFVcCLJcGzE9qh0NUmZgt3kv8K1SUo3gxotKs8KwKj/
 xVozOokmZV2ZuCTf8oIw7ntLwIFjiUaYBE7JY+c8mT8VWCbs5ztyOb11I35YYT0V
 InXMDICLxZBD85eNOHyPC0fAud5emfboHl5GSUxo2hPgrRKuBmqElGtxG9CC8DLR
 SItPjKfrYI1CJd4uoFX54nC3GmwLVSAq3xDwpYsN4A4lLbJJytc=
 =vIfI
 -----END PGP SIGNATURE-----

Merge tag 'kgdb-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux

Pull kgdb update from Daniel Thompson:
 "A single patch this cycle.

  We replace some open-coded routines to classify task states with the
  scheduler's own function to do this. Alongside the obvious benefits of
  removing funky code and aligning more exactly with the scheduler's
  task classification, this also fixes a long standing compiler warning
  by removing the open-coded routines that generated the warning"

* tag 'kgdb-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
  kdb: Adopt scheduler's task classification
2021-11-08 09:35:43 -08:00
Linus Torvalds
a2b03e48e9 OpenRISC updates for 5.16
This includes 2 minor cleanups, plus a bug fix for OpenRISC TLB flush
 code that allows the the SMP kernel to boot again.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE2cRzVK74bBA6Je/xw7McLV5mJ+QFAmGIOh4ACgkQw7McLV5m
 J+Ts7hAApLDmupgPlJfc9hsIIxiCZcMnP5vyF2gNSkaDVkWkLkM24WuAfcCusEUE
 VIrI/K6AV8wQUfG9n3sAQ/v5bkKrLCkt3UYsOViidHJsrTeW7CO8v+Nzair3gbNa
 zKZEs6EyoGrd9l5nMv3D0z/PFWnTCMehRBHYYp6D/chgT+2cpQsVsq9SbN3gV5CR
 sdFrGOzWSemtWCqVibdsKa4j0IqLVNRAPv1MYC7jECOkRvMkbkXFWZK8Q8ijs0Mv
 th4nt9SJOSH+OAzBhXoH7bW9BUwfCEwnBYXhsidHVwf1S6+5/yFj24zQAz/1YZ21
 ydTS6PHS66oFqa5QITNbfNuqeprrnb+8JavkYvJnWiPfg8yJ2gURZ/4pCHS8iMsB
 mQpLaQlTsEHODj7Vc4GSTlbzWDgtCSB6v0fOPJP01Bzf+z2wsqG6dLJVrWMl6nT9
 0sNvfnU9egne8UhbTJ8fot9SISRjc/sFMXKnMMdcY+7KWdw+Kh4t7/hL5ZbKCZ8H
 Pg3NNTA37puqTn9qvd0ZA5Ey9L0bSjeQpGa2A+nutE3zfRL9MQFWWrpHPgfJyoZe
 VOZSK0dtSfYtTsCnno5S2px2XqPpXIf0P5LiXzvtc+x7JjZsP3G4jZxB8ql+ZaQF
 IeBI4Kvg4QPU2T/TUwt5VIMEL/I3yZ63/0L+d1xnIqaVZVK0RDQ=
 =c5nj
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://github.com/openrisc/linux

Pull OpenRISC updates from Stafford Horne:
 "This includes two minor cleanups, plus a bug fix for OpenRISC TLB
  flush code that allows the the SMP kernel to boot again"

* tag 'for-linus' of git://github.com/openrisc/linux:
  openrisc: fix SMP tlb flush NULL pointer dereference
  openrisc: signal: remove unused DEBUG_SIG macro
  openrisc: time: don't mark comment as kernel-doc
2021-11-08 09:31:25 -08:00
Linus Torvalds
bbdbeb0048 perf tools changes for v5.16:
perf annotate:
 
 - Add riscv64 support.
 
 - Add fusion logic for AMD microarchs.
 
 perf record:
 
 - Add an option to control the synthesizing behavior:
 
     --synth <no|all|task|mmap|cgroup>
                       Fine-tune event synthesis: default=all
 
 core:
 
 - Allow controlling synthesizing PERF_RECORD_ metadata events during record.
 
 - perf.data reader prep work for multithreaded processing.
 
 - Fix missing exclude_{host,guest} setting in PMUs that don't support it and
   that were causing the feature detection code to disable it for all events,
   even the ones in PMUs that support it.
 
 - Fix the default use of precise events on AMD, that were always falling back
   to non-precise because perf_event_attr.exclude_guest=1 was set and IBS does
   not have filtering capability, refusing precise + exclude_guest.
 
 - Add bitfield_swap() to handle branch_stack endian issue.
 
 perf script:
 
 - Show binary offsets for userspace addresses in callchains.
 
 - Support instruction latency via new "ins_lat" selectable field.
 
 - Add dlfilter-show-cycles
 
 perf inject:
 
 - Add vmlinux and ignore-vmlinux arguments, similar to other tools.
 
 perf list:
 
 - Display PMU prefix for partially supported hybrid cache events.
 
 - Display hybrid PMU events with cpu type.
 
 perf stat:
 
 - Improve metrics documentation of data structures.
 
 - Fix memory leaks in the metric code.
 
 - Use NAN for missing event IDs.
 
 - Don't compute unused events.
 
 - Fix memory leak on error path.
 
 - Encode and use metric-id as a metric qualifier.
 
 - Allow metrics with no events.
 
 - Avoid events for an 'if' constant result.
 
 - Only add a referenced metric once.
 
 - Simplify metric_refs calculation.
 
 - Allow modifiers on metrics.
 
 perf test:
 
 - Add workload test of metric and metric groups.
 
 - Workload test of all PMUs.
 
 - vmlinux-kallsyms: Ignore hidden symbols.
 
 - Add pmu-event test for event described as "config=".
 
 - Verify more event members in pmu-events test.
 
 - Add endian test for struct branch_flags on the sample-parsing test.
 
 - Improve temp file cleanup in several tests.
 
 perf daemon:
 
 - Address MSAN warnings on send_cmd().
 
 perf kmem:
 
 - Improve man page for record options
 
 perf srcline:
 
 - Use long-running addr2line per DSO, greatly speeding up the 'srcline' sort order.
 
 perf symbols:
 
 - Ignore $a/$d symbols for ARM modules.
 
 - Fix /proc/kcore access on 32 bit systems.
 
 Kernel UAPI copies:
 
 - Update copy of linux/socket.h with the kernel sources, no change in tooling output.
 
 libbpf:
 
 - Pull in bpf_program__get_prog_info_linear() from libbpf, too much specific to perf.
 
 - Deprecate bpf_map__resize() in favor of bpf_map_set_max_entries()
 
 - Install libbpf headers locally when building.
 
 - Bump minimum LLVM C++ std to GNU++14.
 
 libperf:
 
 - Use binary search in perf_cpu_map__idx() as array are sorted.
 
 libtracefs:
 
 - Enable libtracefs dynamic linking.
 
 libtraceevent:
 
 - Increase logging when verbose.
 
 Arch specific:
 
 PowerPC:
 
 - Add support to expose instruction and data address registers as part of
   extended regs.
 
 Vendor events:
 
 JSON parser:
 
 - Support ConfigCode to set the config= in PMUs like:
 
   $ cat /sys/bus/event_source/devices/hisi_sccl1_ddrc3/events/act_cmd
   config=0x5
 
 - Make the JSON parser more conformant when in strict mode.
 
 All JSON files:
 
 - Fix all remaining invalid JSON files.
 
 ARM:
 
 - Syntax corrections in Neoverse N1 json.
 
 - Categorise the Neoverse V1 counters.
 
 - Add new armv8 PMU events.
 
 - Revise hip08 uncore events.
 
 Hardware tracing:
 
 auxtrace:
 
 - Add missing Z option to ITRACE_HELP.
 
 - Add itrace A option to approximate IPC.
 
 - Add itrace d+o option to direct debug log to stdout.
 
 Intel PT:
 
 - Add support for PERF_RECORD_AUX_OUTPUT_HW_ID
 
 - Support itrace A option to approximate IPC
 
 - Support itrace d+o option to direct debug log to stdout.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYYg7RwAKCRCyPKLppCJ+
 J1KgAQCGC3802gsI38/xwli1SLBNHsZ9DEy2nX5Ikw/f64K+0QEA6DsU0hpRTR0D
 i8ZFa2zNX748n+pU2WX6E7AjO01hrQo=
 =sXHg
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-for-v5.16-2021-11-07-without-bpftool-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools updates from Arnaldo Carvalho de Melo:
 "perf annotate:
   - Add riscv64 support.
   - Add fusion logic for AMD microarchs.

  perf record:
   - Add an option to control the synthesizing behavior:
       --synth <no|all|task|mmap|cgroup>

  core:
   - Allow controlling synthesizing PERF_RECORD_ metadata events during
     record.
   - perf.data reader prep work for multithreaded processing.
   - Fix missing exclude_{host,guest} setting in PMUs that don't support
     it and that were causing the feature detection code to disable it
     for all events, even the ones in PMUs that support it.
   - Fix the default use of precise events on AMD, that were always
     falling back to non-precise because perf_event_attr.exclude_guest=1
     was set and IBS does not have filtering capability, refusing
     precise + exclude_guest.
   - Add bitfield_swap() to handle branch_stack endian issue.

  perf script:
   - Show binary offsets for userspace addresses in callchains.
   - Support instruction latency via new "ins_lat" selectable field.
   - Add dlfilter-show-cycles

  perf inject:
   - Add vmlinux and ignore-vmlinux arguments, similar to other tools.

  perf list:
   - Display PMU prefix for partially supported hybrid cache events.
   - Display hybrid PMU events with cpu type.

  perf stat:
   - Improve metrics documentation of data structures.
   - Fix memory leaks in the metric code.
   - Use NAN for missing event IDs.
   - Don't compute unused events.
   - Fix memory leak on error path.
   - Encode and use metric-id as a metric qualifier.
   - Allow metrics with no events.
   - Avoid events for an 'if' constant result.
   - Only add a referenced metric once.
   - Simplify metric_refs calculation.
   - Allow modifiers on metrics.

  perf test:
   - Add workload test of metric and metric groups.
   - Workload test of all PMUs.
   - vmlinux-kallsyms: Ignore hidden symbols.
   - Add pmu-event test for event described as "config=".
   - Verify more event members in pmu-events test.
   - Add endian test for struct branch_flags on the sample-parsing test.
   - Improve temp file cleanup in several tests.

  perf daemon:
   - Address MSAN warnings on send_cmd().

  perf kmem:
   - Improve man page for record options

  perf srcline:
   - Use long-running addr2line per DSO, greatly speeding up the
     'srcline' sort order.

  perf symbols:
   - Ignore $a/$d symbols for ARM modules.
   - Fix /proc/kcore access on 32 bit systems.

  Kernel UAPI copies:
   - Update copy of linux/socket.h with the kernel sources, no change in
     tooling output.

  libbpf:
   - Pull in bpf_program__get_prog_info_linear() from libbpf, too much
     specific to perf.
   - Deprecate bpf_map__resize() in favor of bpf_map_set_max_entries()
   - Install libbpf headers locally when building.
   - Bump minimum LLVM C++ std to GNU++14.

  libperf:
   - Use binary search in perf_cpu_map__idx() as array are sorted.

  libtracefs:
   - Enable libtracefs dynamic linking.

  libtraceevent:
   - Increase logging when verbose.

  Arch specific:

   * PowerPC:
      - Add support to expose instruction and data address registers as
        part of extended regs.

  Vendor events:

   * JSON parser:
      - Support ConfigCode to set the config= in PMUs
      - Make the JSON parser more conformant when in strict mode.

   * All JSON files:
      - Fix all remaining invalid JSON files.

   * ARM:
      - Syntax corrections in Neoverse N1 json.
      - Categorise the Neoverse V1 counters.
      - Add new armv8 PMU events.
      - Revise hip08 uncore events.

  Hardware tracing:

   * auxtrace:
      - Add missing Z option to ITRACE_HELP.
      - Add itrace A option to approximate IPC.
      - Add itrace d+o option to direct debug log to stdout.

   * Intel PT:
      - Add support for PERF_RECORD_AUX_OUTPUT_HW_ID
      - Support itrace A option to approximate IPC
      - Support itrace d+o option to direct debug log to stdout"

* tag 'perf-tools-for-v5.16-2021-11-07-without-bpftool-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (120 commits)
  perf build: Install libbpf headers locally when building
  perf MANIFEST: Add bpftool files to allow building with BUILD_BPF_SKEL=1
  perf metric: Fix memory leaks
  perf parse-event: Add init and exit to parse_event_error
  perf parse-events: Rename parse_events_error functions
  perf stat: Fix memory leak on error path
  perf tools: Use __BYTE_ORDER__
  perf inject: Add vmlinux and ignore-vmlinux arguments
  perf tools: Check vmlinux/kallsyms arguments in all tools
  perf tools: Refactor out kernel symbol argument sanity checking
  perf symbols: Ignore $a/$d symbols for ARM modules
  perf evsel: Don't set exclude_guest by default
  perf evsel: Fix missing exclude_{host,guest} setting
  perf bpf: Add missing free to bpf_event__print_bpf_prog_info()
  perf beauty: Update copy of linux/socket.h with the kernel sources
  perf clang: Fixes for more recent LLVM/clang
  tools: Bump minimum LLVM C++ std to GNU++14
  perf bpf: Pull in bpf_program__get_prog_info_linear()
  Revert "perf bench futex: Add support for 32-bit systems with 64-bit time_t"
  perf test sample-parsing: Add endian test for struct branch_flags
  ...
2021-11-08 09:25:26 -08:00
Linus Torvalds
1e9ed9360f Kbuild updates for v5.16
- Remove the global -isystem compiler flag, which was made possible by
    the introduction of <linux/stdarg.h>
 
  - Improve the Kconfig help to print the location in the top menu level
 
  - Fix "FORCE prerequisite is missing" build warning for sparc
 
  - Add new build targets, tarzst-pkg and perf-tarzst-src-pkg, which generate
    a zstd-compressed tarball
 
  - Prevent gen_init_cpio tool from generating a corrupted cpio when
    KBUILD_BUILD_TIMESTAMP is set to 2106-02-07 or later
 
  - Misc cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmGGkysVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGgZkQAIX4i9Tt6pyl/2xGDGkzUqjprfoH
 QUIo1DoUclLUygoakrrrX3EnZLWrslgPTKjQxdiV6RA6xHfe4cYgNTSq8zM9lsPT
 lu+B4nEDqoXQ5gyLxMlnjS3FRQTNYIeBZEhSAIiW8TENdLKlKc+NYdoj7th50dO0
 SkXRa2dpWHa6t7ZRqHIHMpUWA7gm0w22ZbgQmyUv1CDGO4IHPLqe2b2PMsrzhSZ1
 yypP1l6aQVKuP0hN9aytbTRqDxUd0uOzBf00PK5zx23hjdwZ9wmZrFTKDf9fAu/+
 nR7gBsa5YoYNQh3UkayZXjR5dClmgsCXZ25OXI7YucQp/8OJ5fadfn1NFpJHsw56
 n5cckbHIXgnFUcel5YlkR6qTHjpzdr9vHm90MmiuX99b3oy9czl6pY3qkNfRkllQ
 v7ME5L1qlw3P3ia1KA+H4zW/LIJ8p5cbKBwaY22m3kY3bTx7PiOfMlep4UVqxXSb
 0/OqxSsmYg5LlmwEQ0SSsx45hE0o9nG/cdjkHu1jUOUHxYfpt1T4MTILeGUwmjzd
 TydJym5MZyXBawu4NVB3QLoKm5Jt2BXtyaWOtq74VSrs77roNCdYuQWJ+1aBf2Pg
 0s4CVC2cC7KlxJDImoqswZATGXPMfbiVDcuVSSukYRgBMeCBPUzRhB8YP36BZyD3
 9vFYmqSujtUU7nWb
 =ATFN
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Remove the global -isystem compiler flag, which was made possible by
   the introduction of <linux/stdarg.h>

 - Improve the Kconfig help to print the location in the top menu level

 - Fix "FORCE prerequisite is missing" build warning for sparc

 - Add new build targets, tarzst-pkg and perf-tarzst-src-pkg, which
   generate a zstd-compressed tarball

 - Prevent gen_init_cpio tool from generating a corrupted cpio when
   KBUILD_BUILD_TIMESTAMP is set to 2106-02-07 or later

 - Misc cleanups

* tag 'kbuild-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (28 commits)
  kbuild: use more subdir- for visiting subdirectories while cleaning
  sh: remove meaningless archclean line
  initramfs: Check timestamp to prevent broken cpio archive
  kbuild: split DEBUG_CFLAGS out to scripts/Makefile.debug
  gen_init_cpio: add static const qualifiers
  kbuild: Add make tarzst-pkg build option
  scripts: update the comments of kallsyms support
  sparc: Add missing "FORCE" target when using if_changed
  kconfig: refactor conf_touch_dep()
  kconfig: refactor conf_write_dep()
  kconfig: refactor conf_write_autoconf()
  kconfig: add conf_get_autoheader_name()
  kconfig: move sym_escape_string_value() to confdata.c
  kconfig: refactor listnewconfig code
  kconfig: refactor conf_write_symbol()
  kconfig: refactor conf_write_heading()
  kconfig: remove 'const' from the return type of sym_escape_string_value()
  kconfig: rename a variable in the lexer to a clearer name
  kconfig: narrow the scope of variables in the lexer
  kconfig: Create links to main menu items in search
  ...
2021-11-08 09:15:45 -08:00
Linus Torvalds
67b7e1f241 modules patches for 5.16-rc1
As requested by Jessica I'm stepping in to help with modules
 maintenance. This is my first pull request to you.
 
 I've collected only two patches for modules for the 5.16-rc1 merge
 window. These patches are from Shuah Khan as she debugged some corner
 case error with modules. The error messages are improved for
 elf_validity_check(). While doing this work a corner case fix was
 spotted on validate_section_offset() due to a possible overflow bug
 on 64-bit. The impact of this fix is low given this just limits
 module section headers placed within the 32-bit boundary, and we
 obviously don't have insane module sizes. Even if a specially crafted
 module is constructed later checks would invalidate the module right
 away.
 
 I've let this sit through 0-day testing since October 15th with no
 issues found.
 
 Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmGFrvcSHG1jZ3JvZkBr
 ZXJuZWwub3JnAAoJEM4jHQowkoinhFAP/1BBXuM/vevC1IdZaEU4M8pg07NOpkZt
 PYJc8CxWKTtEg5hrLJMqOexXGwvAg/nq28IFWvUKh3bGtEghPyrQu6+I4mXsjnjJ
 t9/AO+BOYU14DJGDAYEuReNsaAcyeRooHLriuUaNvhhaN9q+v+FRyBWNphmA6Tz7
 VkCtmCNMFJZlhd9Cu4jOZpJe6CIe9gZ0czYfRshAl/3ZRSQjYaddtbYf1Cs8Vwah
 by4o2YyvctrRzeOj/Fy+kbqZw2St39nZ5fKYwijRn1ZwHRQo6NQqrlMeS8rI0LgG
 1YwWgNWO1FjaPzyIFcAhk2bUF2TxEf5/eVpXn2qXHnmVZ55oBPP/O7Th0/5OK9gD
 utOMbO1nqBLBXUyX/1dO/UT36XcrqtUP0Y9VgjIvj9n8Y82RGYmBScH/TOU1f7A7
 sH56sW9/3YvIOe8AShBHJ7IKqZXU0inIGasFYwKKm2pAOLtajaC9Sr5fqVbuyfNF
 J2+nXipVzjI0f9SGTqmE41jynFGln6nfd1pgOOiysg9ZqxieINB0J8l0OHe6fZz/
 zU4TehXZHE9DApP8D+rVpP0ltwR2YWs2u0zRqHr/0GEWYH00JZu2ymDR13W7izSp
 KiiveBxhwBpewgV5cyua8TDyeKhn3mEJFNmijlaq4yq1P2oKeWTQRDRZjwUP8EZY
 s16oV+BW7Kp+
 =Evek
 -----END PGP SIGNATURE-----

Merge tag 'modules-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux

Pull module updates from Luis Chamberlain:
 "As requested by Jessica I'm stepping in to help with modules
  maintenance. This is my first pull request to you.

  I've collected only two patches for modules for the 5.16-rc1 merge
  window. These patches are from Shuah Khan as she debugged some corner
  case error with modules. The error messages are improved for
  elf_validity_check(). While doing this work a corner case fix was
  spotted on validate_section_offset() due to a possible overflow bug on
  64-bit. The impact of this fix is low given this just limits module
  section headers placed within the 32-bit boundary, and we obviously
  don't have insane module sizes. Even if a specially crafted module is
  constructed later checks would invalidate the module right away.

  I've let this sit through 0-day testing since October 15th with no
  issues found"

* tag 'modules-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
  module: change to print useful messages from elf_validity_check()
  module: fix validate_section_offset() overflow bug on 64-bit
2021-11-08 09:04:59 -08:00
Arnd Bergmann
501586ea59 xen/balloon: fix unused-variable warning
In configurations with CONFIG_XEN_BALLOON_MEMORY_HOTPLUG=n
and CONFIG_XEN_BALLOON_MEMORY_HOTPLUG=y, gcc warns about an
unused variable:

drivers/xen/balloon.c:83:12: error: 'xen_hotplug_unpopulated' defined but not used [-Werror=unused-variable]

Since this is always zero when CONFIG_XEN_BALLOON_MEMORY_HOTPLUG
is disabled, turn it into a preprocessor constant in that case.

Fixes: 121f2faca2 ("xen/balloon: rename alloc/free_xenballooned_pages")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20211108111408.3940366-1-arnd@kernel.org
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2021-11-08 10:40:17 -06:00
Pavel Begunkov
bad119b9a0 io_uring: honour zeroes as io-wq worker limits
When we pass in zero as an io-wq worker number limit it shouldn't
actually change the limits but return the old value, follow that
behaviour with deferred limits setup as well.

Cc: stable@kernel.org # 5.15
Reported-by: Beld Zhang <beldzhang@gmail.com>
Fixes: e139a1ec92 ("io_uring: apply max_workers limit to all future users")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1b222a92f7a78a24b042763805e891a4cdd4b544.1636384034.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-08 08:39:48 -07:00
Ye Bin
a846a8e6c9 blk-mq: don't free tags if the tag_set is used by other device in queue initialztion
We got UAF report on v5.10 as follows:
[ 1446.674930] ==================================================================
[ 1446.675970] BUG: KASAN: use-after-free in blk_mq_get_driver_tag+0x9a4/0xa90
[ 1446.676902] Read of size 8 at addr ffff8880185afd10 by task kworker/1:2/12348
[ 1446.677851]
[ 1446.678073] CPU: 1 PID: 12348 Comm: kworker/1:2 Not tainted 5.10.0-10177-gc9c81b1e346a #2
[ 1446.679168] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 1446.680692] Workqueue: kthrotld blk_throtl_dispatch_work_fn
[ 1446.681448] Call Trace:
[ 1446.681800]  dump_stack+0x9b/0xce
[ 1446.682916]  print_address_description.constprop.6+0x3e/0x60
[ 1446.685999]  kasan_report.cold.9+0x22/0x3a
[ 1446.687186]  blk_mq_get_driver_tag+0x9a4/0xa90
[ 1446.687785]  blk_mq_dispatch_rq_list+0x21a/0x1d40
[ 1446.692576]  __blk_mq_do_dispatch_sched+0x394/0x830
[ 1446.695758]  __blk_mq_sched_dispatch_requests+0x398/0x4f0
[ 1446.698279]  blk_mq_sched_dispatch_requests+0xdf/0x140
[ 1446.698967]  __blk_mq_run_hw_queue+0xc0/0x270
[ 1446.699561]  __blk_mq_delay_run_hw_queue+0x4cc/0x550
[ 1446.701407]  blk_mq_run_hw_queue+0x13b/0x2b0
[ 1446.702593]  blk_mq_sched_insert_requests+0x1de/0x390
[ 1446.703309]  blk_mq_flush_plug_list+0x4b4/0x760
[ 1446.705408]  blk_flush_plug_list+0x2c5/0x480
[ 1446.708471]  blk_finish_plug+0x55/0xa0
[ 1446.708980]  blk_throtl_dispatch_work_fn+0x23b/0x2e0
[ 1446.711236]  process_one_work+0x6d4/0xfe0
[ 1446.711778]  worker_thread+0x91/0xc80
[ 1446.713400]  kthread+0x32d/0x3f0
[ 1446.714362]  ret_from_fork+0x1f/0x30
[ 1446.714846]
[ 1446.715062] Allocated by task 1:
[ 1446.715509]  kasan_save_stack+0x19/0x40
[ 1446.716026]  __kasan_kmalloc.constprop.1+0xc1/0xd0
[ 1446.716673]  blk_mq_init_tags+0x6d/0x330
[ 1446.717207]  blk_mq_alloc_rq_map+0x50/0x1c0
[ 1446.717769]  __blk_mq_alloc_map_and_request+0xe5/0x320
[ 1446.718459]  blk_mq_alloc_tag_set+0x679/0xdc0
[ 1446.719050]  scsi_add_host_with_dma.cold.3+0xa0/0x5db
[ 1446.719736]  virtscsi_probe+0x7bf/0xbd0
[ 1446.720265]  virtio_dev_probe+0x402/0x6c0
[ 1446.720808]  really_probe+0x276/0xde0
[ 1446.721320]  driver_probe_device+0x267/0x3d0
[ 1446.721892]  device_driver_attach+0xfe/0x140
[ 1446.722491]  __driver_attach+0x13a/0x2c0
[ 1446.723037]  bus_for_each_dev+0x146/0x1c0
[ 1446.723603]  bus_add_driver+0x3fc/0x680
[ 1446.724145]  driver_register+0x1c0/0x400
[ 1446.724693]  init+0xa2/0xe8
[ 1446.725091]  do_one_initcall+0x9e/0x310
[ 1446.725626]  kernel_init_freeable+0xc56/0xcb9
[ 1446.726231]  kernel_init+0x11/0x198
[ 1446.726714]  ret_from_fork+0x1f/0x30
[ 1446.727212]
[ 1446.727433] Freed by task 26992:
[ 1446.727882]  kasan_save_stack+0x19/0x40
[ 1446.728420]  kasan_set_track+0x1c/0x30
[ 1446.728943]  kasan_set_free_info+0x1b/0x30
[ 1446.729517]  __kasan_slab_free+0x111/0x160
[ 1446.730084]  kfree+0xb8/0x520
[ 1446.730507]  blk_mq_free_map_and_requests+0x10b/0x1b0
[ 1446.731206]  blk_mq_realloc_hw_ctxs+0x8cb/0x15b0
[ 1446.731844]  blk_mq_init_allocated_queue+0x374/0x1380
[ 1446.732540]  blk_mq_init_queue_data+0x7f/0xd0
[ 1446.733155]  scsi_mq_alloc_queue+0x45/0x170
[ 1446.733730]  scsi_alloc_sdev+0x73c/0xb20
[ 1446.734281]  scsi_probe_and_add_lun+0x9a6/0x2d90
[ 1446.734916]  __scsi_scan_target+0x208/0xc50
[ 1446.735500]  scsi_scan_channel.part.3+0x113/0x170
[ 1446.736149]  scsi_scan_host_selected+0x25a/0x360
[ 1446.736783]  store_scan+0x290/0x2d0
[ 1446.737275]  dev_attr_store+0x55/0x80
[ 1446.737782]  sysfs_kf_write+0x132/0x190
[ 1446.738313]  kernfs_fop_write_iter+0x319/0x4b0
[ 1446.738921]  new_sync_write+0x40e/0x5c0
[ 1446.739429]  vfs_write+0x519/0x720
[ 1446.739877]  ksys_write+0xf8/0x1f0
[ 1446.740332]  do_syscall_64+0x2d/0x40
[ 1446.740802]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1446.741462]
[ 1446.741670] The buggy address belongs to the object at ffff8880185afd00
[ 1446.741670]  which belongs to the cache kmalloc-256 of size 256
[ 1446.743276] The buggy address is located 16 bytes inside of
[ 1446.743276]  256-byte region [ffff8880185afd00, ffff8880185afe00)
[ 1446.744765] The buggy address belongs to the page:
[ 1446.745416] page:ffffea0000616b00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x185ac
[ 1446.746694] head:ffffea0000616b00 order:2 compound_mapcount:0 compound_pincount:0
[ 1446.747719] flags: 0x1fffff80010200(slab|head)
[ 1446.748337] raw: 001fffff80010200 ffffea00006a3208 ffffea000061bf08 ffff88801004f240
[ 1446.749404] raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
[ 1446.750455] page dumped because: kasan: bad access detected
[ 1446.751227]
[ 1446.751445] Memory state around the buggy address:
[ 1446.752102]  ffff8880185afc00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 1446.753090]  ffff8880185afc80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 1446.754079] >ffff8880185afd00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1446.755065]                          ^
[ 1446.755589]  ffff8880185afd80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1446.756574]  ffff8880185afe00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 1446.757566] ==================================================================

Flag 'BLK_MQ_F_TAG_QUEUE_SHARED' will be set if the second device on the
same host initializes it's queue successfully. However, if the second
device failed to allocate memory in blk_mq_alloc_and_init_hctx() from
blk_mq_realloc_hw_ctxs() from blk_mq_init_allocated_queue(),
__blk_mq_free_map_and_rqs() will be called on error path, and if
'BLK_MQ_TAG_HCTX_SHARED' is not set, 'tag_set->tags' will be freed
while it's still used by the first device.

To fix this issue we move release newly allocated hardware context from
blk_mq_realloc_hw_ctxs to __blk_mq_update_nr_hw_queues. As there is needn't to
release hardware context in blk_mq_init_allocated_queue.

Fixes: 868f2f0b72 ("blk-mq: dynamic h/w context count")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20211108074019.1058843-1-yebin10@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-08 06:23:49 -07:00
Coly Li
2878feaed5 bcache: Revert "bcache: use bvec_virt"
This reverts commit 2fd3e5efe7.

The above commit replaces page_address(bv->bv_page) by bvec_virt(bv) to
avoid directly access to bv->bv_page, but in situation bv->bv_offset is
not zero and page_address(bv->bv_page) is not equal to bvec_virt(bv). In
such case a memory corruption may happen because memory in next page is
tainted by following line in do_btree_node_write(),
	memcpy(bvec_virt(bv), addr, PAGE_SIZE);

This patch reverts the mentioned commit to avoid the memory corruption.

Fixes: 2fd3e5efe7 ("bcache: use bvec_virt")
Signed-off-by: Coly Li <colyli@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org # 5.15
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211103151041.70516-1-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-08 06:23:17 -07:00
Arnd Bergmann
c7c386fbc2 arm64: pgtable: make __pte_to_phys/__phys_to_pte_val inline functions
gcc warns about undefined behavior the vmalloc code when building
with CONFIG_ARM64_PA_BITS_52, when the 'idx++' in the argument to
__phys_to_pte_val() is evaluated twice:

mm/vmalloc.c: In function 'vmap_pfn_apply':
mm/vmalloc.c:2800:58: error: operation on 'data->idx' may be undefined [-Werror=sequence-point]
 2800 |         *pte = pte_mkspecial(pfn_pte(data->pfns[data->idx++], data->prot));
      |                                                 ~~~~~~~~~^~
arch/arm64/include/asm/pgtable-types.h:25:37: note: in definition of macro '__pte'
   25 | #define __pte(x)        ((pte_t) { (x) } )
      |                                     ^
arch/arm64/include/asm/pgtable.h:80:15: note: in expansion of macro '__phys_to_pte_val'
   80 |         __pte(__phys_to_pte_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot))
      |               ^~~~~~~~~~~~~~~~~
mm/vmalloc.c:2800:30: note: in expansion of macro 'pfn_pte'
 2800 |         *pte = pte_mkspecial(pfn_pte(data->pfns[data->idx++], data->prot));
      |                              ^~~~~~~

I have no idea why this never showed up earlier, but the safest
workaround appears to be changing those macros into inline functions
so the arguments get evaluated only once.

Cc: Matthew Wilcox <willy@infradead.org>
Fixes: 75387b9263 ("arm64: handle 52-bit physical addresses in page table entries")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20211105075414.2553155-1-arnd@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2021-11-08 10:05:54 +00:00
Qian Cai
c6975d7cab arm64: Track no early_pgtable_alloc() for kmemleak
After switched page size from 64KB to 4KB on several arm64 servers here,
kmemleak starts to run out of early memory pool due to a huge number of
those early_pgtable_alloc() calls:

  kmemleak_alloc_phys()
  memblock_alloc_range_nid()
  memblock_phys_alloc_range()
  early_pgtable_alloc()
  init_pmd()
  alloc_init_pud()
  __create_pgd_mapping()
  __map_memblock()
  paging_init()
  setup_arch()
  start_kernel()

Increased the default value of DEBUG_KMEMLEAK_MEM_POOL_SIZE by 4 times
won't be enough for a server with 200GB+ memory. There isn't much
interesting to check memory leaks for those early page tables and those
early memory mappings should not reference to other memory. Hence, no
kmemleak false positives, and we can safely skip tracking those early
allocations from kmemleak like we did in the commit fed84c7852
("mm/memblock.c: skip kmemleak for kasan_init()") without needing to
introduce complications to automatically scale the value depends on the
runtime memory size etc. After the patch, the default value of
DEBUG_KMEMLEAK_MEM_POOL_SIZE becomes sufficient again.

Signed-off-by: Qian Cai <quic_qiancai@quicinc.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/20211105150509.7826-1-quic_qiancai@quicinc.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-11-08 10:05:22 +00:00
Peter Collingbourne
aedad3e1c6 arm64: mte: change PR_MTE_TCF_NONE back into an unsigned long
This constant was previously an unsigned long, but was changed
into an int in commit 433c38f40f ("arm64: mte: change ASYNC and
SYNC TCF settings into bitfields"). This ended up causing spurious
unsigned-signed comparison warnings in expressions such as:

(x & PR_MTE_TCF_MASK) != PR_MTE_TCF_NONE

Therefore, change it back into an unsigned long to silence these
warnings.

Link: https://linux-review.googlesource.com/id/I07a72310db30227a5b7d789d0b817d78b657c639
Signed-off-by: Peter Collingbourne <pcc@google.com>
Link: https://lore.kernel.org/r/20211105230829.2254790-1-pcc@google.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-11-08 10:04:33 +00:00
Masahiro Yamada
34688c7691 arm64: vdso: remove -nostdlib compiler flag
The -nostdlib option requests the compiler to not use the standard
system startup files or libraries when linking. It is effective only
when $(CC) is used as a linker driver.

Since commit 691efbedc6 ("arm64: vdso: use $(LD) instead of $(CC)
to link VDSO"), $(LD) is directly used, hence -nostdlib is unneeded.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20211107161802.323125-1-masahiroy@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2021-11-08 10:02:57 +00:00
Reiji Watanabe
9dc232a8ab arm64: arm64_ftr_reg->name may not be a human-readable string
The id argument of ARM64_FTR_REG_OVERRIDE() is used for two purposes:
one as the system register encoding (used for the sys_id field of
__ftr_reg_entry), and the other as the register name (stringified
and used for the name field of arm64_ftr_reg), which is debug
information. The id argument is supposed to be a macro that
indicates an encoding of the register (eg. SYS_ID_AA64PFR0_EL1, etc).

ARM64_FTR_REG(), which also has the same id argument,
uses ARM64_FTR_REG_OVERRIDE() and passes the id to the macro.
Since the id argument is completely macro-expanded before it is
substituted into a macro body of ARM64_FTR_REG_OVERRIDE(),
the stringified id in the body of ARM64_FTR_REG_OVERRIDE is not
a human-readable register name, but a string of numeric bitwise
operations.

Fix this so that human-readable register names are available as
debug information.

Fixes: 8f266a5d87 ("arm64: cpufeature: Add global feature override facility")
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211101045421.2215822-1-reijiw@google.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-11-08 10:02:36 +00:00
Linus Torvalds
c107fb9b4f Add gitignore file for samples/fanotify/ subdirectory
Commit 5451093081 ("samples: Add fs error monitoring example") added a
new sample program, but didn't teach git to ignore the new generated
files, causing unnecessary noise from 'git status' after a full build.

Add the 'fs-monitor' sample executable to the .gitignore for this
subdirectory to silence it all again.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-07 11:19:24 -08:00
Linus Torvalds
6b75d88fa8 Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fix from Wolfram Sang:
 "Hot-fix for I2C"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: xgene-slimpro: Fix wrong pointer passed to PTR_ERR()
2021-11-07 10:52:04 -08:00
Linus Torvalds
e582e08ec0 Auxdisplay improvements:
- 4-digit 7-segment and quad alphanumeric display support for
     the ht16k33 driver, allowing the user to display and scroll
     text messages, from Geert Uytterhoeven.
 
   - An assortment of fixes and cleanups from Geert Uytterhoeven.
 
   - Header cleanups from Mianhan Liu.
 
   - Whitespace cleanup from Huiquan Deng.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAmGGj/EACgkQGXyLc2ht
 IW0bNg/+PIA7158LvIcNxpPpCwAdaZZznmawvwjSH7Gzg+sPgHH7VJ50OtoTpZnm
 vP7k9/l9NTxkgarpiKsDQtjEx5lK8CObVODdLvN6YccrXVIl4/CQLJdx3XK6qOe6
 TQ+qNVQrpWUjPoj3sHI0mlo1641X1h7T4UViLxL+5CNZ8bh0G2dTur7YYpIVSKZd
 Kk9457xP6g4BomhEFKiTIUz16H+522h6yvrreChDVRf/TBFxPfRMNMrGQu6aznVh
 Ape1p44/muna8dKtPJXOmoQ9gagve6Y/D5SZLROFpCNKv7MwgnspnLZa0j8kwRGS
 qi3xeVJSb6I9GZYKHOlv9BhX8dP4fxobJngeIbTMX/j0OLRQg4KH/VrL8FCnFgmW
 WDfPhoee2FAyqcCk+dxBrcTkoKb50oEAX9IVhGubjnhvhD7z75DjqEkh+vW+D6um
 S+lylNyg7Rl1RLxEsC1b5Z8TLnyeytLGMzkUOg+4FiSVYOeDo/SJkMl2kEtCi3vq
 HIko+3liGJP4dNqfk6nSliM7apUP1/LKD6cmW72CKfeDbxgCOrvKL7SPyP+5J1uo
 URaWIVbY22MMOWKYjIB47JBNiT6/QKBNrtqLu9CTcd4hUqK84+T5ME+Ca/QpuNIP
 YmXCUxEF21SKLVTFuQwJFluqgAmZ8DDQyhd2lk1TuIQ+kht5hpo=
 =aMxr
 -----END PGP SIGNATURE-----

Merge tag 'auxdisplay-for-linus-v5.16' of git://github.com/ojeda/linux

Pull auxdisplay updates from Miguel Ojeda:

 - 4-digit 7-segment and quad alphanumeric display support for the
   ht16k33 driver, allowing the user to display and scroll text
   messages, from Geert Uytterhoeven.

 - An assortment of fixes and cleanups from Geert Uytterhoeven.

 - Header cleanups from Mianhan Liu.

 - Whitespace cleanup from Huiquan Deng.

* tag 'auxdisplay-for-linus-v5.16' of git://github.com/ojeda/linux: (26 commits)
  MAINTAINERS: Add DT Bindings for Auxiliary Display Drivers
  auxdisplay: cfag12864bfb: code indent should use tabs where possible
  auxdisplay: ht16k33: remove superfluous header files
  auxdisplay: ks0108: remove superfluous header files
  auxdisplay: cfag12864bfb: remove superfluous header files
  auxdisplay: ht16k33: Make use of device properties
  auxdisplay: ht16k33: Add LED support
  dt-bindings: auxdisplay: ht16k33: Document LED subnode
  auxdisplay: ht16k33: Add support for segment displays
  auxdisplay: ht16k33: Extract frame buffer probing
  auxdisplay: ht16k33: Extract ht16k33_brightness_set()
  auxdisplay: ht16k33: Move delayed work
  auxdisplay: ht16k33: Add helper variable dev
  auxdisplay: ht16k33: Convert to simple i2c probe function
  auxdisplay: ht16k33: Remove unneeded error check in keypad probe()
  auxdisplay: ht16k33: Use HT16K33_FB_SIZE in ht16k33_initialize()
  auxdisplay: ht16k33: Fix frame buffer device blanking
  auxdisplay: ht16k33: Connect backlight to fbdev
  auxdisplay: linedisp: Add support for changing scroll rate
  auxdisplay: linedisp: Use kmemdup_nul() helper
  ...
2021-11-07 10:47:27 -08:00
Quentin Monnet
6b491a86b7 perf build: Install libbpf headers locally when building
API headers from libbpf should not be accessed directly from the
library's source directory. Instead, they should be exported with "make
install_headers". Let's adjust perf's Makefile to install those headers
locally when building libbpf.

v2:

- Fix $(LIBBPF_OUTPUT) when $(OUTPUT) is null.
- Make sure the recipe for $(LIBBPF_OUTPUT) is not under a "ifdef".

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org
Link: http://lore.kernel.org/lkml/20211107002445.4790-1-quentin@isovalent.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-07 15:39:28 -03:00
Arnaldo Carvalho de Melo
f174940488 perf MANIFEST: Add bpftool files to allow building with BUILD_BPF_SKEL=1
We need bpftool and required kernel/bpf/disasm.[ch] to bootstrap the
cgroups, bperf and other BPF skels used by perf.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-07 15:39:28 -03:00
Ian Rogers
aba8c5e380 perf metric: Fix memory leaks
Certain error paths may leak memory as caught by address sanitizer.
Ensure this is cleaned up to make sure address/leak sanitizer is happy.

Fixes: 5ecd5a0c7d ("perf metrics: Modify setup and deduplication")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20211107090002.3784612-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-07 15:39:28 -03:00
Ian Rogers
07eafd4e05 perf parse-event: Add init and exit to parse_event_error
parse_events() may succeed but leave string memory allocations reachable
in the error.

Add an init/exit that must be called to initialize and clean up the
error. This fixes a leak in metricgroup parse_ids.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20211107090002.3784612-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-07 15:39:25 -03:00
Ian Rogers
6c1912898e perf parse-events: Rename parse_events_error functions
Group error functions and name after the data type they manipulate.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20211107090002.3784612-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-07 15:38:54 -03:00