linux

Author	SHA1	Message	Date
Andy Shevchenko	1fcbd5deac	include/linux/sbitmap.h: replace kernel.h with the necessary inclusions When kernel.h is used in the headers it adds a lot into dependency hell, especially when there are circular dependencies are involved. Replace kernel.h inclusion with the list of what is really being used. Link: https://lkml.kernel.org/r/20211027150437.79921-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-11-09 10:02:49 -08:00
Andy Shevchenko	5f6286a608	include/linux/delay.h: replace kernel.h with the necessary inclusions When kernel.h is used in the headers it adds a lot into dependency hell, especially when there are circular dependencies are involved. Replace kernel.h inclusion with the list of what is really being used. [akpm@linux-foundation.org: cxd2880_common.h needs bits.h for GENMASK()] [andriy.shevchenko@linux.intel.com: delay.h: fix for removed kernel.h] Link: https://lkml.kernel.org/r/20211028170143.56523-1-andriy.shevchenko@linux.intel.com [akpm@linux-foundation.org: include/linux/fwnode.h needs bits.h for BIT()] Link: https://lkml.kernel.org/r/20211027150324.79827-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-11-09 10:02:49 -08:00
Andy Shevchenko	28b2e8f320	include/media/media-entity.h: replace kernel.h with the necessary inclusions When kernel.h is used in the headers it adds a lot into dependency hell, especially when there are circular dependencies are involved. Replace kernel.h inclusion with the list of what is really being used. Link: https://lkml.kernel.org/r/20211013170417.87909-8-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Cameron <jic23@kernel.org> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thorsten Leemhuis <regressions@leemhuis.info> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-11-09 10:02:49 -08:00
Andy Shevchenko	c540f95959	include/linux/plist.h: replace kernel.h with the necessary inclusions When kernel.h is used in the headers it adds a lot into dependency hell, especially when there are circular dependencies are involved. Replace kernel.h inclusion with the list of what is really being used. Link: https://lkml.kernel.org/r/20211013170417.87909-7-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Cameron <jic23@kernel.org> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thorsten Leemhuis <regressions@leemhuis.info> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-11-09 10:02:49 -08:00
Andy Shevchenko	50b09d6145	include/linux/llist.h: replace kernel.h with the necessary inclusions When kernel.h is used in the headers it adds a lot into dependency hell, especially when there are circular dependencies are involved. Replace kernel.h inclusion with the list of what is really being used. Link: https://lkml.kernel.org/r/20211013170417.87909-6-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Cameron <jic23@kernel.org> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thorsten Leemhuis <regressions@leemhuis.info> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-11-09 10:02:49 -08:00
Andy Shevchenko	cd7187e112	include/linux/list.h: replace kernel.h with the necessary inclusions When kernel.h is used in the headers it adds a lot into dependency hell, especially when there are circular dependencies are involved. Replace kernel.h inclusion with the list of what is really being used. Link: https://lkml.kernel.org/r/20211013170417.87909-5-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Cameron <jic23@kernel.org> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thorsten Leemhuis <regressions@leemhuis.info> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-11-09 10:02:49 -08:00
Andy Shevchenko	ec54c28920	include/kunit/test.h: replace kernel.h with the necessary inclusions When kernel.h is used in the headers it adds a lot into dependency hell, especially when there are circular dependencies are involved. Replace kernel.h inclusion with the list of what is really being used. Link: https://lkml.kernel.org/r/20211013170417.87909-4-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Cameron <jic23@kernel.org> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thorsten Leemhuis <regressions@leemhuis.info> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-11-09 10:02:49 -08:00
Andy Shevchenko	d2a8ebbf81	kernel.h: split out container_of() and typeof_member() macros kernel.h is being used as a dump for all kinds of stuff for a long time. Here is the attempt cleaning it up by splitting out container_of() and typeof_member() macros. For time being include new header back to kernel.h to avoid twisted indirected includes for existing users. Note, there are _a lot_ of headers and modules that include kernel.h solely for one of these macros and this allows to unburden compiler for the twisted inclusion paths and to make new code cleaner in the future. Link: https://lkml.kernel.org/r/20211013170417.87909-3-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Cameron <jic23@kernel.org> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thorsten Leemhuis <regressions@leemhuis.info> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-11-09 10:02:49 -08:00
Andy Shevchenko	f5d8061484	kernel.h: drop unneeded <linux/kernel.h> inclusion from other headers Patch series "kernel.h further split", v5. kernel.h is a set of something which is not related to each other and often used in non-crossed compilation units, especially when drivers need only one or two macro definitions from it. This patch (of 7): There is no evidence we need kernel.h inclusion in certain headers. Drop unneeded <linux/kernel.h> inclusion from other headers. [sfr@canb.auug.org.au: bottom_half.h needs kernel] Link: https://lkml.kernel.org/r/20211015202908.1c417ae2@canb.auug.org.au Link: https://lkml.kernel.org/r/20211013170417.87909-1-andriy.shevchenko@linux.intel.com Link: https://lkml.kernel.org/r/20211013170417.87909-2-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Will Deacon <will@kernel.org> Cc: Waiman Long <longman@redhat.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Cc: Jonathan Cameron <jic23@kernel.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Thorsten Leemhuis <regressions@leemhuis.info> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-11-09 10:02:49 -08:00
Stephen Brennan	da4d6b9cf8	proc: allow pid_revalidate() during LOOKUP_RCU Problem Description: When running running ~128 parallel instances of TZ=/etc/localtime ps -fe >/dev/null on a 128CPU machine, the %sys utilization reaches 97%, and perf shows the following code path as being responsible for heavy contention on the d_lockref spinlock: walk_component() lookup_fast() d_revalidate() pid_revalidate() // returns -ECHILD unlazy_child() lockref_get_not_dead(&nd->path.dentry->d_lockref) <-- contention The reason is that pid_revalidate() is triggering a drop from RCU to ref path walk mode. All concurrent path lookups thus try to grab a reference to the dentry for /proc/, before re-executing pid_revalidate() and then stepping into the /proc/$pid directory. Thus there is huge spinlock contention. This patch allows pid_revalidate() to execute in RCU mode, meaning that the path lookup can successfully enter the /proc/$pid directory while still in RCU mode. Later on, the path lookup may still drop into ref mode, but the contention will be much reduced at this point. By applying this patch, %sys utilization falls to around 85% under the same workload, and the number of ps processes executed per unit time increases by 3x-4x. Although this particular workload is a bit contrived, we have seen some large collections of eager monitoring scripts which produced similarly high %sys time due to contention in the /proc directory. As a result this patch, Al noted that several procfs methods which were only called in ref-walk mode could now be called from RCU mode. To ensure that this patch is safe, I audited all the inode get_link and permission() implementations, as well as dentry d_revalidate() implementations, in fs/proc. The purpose here is to ensure that they either are safe to call in RCU (i.e. don't sleep) or correctly bail out of RCU mode if they don't support it. My analysis shows that all at-risk procfs methods are safe to call under RCU, and thus this patch is safe. Procfs RCU-walk Analysis: This analysis is up-to-date with 5.15-rc3. When called under RCU mode, these functions have arguments as follows: * get_link() receives a NULL dentry pointer when called in RCU mode. * permission() receives MAY_NOT_BLOCK in the mode parameter when called from RCU. * d_revalidate() receives LOOKUP_RCU in flags. For the following functions, either they are trivially RCU safe, or they explicitly bail at the beginning of the function when they run: proc_ns_get_link (bails out) proc_get_link (RCU safe) proc_pid_get_link (bails out) map_files_d_revalidate (bails out) map_misc_d_revalidate (bails out) proc_net_d_revalidate (RCU safe) proc_sys_revalidate (bails out, also not under /proc/$pid) tid_fd_revalidate (bails out) proc_sys_permission (not under /proc/$pid) The remainder of the functions require a bit more detail: * proc_fd_permission: RCU safe. All of the body of this function is under rcu_read_lock(), except generic_permission() which declares itself RCU safe in its documentation string. * proc_self_get_link uses GFP_ATOMIC in the RCU case, so it is RCU aware and otherwise looks safe. The same is true of proc_thread_self_get_link. * proc_map_files_get_link: calls ns_capable, which calls capable(), and thus calls into the audit code (see note #1 below). The remainder is just a call to the trivially safe proc_pid_get_link(). * proc_pid_permission: calls ptrace_may_access(), which appears RCU safe, although it does call into the "security_ptrace_access_check()" hook, which looks safe under smack and selinux. Just the audit code is of concern. Also uses get_task_struct() and put_task_struct(), see note #2 below. * proc_tid_comm_permission: Appears safe, though calls put_task_struct (see note #2 below). Note #1: Most of the concern of RCU safety has centered around the audit code. However, since `b17ec22fb3` ("selinux: slow_avc_audit has become non-blocking"), it's safe to call this code under RCU. So all of the above are safe by my estimation. Note #2: get_task_struct() and put_task_struct(): The majority of get_task_struct() is under RCU read lock, and in any case it is a simple increment. But put_task_struct() is complex, given that it could at some point free the task struct, and this process has many steps which I couldn't manually verify. However, several other places call put_task_struct() under RCU, so it appears safe to use here too (see kernel/hung_task.c:165 or rcu/tree-stall.h:296) Patch description: pid_revalidate() drops from RCU into REF lookup mode. When many threads are resolving paths within /proc in parallel, this can result in heavy spinlock contention on d_lockref as each thread tries to grab a reference to the /proc dentry (and drop it shortly thereafter). Investigation indicates that it is not necessary to drop RCU in pid_revalidate(), as no RCU data is modified and the function never sleeps. So, remove the LOOKUP_RCU check. Link: https://lkml.kernel.org/r/20211004175629.292270-2-stephen.s.brennan@oracle.com Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> Cc: Konrad Wilk <konrad.wilk@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Matthew Wilcox <willy@infradead.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-11-09 10:02:49 -08:00
David Hildenbrand	ce2814622e	virtio-mem: kdump mode to sanitize /proc/vmcore access Although virtio-mem currently supports reading unplugged memory in the hypervisor, this will change in the future, indicated to the device via a new feature flag. We similarly sanitized /proc/kcore access recently. [1] Let's register a vmcore callback, to allow vmcore code to check if a PFN belonging to a virtio-mem device is either currently plugged and should be dumped or is currently unplugged and should not be accessed, instead mapping the shared zeropage or returning zeroes when reading. This is important when not capturing /proc/vmcore via tools like "makedumpfile" that can identify logically unplugged virtio-mem memory via PG_offline in the memmap, but simply by e.g., copying the file. Distributions that support virtio-mem+kdump have to make sure that the virtio_mem module will be part of the kdump kernel or the kdump initrd; dracut was recently [2] extended to include virtio-mem in the generated initrd. As long as no special kdump kernels are used, this will automatically make sure that virtio-mem will be around in the kdump initrd and sanitize /proc/vmcore access -- with dracut. With this series, we'll send one virtio-mem state request for every ~2 MiB chunk of virtio-mem memory indicated in the vmcore that we intend to read/map. In the future, we might want to allow building virtio-mem for kdump mode only, even without CONFIG_MEMORY_HOTPLUG and friends: this way, we could support special stripped-down kdump kernels that have many other config options disabled; we'll tackle that once required. Further, we might want to try sensing bigger blocks (e.g., memory sections) first before falling back to device blocks on demand. Tested with Fedora rawhide, which contains a recent kexec-tools version (considering "System RAM (virtio_mem)" when creating the vmcore header) and a recent dracut version (including the virtio_mem module in the kdump initrd). Link: https://lkml.kernel.org/r/20210526093041.8800-1-david@redhat.com [1] Link: https://github.com/dracutdevs/dracut/pull/1157 [2] Link: https://lkml.kernel.org/r/20211005121430.30136-10-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dave Young <dyoung@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-11-09 10:02:48 -08:00
David Hildenbrand	ffc763d0c3	virtio-mem: factor out hotplug specifics from virtio_mem_remove() into virtio_mem_deinit_hotplug() Let's prepare for a new virtio-mem kdump mode in which we don't actually hot(un)plug any memory but only observe the state of device blocks. Link: https://lkml.kernel.org/r/20211005121430.30136-9-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dave Young <dyoung@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-11-09 10:02:48 -08:00
David Hildenbrand	84e17e684e	virtio-mem: factor out hotplug specifics from virtio_mem_probe() into virtio_mem_init_hotplug() Let's prepare for a new virtio-mem kdump mode in which we don't actually hot(un)plug any memory but only observe the state of device blocks. Link: https://lkml.kernel.org/r/20211005121430.30136-8-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dave Young <dyoung@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-11-09 10:02:48 -08:00
David Hildenbrand	94300fcf4c	virtio-mem: factor out hotplug specifics from virtio_mem_init() into virtio_mem_init_hotplug() Let's prepare for a new virtio-mem kdump mode in which we don't actually hot(un)plug any memory but only observe the state of device blocks. Link: https://lkml.kernel.org/r/20211005121430.30136-7-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dave Young <dyoung@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-11-09 10:02:48 -08:00
David Hildenbrand	cc5f2704c9	proc/vmcore: convert oldmem_pfn_is_ram callback to more generic vmcore callbacks Let's support multiple registered callbacks, making sure that registering vmcore callbacks cannot fail. Make the callback return a bool instead of an int, handling how to deal with errors internally. Drop unused HAVE_OLDMEM_PFN_IS_RAM. We soon want to make use of this infrastructure from other drivers: virtio-mem, registering one callback for each virtio-mem device, to prevent reading unplugged virtio-mem memory. Handle it via a generic vmcore_cb structure, prepared for future extensions: for example, once we support virtio-mem on s390x where the vmcore is completely constructed in the second kernel, we want to detect and add plugged virtio-mem memory ranges to the vmcore in order for them to get dumped properly. Handle corner cases that are unexpected and shouldn't happen in sane setups: registering a callback after the vmcore has already been opened (warn only) and unregistering a callback after the vmcore has already been opened (warn and essentially read only zeroes from that point on). Link: https://lkml.kernel.org/r/20211005121430.30136-6-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dave Young <dyoung@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-11-09 10:02:48 -08:00
David Hildenbrand	2c9feeaedf	proc/vmcore: let pfn_is_ram() return a bool The callback should deal with errors internally, it doesn't make sense to expose these via pfn_is_ram(). We'll rework the callbacks next. Right now we consider errors as if "it's RAM"; no functional change. Link: https://lkml.kernel.org/r/20211005121430.30136-5-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dave Young <dyoung@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-11-09 10:02:48 -08:00
David Hildenbrand	934fadf438	x86/xen: print a warning when HVMOP_get_mem_type fails HVMOP_get_mem_type is not expected to fail, "This call failing is indication of something going quite wrong and it would be good to know about this." [1] Let's add a pr_warn_once(). Link: https://lkml.kernel.org/r/3b935aa0-6d85-0bcd-100e-15098add3c4c@oracle.com [1] Link: https://lkml.kernel.org/r/20211005121430.30136-4-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Suggested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Young <dyoung@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-11-09 10:02:48 -08:00
David Hildenbrand	d452a48949	x86/xen: simplify xen_oldmem_pfn_is_ram() Let's simplify return handling. Link: https://lkml.kernel.org/r/20211005121430.30136-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dave Young <dyoung@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-11-09 10:02:48 -08:00
David Hildenbrand	434b90f39e	x86/xen: update xen_oldmem_pfn_is_ram() documentation After removing /dev/kmem, sanitizing /proc/kcore and handling /dev/mem, this series tackles the last sane way how a VM could accidentially access logically unplugged memory managed by a virtio-mem device: /proc/vmcore When dumping memory via "makedumpfile", PG_offline pages, used by virtio-mem to flag logically unplugged memory, are already properly excluded; however, especially when accessing/copying /proc/vmcore "the usual way", we can still end up reading logically unplugged memory part of a virtio-mem device. Patch #1-#3 are cleanups. Patch #4 extends the existing oldmem_pfn_is_ram mechanism. Patch #5-#7 are virtio-mem refactorings for patch #8, which implements the virtio-mem logic to query the state of device blocks. Patch #8: "Although virtio-mem currently supports reading unplugged memory in the hypervisor, this will change in the future, indicated to the device via a new feature flag. We similarly sanitized /proc/kcore access recently. [...] Distributions that support virtio-mem+kdump have to make sure that the virtio_mem module will be part of the kdump kernel or the kdump initrd; dracut was recently [2] extended to include virtio-mem in the generated initrd. As long as no special kdump kernels are used, this will automatically make sure that virtio-mem will be around in the kdump initrd and sanitize /proc/vmcore access -- with dracut" This is the last remaining bit to support VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE [3] in the Linux implementation of virtio-mem. Note: this is best-effort. We'll never be able to control what runs inside the second kernel, really, but we also don't have to care: we only care about sane setups where we don't want our VM getting zapped once we touch the wrong memory location while dumping. While we usually expect sane setups to use "makedumfile", nothing really speaks against just copying /proc/vmcore, especially in environments where HWpoisioning isn't typically expected. Also, we really don't want to put all our trust completely on the memmap, so sanitizing also makes sense when just using "makedumpfile". [1] https://lkml.kernel.org/r/20210526093041.8800-1-david@redhat.com [2] https://github.com/dracutdevs/dracut/pull/1157 [3] https://lists.oasis-open.org/archives/virtio-comment/202109/msg00021.html This patch (of 9): The callback is only used for the vmcore nowadays. Link: https://lkml.kernel.org/r/20211005121430.30136-1-david@redhat.com Link: https://lkml.kernel.org/r/20211005121430.30136-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Boris Ostrovsky <boris.ostrvsky@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Mike Rapoport <rppt@kernel.org> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-11-09 10:02:48 -08:00
Florian Weimer	0658a0961b	procfs: do not list TID 0 in /proc/<pid>/task If a task exits concurrently, task_pid_nr_ns may return 0. [akpm@linux-foundation.org: coding style tweaks] [adobriyan@gmail.com: test that /proc/*/task doesn't contain "0"] Link: https://lkml.kernel.org/r/YV88AnVzHxPafQ9o@localhost.localdomain Link: https://lkml.kernel.org/r/8735pn5dx7.fsf@oldenburg.str.redhat.com Signed-off-by: Florian Weimer <fweimer@redhat.com> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-11-09 10:02:48 -08:00
zhangyiru	83c1fd763b	mm,hugetlb: remove mlock ulimit for SHM_HUGETLB Commit `21a3c273f8` ("mm, hugetlb: add thread name and pid to SHM_HUGETLB mlock rlimit warning") marked this as deprecated in 2012, but it is not deleted yet. Mike says he still sees that message in log files on occasion, so maybe we should preserve this warning. Also remove hugetlbfs related user_shm_unlock in ipc/shm.c and remove the user_shm_unlock after out. Link: https://lkml.kernel.org/r/20211103105857.25041-1-zhangyiru3@huawei.com Signed-off-by: zhangyiru <zhangyiru3@huawei.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Hugh Dickins <hughd@google.com> Cc: Liu Zixian <liuzixian4@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: wuxu.wu <wuxu.wu@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-11-09 10:02:48 -08:00
Johannes Weiner	51b8c1fe25	vfs: keep inodes with page cache off the inode shrinker LRU Historically (pre-2.5), the inode shrinker used to reclaim only empty inodes and skip over those that still contained page cache. This caused problems on highmem hosts: struct inode could put fill lowmem zones before the cache was getting reclaimed in the highmem zones. To address this, the inode shrinker started to strip page cache to facilitate reclaiming lowmem. However, this comes with its own set of problems: the shrinkers may drop actively used page cache just because the inodes are not currently open or dirty - think working with a large git tree. It further doesn't respect cgroup memory protection settings and can cause priority inversions between containers. Nowadays, the page cache also holds non-resident info for evicted cache pages in order to detect refaults. We've come to rely heavily on this data inside reclaim for protecting the cache workingset and driving swap behavior. We also use it to quantify and report workload health through psi. The latter in turn is used for fleet health monitoring, as well as driving automated memory sizing of workloads and containers, proactive reclaim and memory offloading schemes. The consequences of dropping page cache prematurely is that we're seeing subtle and not-so-subtle failures in all of the above-mentioned scenarios, with the workload generally entering unexpected thrashing states while losing the ability to reliably detect it. To fix this on non-highmem systems at least, going back to rotating inodes on the LRU isn't feasible. We've tried (commit `a76cf1a474` ("mm: don't reclaim inodes with many attached pages")) and failed (commit `69056ee6a8` ("Revert "mm: don't reclaim inodes with many attached pages"")). The issue is mostly that shrinker pools attract pressure based on their size, and when objects get skipped the shrinkers remember this as deferred reclaim work. This accumulates excessive pressure on the remaining inodes, and we can quickly eat into heavily used ones, or dirty ones that require IO to reclaim, when there potentially is plenty of cold, clean cache around still. Instead, this patch keeps populated inodes off the inode LRU in the first place - just like an open file or dirty state would. An otherwise clean and unused inode then gets queued when the last cache entry disappears. This solves the problem without reintroducing the reclaim issues, and generally is a bit more scalable than having to wade through potentially hundreds of thousands of busy inodes. Locking is a bit tricky because the locks protecting the inode state (i_lock) and the inode LRU (lru_list.lock) don't nest inside the irq-safe page cache lock (i_pages.xa_lock). Page cache deletions are serialized through i_lock, taken before the i_pages lock, to make sure depopulated inodes are queued reliably. Additions may race with deletions, but we'll check again in the shrinker. If additions race with the shrinker itself, we're protected by the i_lock: if find_inode() or iput() win, the shrinker will bail on the elevated i_count or I_REFERENCED; if the shrinker wins and goes ahead with the inode, it will set I_FREEING and inhibit further igets(), which will cause the other side to create a new instance of the inode instead. Link: https://lkml.kernel.org/r/20210614211904.14420-4-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Roman Gushchin <guro@fb.com> Cc: Tejun Heo <tj@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-11-09 10:02:48 -08:00
Ming Lei	26af1cd003	nvme: wait until quiesce is done NVMe uses one atomic flag to check if quiesce is needed. If quiesce is started, the helper returns immediately. This way is wrong, since we have to wait until quiesce is done. Fixes: `e70feb8b3e` ("blk-mq: support concurrent queue quiesce/unquiesce") Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20211109071144.181581-5-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-11-09 08:14:27 -07:00
Ming Lei	93542fbfa7	scsi: make sure that request queue queiesce and unquiesce balanced For fixing queue quiesce race between driver and block layer(elevator switch, update nr_requests, ...), we need to support concurrent quiesce and unquiesce, which requires the two call balanced. It isn't easy to audit that in all scsi drivers, especially the two may be called from different contexts, so do it in scsi core with one per-device atomic variable to balance quiesce and unquiesce. Reported-by: Yi Zhang <yi.zhang@redhat.com> Fixes: `e70feb8b3e` ("blk-mq: support concurrent queue quiesce/unquiesce") Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20211109071144.181581-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-11-09 08:14:27 -07:00
Ming Lei	d2b9f12b0f	scsi: avoid to quiesce sdev->request_queue two times For fixing queue quiesce race between driver and block layer(elevator switch, update nr_requests, ...), we need to support concurrent quiesce and unquiesce, which requires the two to be balanced. blk_mq_quiesce_queue() calls blk_mq_quiesce_queue_nowait() for updating quiesce depth and marking the flag, then scsi_internal_device_block() calls blk_mq_quiesce_queue_nowait() two times actually. Fix the double quiesce and keep quiesce and unquiesce balanced. Reported-by: Yi Zhang <yi.zhang@redhat.com> Fixes: `e70feb8b3e` ("blk-mq: support concurrent queue quiesce/unquiesce") Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20211109071144.181581-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-11-09 08:14:27 -07:00
Ming Lei	9ef4d0209c	blk-mq: add one API for waiting until quiesce is done Some drivers(NVMe, SCSI) need to call quiesce and unquiesce in pair, but it is hard to switch to this style, so these drivers need one atomic flag for helping to balance quiesce and unquiesce. When quiesce is in-progress, the driver still needs to wait until the quiesce is done, so add API of blk_mq_wait_quiesce_done() for these drivers. Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20211109071144.181581-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-11-09 08:14:27 -07:00
Arnd Bergmann	9758aba854	amt: add IPV6 Kconfig dependency This driver cannot be built-in if IPV6 is a loadable module: x86_64-linux-ld: drivers/net/amt.o: in function `amt_build_mld_gq': amt.c:(.text+0x2e7d): undefined reference to `ipv6_dev_get_saddr' Add the idiomatic Kconfig dependency that all such modules have. Fixes: `b9022b53ad` ("amt: add control plane of amt interface") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-09 14:00:13 +00:00
Dan Carpenter	1c360cc1cc	gve: Fix off by one in gve_tx_timeout() The priv->ntfy_blocks[] has "priv->num_ntfy_blks" elements so this > needs to be >= to prevent an off by one bug. The priv->ntfy_blocks[] array is allocated in gve_alloc_notify_blocks(). Fixes: `87a7f321bb` ("gve: Recover from queue stall due to missed IRQ") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-09 13:58:46 +00:00
Filipe Manana	51bd9563b6	btrfs: fix deadlock due to page faults during direct IO reads and writes If we do a direct IO read or write when the buffer given by the user is memory mapped to the file range we are going to do IO, we end up ending in a deadlock. This is triggered by the new test case generic/647 from fstests. For a direct IO read we get a trace like this: [967.872718] INFO: task mmap-rw-fault:12176 blocked for more than 120 seconds. [967.874161] Not tainted 5.14.0-rc7-btrfs-next-95 #1 [967.874909] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [967.875983] task:mmap-rw-fault state:D stack: 0 pid:12176 ppid: 11884 flags:0x00000000 [967.875992] Call Trace: [967.875999] __schedule+0x3ca/0xe10 [967.876015] schedule+0x43/0xe0 [967.876020] wait_extent_bit.constprop.0+0x1eb/0x260 [btrfs] [967.876109] ? do_wait_intr_irq+0xb0/0xb0 [967.876118] lock_extent_bits+0x37/0x90 [btrfs] [967.876150] btrfs_lock_and_flush_ordered_range+0xa9/0x120 [btrfs] [967.876184] ? extent_readahead+0xa7/0x530 [btrfs] [967.876214] extent_readahead+0x32d/0x530 [btrfs] [967.876253] ? lru_cache_add+0x104/0x220 [967.876255] ? kvm_sched_clock_read+0x14/0x40 [967.876258] ? sched_clock_cpu+0xd/0x110 [967.876263] ? lock_release+0x155/0x4a0 [967.876271] read_pages+0x86/0x270 [967.876274] ? lru_cache_add+0x125/0x220 [967.876281] page_cache_ra_unbounded+0x1a3/0x220 [967.876291] filemap_fault+0x626/0xa20 [967.876303] __do_fault+0x36/0xf0 [967.876308] __handle_mm_fault+0x83f/0x15f0 [967.876322] handle_mm_fault+0x9e/0x260 [967.876327] __get_user_pages+0x204/0x620 [967.876332] ? get_user_pages_unlocked+0x69/0x340 [967.876340] get_user_pages_unlocked+0xd3/0x340 [967.876349] internal_get_user_pages_fast+0xbca/0xdc0 [967.876366] iov_iter_get_pages+0x8d/0x3a0 [967.876374] bio_iov_iter_get_pages+0x82/0x4a0 [967.876379] ? lock_release+0x155/0x4a0 [967.876387] iomap_dio_bio_actor+0x232/0x410 [967.876396] iomap_apply+0x12a/0x4a0 [967.876398] ? iomap_dio_rw+0x30/0x30 [967.876414] __iomap_dio_rw+0x29f/0x5e0 [967.876415] ? iomap_dio_rw+0x30/0x30 [967.876420] ? lock_acquired+0xf3/0x420 [967.876429] iomap_dio_rw+0xa/0x30 [967.876431] btrfs_file_read_iter+0x10b/0x140 [btrfs] [967.876460] new_sync_read+0x118/0x1a0 [967.876472] vfs_read+0x128/0x1b0 [967.876477] __x64_sys_pread64+0x90/0xc0 [967.876483] do_syscall_64+0x3b/0xc0 [967.876487] entry_SYSCALL_64_after_hwframe+0x44/0xae [967.876490] RIP: 0033:0x7fb6f2c038d6 [967.876493] RSP: 002b:00007fffddf586b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000011 [967.876496] RAX: ffffffffffffffda RBX: 0000000000001000 RCX: 00007fb6f2c038d6 [967.876498] RDX: 0000000000001000 RSI: 00007fb6f2c17000 RDI: 0000000000000003 [967.876499] RBP: 0000000000001000 R08: 0000000000000003 R09: 0000000000000000 [967.876501] R10: 0000000000001000 R11: 0000000000000246 R12: 0000000000000003 [967.876502] R13: 0000000000000000 R14: 00007fb6f2c17000 R15: 0000000000000000 This happens because at btrfs_dio_iomap_begin() we lock the extent range and return with it locked - we only unlock in the endio callback, at end_bio_extent_readpage() -> endio_readpage_release_extent(). Then after iomap called the btrfs_dio_iomap_begin() callback, it triggers the page faults that resulting in reading the pages, through the readahead callback btrfs_readahead(), and through there we end to attempt to lock again the same extent range (or a subrange of what we locked before), resulting in the deadlock. For a direct IO write, the scenario is a bit different, and it results in trace like this: [1132.442520] run fstests generic/647 at 2021-08-31 18:53:35 [1330.349355] INFO: task mmap-rw-fault:184017 blocked for more than 120 seconds. [1330.350540] Not tainted 5.14.0-rc7-btrfs-next-95 #1 [1330.351158] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [1330.351900] task:mmap-rw-fault state:D stack: 0 pid:184017 ppid:183725 flags:0x00000000 [1330.351906] Call Trace: [1330.351913] __schedule+0x3ca/0xe10 [1330.351930] schedule+0x43/0xe0 [1330.351935] btrfs_start_ordered_extent+0x108/0x1c0 [btrfs] [1330.352020] ? do_wait_intr_irq+0xb0/0xb0 [1330.352028] btrfs_lock_and_flush_ordered_range+0x8c/0x120 [btrfs] [1330.352064] ? extent_readahead+0xa7/0x530 [btrfs] [1330.352094] extent_readahead+0x32d/0x530 [btrfs] [1330.352133] ? lru_cache_add+0x104/0x220 [1330.352135] ? kvm_sched_clock_read+0x14/0x40 [1330.352138] ? sched_clock_cpu+0xd/0x110 [1330.352143] ? lock_release+0x155/0x4a0 [1330.352151] read_pages+0x86/0x270 [1330.352155] ? lru_cache_add+0x125/0x220 [1330.352162] page_cache_ra_unbounded+0x1a3/0x220 [1330.352172] filemap_fault+0x626/0xa20 [1330.352176] ? filemap_map_pages+0x18b/0x660 [1330.352184] __do_fault+0x36/0xf0 [1330.352189] __handle_mm_fault+0x1253/0x15f0 [1330.352203] handle_mm_fault+0x9e/0x260 [1330.352208] __get_user_pages+0x204/0x620 [1330.352212] ? get_user_pages_unlocked+0x69/0x340 [1330.352220] get_user_pages_unlocked+0xd3/0x340 [1330.352229] internal_get_user_pages_fast+0xbca/0xdc0 [1330.352246] iov_iter_get_pages+0x8d/0x3a0 [1330.352254] bio_iov_iter_get_pages+0x82/0x4a0 [1330.352259] ? lock_release+0x155/0x4a0 [1330.352266] iomap_dio_bio_actor+0x232/0x410 [1330.352275] iomap_apply+0x12a/0x4a0 [1330.352278] ? iomap_dio_rw+0x30/0x30 [1330.352292] __iomap_dio_rw+0x29f/0x5e0 [1330.352294] ? iomap_dio_rw+0x30/0x30 [1330.352306] btrfs_file_write_iter+0x238/0x480 [btrfs] [1330.352339] new_sync_write+0x11f/0x1b0 [1330.352344] ? NF_HOOK_LIST.constprop.0.cold+0x31/0x3e [1330.352354] vfs_write+0x292/0x3c0 [1330.352359] __x64_sys_pwrite64+0x90/0xc0 [1330.352365] do_syscall_64+0x3b/0xc0 [1330.352369] entry_SYSCALL_64_after_hwframe+0x44/0xae [1330.352372] RIP: 0033:0x7f4b0a580986 [1330.352379] RSP: 002b:00007ffd34d75418 EFLAGS: 00000246 ORIG_RAX: 0000000000000012 [1330.352382] RAX: ffffffffffffffda RBX: 0000000000001000 RCX: 00007f4b0a580986 [1330.352383] RDX: 0000000000001000 RSI: 00007f4b0a3a4000 RDI: 0000000000000003 [1330.352385] RBP: 00007f4b0a3a4000 R08: 0000000000000003 R09: 0000000000000000 [1330.352386] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003 [1330.352387] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 Unlike for reads, at btrfs_dio_iomap_begin() we return with the extent range unlocked, but later when the page faults are triggered and we try to read the extents, we end up btrfs_lock_and_flush_ordered_range() where we find the ordered extent for our write, created by the iomap callback btrfs_dio_iomap_begin(), and we wait for it to complete, which makes us deadlock since we can't complete the ordered extent without reading the pages (the iomap code only submits the bio after the pages are faulted in). Fix this by setting the nofault attribute of the given iov_iter and retry the direct IO read/write if we get an -EFAULT error returned from iomap. For reads, also disable page faults completely, this is because when we read from a hole or a prealloc extent, we can still trigger page faults due to the call to iov_iter_zero() done by iomap - at the moment, it is oblivious to the value of the ->nofault attribute of an iov_iter. We also need to keep track of the number of bytes written or read, and pass it to iomap_dio_rw(), as well as use the new flag IOMAP_DIO_PARTIAL. This depends on the iov_iter and iomap changes introduced in commit `c03098d4b9` ("Merge tag 'gfs2-v5.15-rc5-mmap-fault' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2"). Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-11-09 13:46:07 +01:00
Lin Ma	0b9111922b	hamradio: defer 6pack kfree after unregister_netdev There is a possible race condition (use-after-free) like below (USE) \| (FREE) dev_queue_xmit \| __dev_queue_xmit \| __dev_xmit_skb \| sch_direct_xmit \| ... xmit_one \| netdev_start_xmit \| tty_ldisc_kill __netdev_start_xmit \| 6pack_close sp_xmit \| kfree sp_encaps \| \| According to the patch "defer ax25 kfree after unregister_netdev", this patch reorder the kfree after the unregister_netdev to avoid the possible UAF as the unregister_netdev() is well synchronized and won't return if there is a running routine. Signed-off-by: Lin Ma <linma@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-09 11:49:12 +00:00
Lin Ma	3e0588c291	hamradio: defer ax25 kfree after unregister_netdev There is a possible race condition (use-after-free) like below (USE) \| (FREE) ax25_sendmsg \| ax25_queue_xmit \| dev_queue_xmit \| __dev_queue_xmit \| __dev_xmit_skb \| sch_direct_xmit \| ... xmit_one \| netdev_start_xmit \| tty_ldisc_kill __netdev_start_xmit \| mkiss_close ax_xmit \| kfree ax_encaps \| \| Even though there are two synchronization primitives before the kfree: 1. wait_for_completion(&ax->dead). This can prevent the race with routines from mkiss_ioctl. However, it cannot stop the routine coming from upper layer, i.e., the ax25_sendmsg. 2. netif_stop_queue(ax->dev). It seems that this line of code aims to halt the transmit queue but it fails to stop the routine that already being xmit. This patch reorder the kfree after the unregister_netdev to avoid the possible UAF as the unregister_netdev() is well synchronized and won't return if there is a running routine. Signed-off-by: Lin Ma <linma@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-09 11:48:39 +00:00
Jean Sacren	54f0bad668	net: sungem_phy: fix code indentation Remove extra space in front of the return statement. Fixes: `eb5b5b2ff9` ("sungem_phy: support bcm5461 phy, autoneg.") Signed-off-by: Jean Sacren <sakiwit@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-09 11:45:54 +00:00
Kishon Vijay Abraham I	eb91224e47	dmaengine: ti: k3-udma: Set r/tchan or rflow to NULL if request fail udma_get_() checks if rchan/tchan/rflow is already allocated by checking if it has a NON NULL value. For the error cases, rchan/tchan/rflow will have error value and udma_get_() considers this as already allocated (PASS) since the error values are NON NULL. This results in NULL pointer dereference error while de-referencing rchan/tchan/rflow. Reset the value of rchan/tchan/rflow to NULL if a channel request fails. CC: stable@vger.kernel.org Acked-by: Peter Ujfalusi <peter.ujfalusi@gmail.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Link: https://lore.kernel.org/r/20211031032411.27235-3-kishon@ti.com Signed-off-by: Vinod Koul <vkoul@kernel.org>	2021-11-09 11:24:06 +05:30
Kishon Vijay Abraham I	5c6c6d60e4	dmaengine: ti: k3-udma: Set bchan to NULL if a channel request fail bcdma_get_() checks if bchan is already allocated by checking if it has a NON NULL value. For the error cases, bchan will have error value and bcdma_get_() considers this as already allocated (PASS) since the error values are NON NULL. This results in NULL pointer dereference error while de-referencing bchan. Reset the value of bchan to NULL if a channel request fails. CC: stable@vger.kernel.org Acked-by: Peter Ujfalusi <peter.ujfalusi@gmail.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Link: https://lore.kernel.org/r/20211031032411.27235-2-kishon@ti.com Signed-off-by: Vinod Koul <vkoul@kernel.org>	2021-11-09 11:24:06 +05:30
Arnd Bergmann	2498363310	dmaengine: stm32-dma: avoid 64-bit division in stm32_dma_get_max_width Using the % operator on a 64-bit variable is expensive and can cause a link failure: arm-linux-gnueabi-ld: drivers/dma/stm32-dma.o: in function `stm32_dma_get_max_width': stm32-dma.c:(.text+0x170): undefined reference to `__aeabi_uldivmod' arm-linux-gnueabi-ld: drivers/dma/stm32-dma.o: in function `stm32_dma_set_xfer_param': stm32-dma.c:(.text+0x1cd4): undefined reference to `__aeabi_uldivmod' As we know that we just want to check the alignment in stm32_dma_get_max_width(), there is no need for a full division, and using a simple mask is a faster replacement. Same in stm32_dma_set_xfer_param(), change this to only allow burst transfers if the address is a multiple of the length. stm32_dma_get_best_burst just after will take buf_len into account to fix burst in case of misalignment. Fixes: `b20fd5fa31` ("dmaengine: stm32-dma: fix stm32_dma_get_max_width") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Amelie Delaunay <amelie.delaunay@foss.st.com> Link: https://lore.kernel.org/r/20211103153312.41483-1-amelie.delaunay@foss.st.com Signed-off-by: Vinod Koul <vkoul@kernel.org>	2021-11-09 11:20:35 +05:30
Jussi Maki	b2c4618162	bpf, sockmap: sk_skb data_end access incorrect when src_reg = dst_reg The current conversion of skb->data_end reads like this: ; data_end = (void)(long)skb->data_end; 559: (79) r1 = (u64 )(r2 +200) ; r1 = skb->data 560: (61) r11 = (u32 )(r2 +112) ; r11 = skb->len 561: (0f) r1 += r11 562: (61) r11 = (u32 )(r2 +116) 563: (1f) r1 -= r11 But similar to the case in `84f44df664` ("bpf: sock_ops sk access may stomp registers when dst_reg = src_reg"), the code will read an incorrect skb->len when src == dst. In this case we end up generating this xlated code: ; data_end = (void)(long)skb->data_end; 559: (79) r1 = (u64 )(r1 +200) ; r1 = skb->data 560: (61) r11 = (u32 )(r1 +112) ; r11 = (skb->data)->len 561: (0f) r1 += r11 562: (61) r11 = (u32 )(r1 +116) 563: (1f) r1 -= r11 ... where line 560 is the reading 4B of (skb->data + 112) instead of the intended skb->len Here the skb pointer in r1 gets set to skb->data and the later deref for skb->len ends up following skb->data instead of skb. This fixes the issue similarly to the patch mentioned above by creating an additional temporary variable and using to store the register when dst_reg = src_reg. We name the variable bpf_temp_reg and place it in the cb context for sk_skb. Then we restore from the temp to ensure nothing is lost. Fixes: `16137b09a6` ("bpf: Compute data_end dynamically with JIT code") Signed-off-by: Jussi Maki <joamaki@gmail.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20211103204736.248403-6-john.fastabend@gmail.com	2021-11-09 01:05:34 +01:00
John Fastabend	e0dc3b93bd	bpf: sockmap, strparser, and tls are reusing qdisc_skb_cb and colliding Strparser is reusing the qdisc_skb_cb struct to stash the skb message handling progress, e.g. offset and length of the skb. First this is poorly named and inherits a struct from qdisc that doesn't reflect the actual usage of cb[] at this layer. But, more importantly strparser is using the following to access its metadata. (struct _strp_msg )((void )skb->cb + offsetof(struct qdisc_skb_cb, data)) Where _strp_msg is defined as: struct _strp_msg { struct strp_msg strp; /* 0 8 / int accum_len; / 8 4 / / size: 12, cachelines: 1, members: 2 / / last cacheline: 12 bytes */ }; So we use 12 bytes of ->data[] in struct. However in BPF code running parser and verdict the user has read capabilities into the data[] array as well. Its not too problematic, but we should not be exposing internal state to BPF program. If its really needed then we can use the probe_read() APIs which allow reading kernel memory. And I don't believe cb[] layer poses any API breakage by moving this around because programs can't depend on cb[] across layers. In order to fix another issue with a ctx rewrite we need to stash a temp variable somewhere. To make this work cleanly this patch builds a cb struct for sk_skb types called sk_skb_cb struct. Then we can use this consistently in the strparser, sockmap space. Additionally we can start allowing ->cb[] write access after this. Fixes: `604326b41a` ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Jussi Maki <joamaki@gmail.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20211103204736.248403-5-john.fastabend@gmail.com	2021-11-09 01:05:28 +01:00
John Fastabend	c5d2177a72	bpf, sockmap: Fix race in ingress receive verdict with redirect to self A socket in a sockmap may have different combinations of programs attached depending on configuration. There can be no programs in which case the socket acts as a sink only. There can be a TX program in this case a BPF program is attached to sending side, but no RX program is attached. There can be an RX program only where sends have no BPF program attached, but receives are hooked with BPF. And finally, both TX and RX programs may be attached. Giving us the permutations: None, Tx, Rx, and TxRx To date most of our use cases have been TX case being used as a fast datapath to directly copy between local application and a userspace proxy. Or Rx cases and TxRX applications that are operating an in kernel based proxy. The traffic in the first case where we hook applications into a userspace application looks like this: AppA redirect AppB Tx <-----------> Rx \| \| + + TCP <--> lo <--> TCP In this case all traffic from AppA (after 3whs) is copied into the AppB ingress queue and no traffic is ever on the TCP recieive_queue. In the second case the application never receives, except in some rare error cases, traffic on the actual user space socket. Instead the send happens in the kernel. AppProxy socket pool sk0 ------------->{sk1,sk2, skn} ^ \| \| \| \| v ingress lb egress TCP TCP Here because traffic is never read off the socket with userspace recv() APIs there is only ever one reader on the sk receive_queue. Namely the BPF programs. However, we've started to introduce a third configuration where the BPF program on receive should process the data, but then the normal case is to push the data into the receive queue of AppB. AppB recv() (userspace) ----------------------- tcp_bpf_recvmsg() (kernel) \| \| \| \| \| \| ingress_msgQ \| \| \| RX_BPF \| \| \| v v sk->receive_queue This is different from the App{A,B} redirect because traffic is first received on the sk->receive_queue. Now for the issue. The tcp_bpf_recvmsg() handler first checks the ingress_msg queue for any data handled by the BPF rx program and returned with PASS code so that it was enqueued on the ingress msg queue. Then if no data exists on that queue it checks the socket receive queue. Unfortunately, this is the same receive_queue the BPF program is reading data off of. So we get a race. Its possible for the recvmsg() hook to pull data off the receive_queue before the BPF hook has a chance to read it. It typically happens when an application is banging on recv() and getting EAGAINs. Until they manage to race with the RX BPF program. To fix this we note that before this patch at attach time when the socket is loaded into the map we check if it needs a TX program or just the base set of proto bpf hooks. Then it uses the above general RX hook regardless of if we have a BPF program attached at rx or not. This patch now extends this check to handle all cases enumerated above, TX, RX, TXRX, and none. And to fix above race when an RX program is attached we use a new hook that is nearly identical to the old one except now we do not let the recv() call skip the RX BPF program. Now only the BPF program pulls data from sk->receive_queue and recv() only pulls data from the ingress msgQ post BPF program handling. With this resolved our AppB from above has been up and running for many hours without detecting any errors. We do this by correlating counters in RX BPF events and the AppB to ensure data is never skipping the BPF program. Selftests, was not able to detect this because we only run them for a short period of time on well ordered send/recvs so we don't get any of the noise we see in real application environments. Fixes: `51199405f9` ("bpf: skb_verdict, support SK_PASS on RX BPF path") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Jussi Maki <joamaki@gmail.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20211103204736.248403-4-john.fastabend@gmail.com	2021-11-09 00:58:26 +01:00
John Fastabend	b8b8315e39	bpf, sockmap: Remove unhash handler for BPF sockmap usage We do not need to handle unhash from BPF side we can simply wait for the close to happen. The original concern was a socket could transition from ESTABLISHED state to a new state while the BPF hook was still attached. But, we convinced ourself this is no longer possible and we also improved BPF sockmap to handle listen sockets so this is no longer a problem. More importantly though there are cases where unhash is called when data is in the receive queue. The BPF unhash logic will flush this data which is wrong. To be correct it should keep the data in the receive queue and allow a receiving application to continue reading the data. This may happen when tcp_abort() is received for example. Instead of complicating the logic in unhash simply moving all this to tcp_close() hook solves this. Fixes: `51199405f9` ("bpf: skb_verdict, support SK_PASS on RX BPF path") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Jussi Maki <joamaki@gmail.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20211103204736.248403-3-john.fastabend@gmail.com	2021-11-09 00:57:19 +01:00
John Fastabend	40a34121ac	bpf, sockmap: Use stricter sk state checks in sk_lookup_assign In order to fix an issue with sockets in TCP sockmap redirect cases we plan to allow CLOSE state sockets to exist in the sockmap. However, the check in bpf_sk_lookup_assign() currently only invalidates sockets in the TCP_ESTABLISHED case relying on the checks on sockmap insert to ensure we never SOCK_CLOSE state sockets in the map. To prepare for this change we flip the logic in bpf_sk_lookup_assign() to explicitly test for the accepted cases. Namely, a tcp socket in TCP_LISTEN or a udp socket in TCP_CLOSE state. This also makes the code more resilent to future changes. Suggested-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20211103204736.248403-2-john.fastabend@gmail.com	2021-11-09 00:56:35 +01:00
Linus Torvalds	d2f38a3c65	- Fix-ups - Standardise _exit() and _remove() return values; ili9320, vgg2432a4 - Bug Fixes - Do not override maximum brightness; backlight - Propagate errors from get_brightness(); backlight -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEdrbJNaO+IJqU8IdIUa+KL4f8d2EFAmGJTUMACgkQUa+KL4f8 d2FQ+w/+PMsuCZwyf1tcIfUTZJA5y5n8vnsLLzIkkjLlnaNbtD+y85gZzLWa0CgC H3nyKrcV/rbaTC5mAM+meKcmDxfbCGYDL7bleIZ2aQd1YtmvGnmspGnNwl8vHTjA DN8eagNssW+V1i/uPbKN76RRaOMleIwm/zjA4TZATMs6Q1MddDeJN4g5mIqbcd1i k5aE2Jc9ZqPuE18w7I5gOuibGo8If1IroeTJ1V2/npJJhlWAA+nJzH2bGG7I1OQm 4AjrJ+n8twKhAbaFhi3IYcQhpbGACGo/HZ3SGS3sPjlqo2+/vdrsTqjCkiQREfZ1 5QmOHk1/X0bFTcO5u6rhldq3qbS3qg7DwRzTb8LmY3bcPVHj5JD8jJcbnHgzXkOz UncS0Y5sMImn+BKDVVvsosMH45XnZtngyYEEbgpzMI3JXGf9LxilxCVTZ8rrfogF TYbM+i1bzs3Tvl3mKaecSqvsKJgJUVNKFOSb0oGlXd9WaEb3WmTgFi6CfATeokif buTdPGt2rD2WcanNFV8A32z3byuvS8C7EXRkprhbUVjkO//rOam7HgR8SBP6rkRo IYs2CUAviawfiCypGPUCjBz6JwgcG9uQrdpFBrWEp26dkx+m/MP5OvkuwgjIPNrI 4Bu12kU1mTRrR9CenJP1Ka/1fKQbQaFsNxiKM3C7SYOnt90XMRs= =PbxD -----END PGP SIGNATURE----- Merge tag 'backlight-next-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight Pull backlight updates from Lee Jones: "Fix-ups: - Standardise _exit() and _remove() return values in ili9320 and vgg2432a4 Bug Fixes: - Do not override maximum brightness - Propagate errors from get_brightness()" * tag 'backlight-next-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight: video: backlight: ili9320: Make ili9320_remove() return void backlight: Propagate errors from get_brightness() video: backlight: Drop maximum brightness override for brightness zero	2021-11-08 12:21:28 -08:00
Linus Torvalds	3a9b0a46e1	- Remove Drivers - Remove support for TI TPS80031/TPS80032 PMICs - New Device Support - Add support for Magnetic Reader to TI AM335x - Add support for DA9063_EA to Dialog DA9063 - Add support for SC2730 PMIC to Spreadtrum SC27xx - Add support for MacBookPro16,2 ICL-N UART Intel LPSS PCI - Add support for lots of new PMICS in QCom SPMI PMIC - Add support for ADC to Diolan DLN2 - New Functionality - Add support for Power Off to Rockchip RK817 - Fix-ups - Simplify Regmap passing to child devices; hi6421-spmi-pmic - SPDX licensing updates; ti_am335x_tscadc - Improve error handling; ti_am335x_tscadc - Expedite clock search; ti_am335x_tscadc - Generic simplifications; ti_am335x_tscadc - Use generic macros/defines; ti_am335x_tscadc - Remove unused code; ti_am335x_tscadc, cros_ec_dev - Convert to GPIOD; wcd934x - Add namespacing; ti_am335x_tscadc - Restrict compilation to relevant arches; intel_pmt - Provide better description/documentation; exynos_lpass - Add SPI device ID table; altera-a10sr, motorola-cpcap, sprd-sc27xx-spi - Change IRQ handling; qcom-pm8xxx - Split out I2C and SPI code; arizona - Explicitly include used headers; altera-a10sr - Convert sysfs show() function to; sysfs_emit - Standardise _exit() and _remove() return values; mc13xxx, stmpe, tps65912 - Trivial (style/spelling/whitespace) fixups; ti_am335x_tscadc, qcom-spmi-pmic, max77686-private - Device Tree fix-ups; ti,am3359-tscadc, samsung,s2mps11, samsung,s2mpa01, samsung,s5m8767, brcm,misc, brcm,cru, syscon, qcom,tcsr, xylon,logicvc, max77686, x-powers,ac100, x-powers,axp152, x-powers,axp209-gpio, syscon, qcom,spmi-pmic - Bug Fixes - Balance refcounting (get/put); ti_am335x_tscadc, mfd-core - Fix IRQ trigger type; sec-irq, max77693, max14577 - Repair off-by-one; altera-sysmgr - Add explicit 'select MFD_CORE' to MFD_SIMPLE_MFD_I2C -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEdrbJNaO+IJqU8IdIUa+KL4f8d2EFAmGJTIAACgkQUa+KL4f8 d2FYsRAAhcTUP7PH5gWko1mQnCzh6h3Q7iQ1MHEokZgIvqc/U2Zmxu57cF9f3jOt goZdVsU7x6qiMD4SfmInyEp32Emo1pbUTVz6kB3o0G+YACPHOU17xyKuh0FnzQkm yu/EbEDYNPbNWx9BTA9wgjSOTzCrKMBSd/p9zPzq9M69ihAf2uE9sn5Hbmso1Pdu tSJ7XYqWVwYzZh8OVzQd6lEIDkA+o+/gR4nCgxqAvGiXQq6yVVOCpnNzj4GrAcep hkuQVkg14+rmXRbLiZsmc1V+yT13bueKu2fD96gMFpXI8NkR1KZ6QRInI6FtJcl/ m2LGPUuICpd2IiKRa1XtXFZWcMbZ2JVjJSWArgfHj7YBs9+0KcRsbpfHHirpcf14 9LFy4TzjX2A1K0vvKhHSTAhh13HFcvWyd0GCrEhLRmapeiLDXohkUHGMVFVedXzE tQLCEByjcL+/OCJiQ4Jwk1aaU2cAVEXtvYuciXcBOtHkfaQR/bOYwjRm4Z3AdZyU zLYMkw/LWvzAaV3Rh1zP6W47WLFHbeMgTmApFOSxAbRsmun0loasVzXWrkvxZlYF p39l4UcSOIK08PzxqF9ZEM/LtUglShbZbg2wf0VSHzomA+oIsxT7fN16vPHLYDYL tsQ5fYVN0a3j4ltKFeQl7l2HV/ZzUI/Q6iGmMia5sFbwRN8tlZM= =SJ7N -----END PGP SIGNATURE----- Merge tag 'mfd-next-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd Pull MFD updates from Lee Jones: "Removed Drivers: - Remove support for TI TPS80031/TPS80032 PMICs New Device Support: - Add support for Magnetic Reader to TI AM335x - Add support for DA9063_EA to Dialog DA9063 - Add support for SC2730 PMIC to Spreadtrum SC27xx - Add support for MacBookPro16,2 ICL-N UART Intel LPSS PCI - Add support for lots of new PMICS in QCom SPMI PMIC - Add support for ADC to Diolan DLN2 New Functionality: - Add support for Power Off to Rockchip RK817 Fix-ups: - Simplify Regmap passing to child devices in hi6421-spmi-pmic - SPDX licensing updates in ti_am335x_tscadc - Improve error handling in ti_am335x_tscadc - Expedite clock search in ti_am335x_tscadc - Generic simplifications in ti_am335x_tscadc - Use generic macros/defines in ti_am335x_tscadc - Remove unused code in ti_am335x_tscadc, cros_ec_dev - Convert to GPIOD in wcd934x - Add namespacing in ti_am335x_tscadc - Restrict compilation to relevant arches in intel_pmt - Provide better description/documentation in exynos_lpass - Add SPI device ID table in altera-a10sr, motorola-cpcap, sprd-sc27xx-spi - Change IRQ handling in qcom-pm8xxx - Split out I2C and SPI code in arizona - Explicitly include used headers in altera-a10sr - Convert sysfs show() function to in sysfs_emit - Standardise _exit() and _remove() return values in mc13xxx, stmpe, tps65912 - Trivial (style/spelling/whitespace) fixups in ti_am335x_tscadc, qcom-spmi-pmic, max77686-private - Device Tree fix-ups in ti,am3359-tscadc, samsung,s2mps11, samsung,s2mpa01, samsung,s5m8767, brcm,misc, brcm,cru, syscon, qcom,tcsr, xylon,logicvc, max77686, x-powers,ac100, x-powers,axp152, x-powers,axp209-gpio, syscon, qcom,spmi-pmic Bug Fixes: - Balance refcounting (get/put) in ti_am335x_tscadc, mfd-core - Fix IRQ trigger type in sec-irq, max77693, max14577 - Repair off-by-one in altera-sysmgr - Add explicit 'select MFD_CORE' to MFD_SIMPLE_MFD_I2C" * tag 'mfd-next-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (95 commits) mfd: simple-mfd-i2c: Select MFD_CORE to fix build error mfd: tps80031: Remove driver mfd: max77686: Correct tab-based alignment of register addresses mfd: wcd934x: Replace legacy gpio interface for gpiod dt-bindings: mfd: qcom: pm8xxx: Add pm8018 compatible mfd: dln2: Add cell for initializing DLN2 ADC mfd: qcom-spmi-pmic: Add missing PMICs supported by socinfo mfd: qcom-spmi-pmic: Document ten more PMICs in the binding mfd: qcom-spmi-pmic: Sort compatibles in the driver mfd: qcom-spmi-pmic: Sort the compatibles in the binding mfd: janz-cmoio: Replace snprintf in show functions with sysfs_emit mfd: altera-a10sr: Include linux/module.h mfd: tps65912: Make tps65912_device_exit() return void mfd: stmpe: Make stmpe_remove() return void mfd: mc13xxx: Make mc13xxx_common_exit() return void dt-bindings: mfd: syscon: Add samsung,exynosautov9-sysreg compatible mfd: altera-sysmgr: Fix a mistake caused by resource_size conversion dt-bindings: gpio: Convert X-Powers AXP209 GPIO binding to a schema dt-bindings: mfd: syscon: Add rk3368 QoS register compatible mfd: arizona: Split of_match table into I2C and SPI versions ...	2021-11-08 12:07:52 -08:00
Linus Torvalds	d20f7a09e5	gpio updates for v5.16 - new driver: gpio-modepin (plus relevant change in zynqmp firmware) - add interrupt support to gpio-virtio - enable the 'gpio-line-names' property in the DT bindings for gpio-rockchip - use the subsystem helpers where applicable in gpio-uniphier instead of accessing IRQ structures directly - code shrink in gpio-xilinx - add interrupt to gpio-mlxbf2 (and include the removal of custom interrupt code from the mellanox ethernet driver) - support multiple interrupts per bank in gpio-tegra186 (and force one interrupt per bank in older models) - fix GPIO line IRQ offset calculation in gpio-realtek-otto - drop unneeded MODULE_ALIAS expansions in multiple drivers - code cleanup in gpio-aggregator - minor improvements in gpio-max730x and gpio-mc33880 - Kconfig cleanups -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEFp3rbAvDxGAT0sefEacuoBRx13IFAmGJH+MACgkQEacuoBRx 13JjGg//YKoAby35hcclft5xJYDwnx2or001TJ+nFLOQ9d24gUI7+UMFVnZ2ifMP 9F/yO0934pkHNUqcTuIMbeALW6Cg2DglXJFCQSXT6Ng5ETjENOhLHRa4bRdoIEim HC+YJg/DS6CpSkMAH09ljZfkTPz3B2DHt+Y4phIDg2PIrt+UUsN/RC685B4jSwds pcAwdP1mqBV1da8uRjtfc7UcFSxR6/lvBBWQSo9RAEse5R6Rmm7XIOKBlQ+Exdts jAmVqxz7BnfuYug9Y3y3fOiAQYoRvZBaSjkeMJALub3GEd3W30vAKoQhq7gC9ea3 XvCj7V6C59JVp9aQUXo1cL6VqcesSGMWoRi2vll8/7gluXV2gCoNPVb+9KwxG9Ed +xwXBpCsOVQCXl8jgIHBWNfxOS31wafWlK8Ng42Cjov+uQfsccgufTvhQXBumKbM oECSOgjg9k3/AMpym3buOgAzqSec5R1iiSWHNPT+P6cSFXa2Bl3A8X0RgF/TwXPC Bn9BDfSiYfOcD6y6yx2GPMCrCXNJRWgfJ/2j3Jsb66XtGwtBDhmK7rcqcg9qIws9 mWWyISiJgO0gfk24hIIw7OxeKCidm3iVskryJAQUfTX+BBcR5nZfiuF3EeDXZv95 whkSoN7J5fmMYDDJ7wd0pFwbW5fe5C/i/2tjOlNBgdXsNgAv/r4= =mKh0 -----END PGP SIGNATURE----- Merge tag 'gpio-updates-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio updates from Bartosz Golaszewski: "We have a single new driver, new features in others and some cleanups all over the place. Nothing really stands out and it is all relatively small. - new driver: gpio-modepin (plus relevant change in zynqmp firmware) - add interrupt support to gpio-virtio - enable the 'gpio-line-names' property in the DT bindings for gpio-rockchip - use the subsystem helpers where applicable in gpio-uniphier instead of accessing IRQ structures directly - code shrink in gpio-xilinx - add interrupt to gpio-mlxbf2 (and include the removal of custom interrupt code from the mellanox ethernet driver) - support multiple interrupts per bank in gpio-tegra186 (and force one interrupt per bank in older models) - fix GPIO line IRQ offset calculation in gpio-realtek-otto - drop unneeded MODULE_ALIAS expansions in multiple drivers - code cleanup in gpio-aggregator - minor improvements in gpio-max730x and gpio-mc33880 - Kconfig cleanups" * tag 'gpio-updates-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: virtio_gpio: drop packed attribute gpio: virtio: Add IRQ support gpio: realtek-otto: fix GPIO line IRQ offset gpio: clean up Kconfig file net: mellanox: mlxbf_gige: Replace non-standard interrupt handling gpio: mlxbf2: Introduce IRQ support gpio: mc33880: Drop if with an always false condition gpio: max730x: Make __max730x_remove() return void gpio: aggregator: Wrap access to gpiochip_fwd.tmp[] gpio: modepin: Add driver support for modepin GPIO controller dt-bindings: gpio: zynqmp: Add binding documentation for modepin firmware: zynqmp: Add MMIO read and write support for PS_MODE pin gpio: tps65218: drop unneeded MODULE_ALIAS gpio: max77620: drop unneeded MODULE_ALIAS gpio: xilinx: simplify getting .driver_data gpio: tegra186: Support multiple interrupts per bank gpio: tegra186: Force one interrupt per bank gpio: uniphier: Use helper functions to get private data from IRQ data gpio: uniphier: Use helper function to get IRQ hardware number dt-bindings: gpio: add gpio-line-names to rockchip,gpio-bank.yaml	2021-11-08 11:55:21 -08:00
Linus Torvalds	dd72945c43	cxl for v5.16 - Fix support for platforms that do not enumerate every ACPI0016 (CXL Host Bridge) in the CHBS (ACPI Host Bridge Structure). - Introduce a common pci_find_dvsec_capability() helper, clean up open coded implementations in various drivers. - Add 'cxl_test' for regression testing CXL subsystem ABIs. 'cxl_test' is a module built from tools/testing/cxl/ that mocks up a CXL topology to augment the nascent support for emulation of CXL devices in QEMU. - Convert libnvdimm to use the uuid API. - Complete the definition of CXL namespace labels in libnvdimm. - Tunnel libnvdimm label operations from nd_ioctl() back to the CXL mailbox driver. Enable 'ndctl {read,write}-labels' for CXL. - Continue to sort and refactor functionality into distinct driver and core-infrastructure buckets. For example, mailbox handling is now a generic core capability consumed by the PCI and cxl_test drivers. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCYYRtyQAKCRDfioYZHlFs Z7UsAP9DzUN6IWWnYk1R95YXYVxFriRtRsBjujAqTg49EMghawEAoHaA9lxO3Hho l25TLYUOmB/zFTlUbe6YQptMJZ5YLwY= =im9j -----END PGP SIGNATURE----- Merge tag 'cxl-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull cxl updates from Dan Williams: "More preparation and plumbing work in the CXL subsystem. From an end user perspective the highlight here is lighting up the CXL Persistent Memory related commands (label read / write) with the generic ioctl() front-end in LIBNVDIMM. Otherwise, the ability to instantiate new persistent and volatile memory regions is still on track for v5.17. Summary: - Fix support for platforms that do not enumerate every ACPI0016 (CXL Host Bridge) in the CHBS (ACPI Host Bridge Structure). - Introduce a common pci_find_dvsec_capability() helper, clean up open coded implementations in various drivers. - Add 'cxl_test' for regression testing CXL subsystem ABIs. 'cxl_test' is a module built from tools/testing/cxl/ that mocks up a CXL topology to augment the nascent support for emulation of CXL devices in QEMU. - Convert libnvdimm to use the uuid API. - Complete the definition of CXL namespace labels in libnvdimm. - Tunnel libnvdimm label operations from nd_ioctl() back to the CXL mailbox driver. Enable 'ndctl {read,write}-labels' for CXL. - Continue to sort and refactor functionality into distinct driver and core-infrastructure buckets. For example, mailbox handling is now a generic core capability consumed by the PCI and cxl_test drivers" * tag 'cxl-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (34 commits) ocxl: Use pci core's DVSEC functionality cxl/pci: Use pci core's DVSEC functionality PCI: Add pci_find_dvsec_capability to find designated VSEC cxl/pci: Split cxl_pci_setup_regs() cxl/pci: Add @base to cxl_register_map cxl/pci: Make more use of cxl_register_map cxl/pci: Remove pci request/release regions cxl/pci: Fix NULL vs ERR_PTR confusion cxl/pci: Remove dev_dbg for unknown register blocks cxl/pci: Convert register block identifiers to an enum cxl/acpi: Do not fail cxl_acpi_probe() based on a missing CHBS cxl/pci: Disambiguate cxl_pci further from cxl_mem Documentation/cxl: Add bus internal docs cxl/core: Split decoder setup into alloc + add tools/testing/cxl: Introduce a mock memory device + driver cxl/mbox: Move command definitions to common location cxl/bus: Populate the target list at decoder create tools/testing/cxl: Introduce a mocked-up CXL port hierarchy cxl/pmem: Add support for multiple nvdimm-bridge objects cxl/pmem: Translate NVDIMM label commands to CXL label commands ...	2021-11-08 11:49:48 -08:00
Linus Torvalds	dab334c98b	Merge branch 'i2c/for-mergewindow' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c updates from Wolfram Sang: - big refactoring of the PASEMI driver to support the Apple M1 - huge improvements to the XIIC in terms of locking and SMP safety - refactoring and clean ups for the i801 driver ... and the usual bunch of small driver updates * 'i2c/for-mergewindow' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (43 commits) i2c: amd-mp2-plat: ACPI: Use ACPI_COMPANION() directly i2c: i801: Add support for Intel Ice Lake PCH-N i2c: virtio: update the maintainer to Conghui i2c: xlr: Fix a resource leak in the error handling path of 'xlr_i2c_probe()' i2c: qup: move to use request_irq by IRQF_NO_AUTOEN flag i2c: qup: fix a trivial typo i2c: tegra: Ensure that device is suspended before driver is removed i2c: i801: Fix incorrect and needless software PEC disabling i2c: mediatek: Dump i2c/dma register when a timeout occurs i2c: mediatek: Reset the handshake signal between i2c and dma i2c: mlxcpld: Allow flexible polling time setting for I2C transactions i2c: pasemi: Set enable bit for Apple variant i2c: pasemi: Add Apple platform driver i2c: pasemi: Refactor _probe to use devm_* i2c: pasemi: Allow to configure bus frequency i2c: pasemi: Move common reset code to own function i2c: pasemi: Split pci driver to its own file i2c: pasemi: Split off common probing code i2c: pasemi: Remove usage of pci_dev i2c: pasemi: Use dev_name instead of port number ...	2021-11-08 11:46:10 -08:00
Linus Torvalds	206825f50f	Core: * Remove obsolete macros only used by the old nand_ecclayout struct * Don't remove debugfs directory if device is in use * MAINTAINERS: - Add entry for Qualcomm NAND controller driver - Update the devicetree documentation path of hyperbus MTD devices: * block2mtd: - Add support for an optional custom MTD label - Minor refactor to avoid hard coded constant * mtdswap: Remove redundant assignment of pointer eb CFI: * Fixup CFI on ixp4xx Raw NAND controller drivers: * Arasan: - Prevent an unsupported configuration * Xway, Socrates: plat_nand, Pasemi, Orion, mpc5121, GPIO, Au1550nd, AMS-Delta: - Keep the driver compatible with on-die ECC engines * cs553x, lpc32xx_slc, ndfc, sharpsl, tmio, txx9ndfmc: - Revert the commits: "Fix external use of SW Hamming ECC helper" - And let callers use the bare Hamming helpers * Fsmc: Fix use of SM ORDER * Intel: - Fix potential buffer overflow in probe * xway, vf610, txx9ndfm, tegra, stm32, plat_nand, oxnas, omap, mtk, hisi504, gpmi, gpio, denali, bcm6368, atmel: - Make use of the helper function devm_platform_ioremap_resource{,byname}() Onenand drivers: * Samsung: Drop Exynos4 and describe driver in KConfig Raw NAND chip drivers: * Hynix: Add support for H27UCG8T2ETR-BC MLC NAND SPI NOR core: * Add spi-nor device tree binding under SPI NOR maintainers SPI NOR manufacturer drivers: * Enable locking for n25q128a13 SPI NOR controller drivers: * Use devm_platform_ioremap_resource_byname() -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEE9HuaYnbmDhq/XIDIJWrqGEe9VoQFAmGIAdEACgkQJWrqGEe9 VoQHEAgAkFvFxIOyPSwAJfyeEklWmGSh+cm+X9EqWxZ5d6iwFE3FZH4X4XiqdyVY +rzy37o6Tp3pQVYDqxmPM8GGoHy8wYPR5h/U1IozbBaNeCJma1HyK4Sjb/cuX1eQ 23EM1pWT8N9CbRG2S9mz5C9uHcP9mImtyvU4WhTLKOEc9XGbpuj+k1cacSuEEbeF rZklOa7tvUiOFIDfr7Kf+DfxbpYabgB2dJgWPEtZmKmefWMiODHgvBJo0MT6MKr7 kOlIkgq5zYIFRNaxApWLI3JmH9+lsw6MHBe+cCqemV6q2OZlMV/5VGFElEywJt+4 L3VmlJQpMiCEMo2Cgc1jigHKryk4Xw== =h6o3 -----END PGP SIGNATURE----- Merge tag 'mtd/for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux Pull mtd updates from Miquel Raynal: "Core: - Remove obsolete macros only used by the old nand_ecclayout struct - Don't remove debugfs directory if device is in use - MAINTAINERS: - Add entry for Qualcomm NAND controller driver - Update the devicetree documentation path of hyperbus MTD devices: - block2mtd: - Add support for an optional custom MTD label - Minor refactor to avoid hard coded constant - mtdswap: Remove redundant assignment of pointer eb CFI: - Fixup CFI on ixp4xx Raw NAND controller drivers: - Arasan: - Prevent an unsupported configuration - Xway, Socrates: plat_nand, Pasemi, Orion, mpc5121, GPIO, Au1550nd, AMS-Delta: - Keep the driver compatible with on-die ECC engines - cs553x, lpc32xx_slc, ndfc, sharpsl, tmio, txx9ndfmc: - Revert the commits: "Fix external use of SW Hamming ECC helper" - And let callers use the bare Hamming helpers - Fsmc: Fix use of SM ORDER - Intel: - Fix potential buffer overflow in probe - xway, vf610, txx9ndfm, tegra, stm32, plat_nand, oxnas, omap, mtk, hisi504, gpmi, gpio, denali, bcm6368, atmel: - Make use of the helper function devm_platform_ioremap_resource{,byname}() Onenand drivers: - Samsung: Drop Exynos4 and describe driver in KConfig Raw NAND chip drivers: - Hynix: Add support for H27UCG8T2ETR-BC MLC NAND SPI NOR core: - Add spi-nor device tree binding under SPI NOR maintainers SPI NOR manufacturer drivers: - Enable locking for n25q128a13 SPI NOR controller drivers: - Use devm_platform_ioremap_resource_byname()" * tag 'mtd/for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (50 commits) mtd: core: don't remove debugfs directory if device is in use MAINTAINERS: Update the devicetree documentation path of hyperbus mtd: block2mtd: add support for an optional custom MTD label mtd: block2mtd: minor refactor to avoid hard coded constant mtd: fixup CFI on ixp4xx mtd: rawnand: arasan: Prevent an unsupported configuration MAINTAINERS: Add entry for Qualcomm NAND controller driver mtd: rawnand: hynix: Add support for H27UCG8T2ETR-BC MLC NAND mtd: rawnand: xway: Keep the driver compatible with on-die ECC engines mtd: rawnand: socrates: Keep the driver compatible with on-die ECC engines mtd: rawnand: plat_nand: Keep the driver compatible with on-die ECC engines mtd: rawnand: pasemi: Keep the driver compatible with on-die ECC engines mtd: rawnand: orion: Keep the driver compatible with on-die ECC engines mtd: rawnand: mpc5121: Keep the driver compatible with on-die ECC engines mtd: rawnand: gpio: Keep the driver compatible with on-die ECC engines mtd: rawnand: au1550nd: Keep the driver compatible with on-die ECC engines mtd: rawnand: ams-delta: Keep the driver compatible with on-die ECC engines Revert "mtd: rawnand: cs553x: Fix external use of SW Hamming ECC helper" Revert "mtd: rawnand: lpc32xx_slc: Fix external use of SW Hamming ECC helper" Revert "mtd: rawnand: ndfc: Fix external use of SW Hamming ECC helper" ...	2021-11-08 11:37:39 -08:00
Linus Torvalds	05b8cd3db7	Add 'tools/perf/libbpf/' to ignored files Commit `6b491a86b7` ("perf build: Install libbpf headers locally when building") installed copies of the libbpf headers into the build tree, causing unnecessary noise from 'git status' after a perf tools build. Add the 'libbpf/' subdirectory to the .gitignore file to silence it all again. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-11-08 11:33:35 -08:00
Linus Torvalds	e851dfae43	kgdb patches for 5.16 A single patch this cycle. We replace some open-coded routines to classify task states with the scheduler's own function to do this. Alongside the obvious benefits of removing funky code and aligning more exactly with the scheduler's task classification, this also fixes a long standing compiler warning by removing the open-coded routines that generated the warning. Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEELzVBU1D3lWq6cKzwfOMlXTn3iKEFAmGJA90ACgkQfOMlXTn3 iKFJcw/9FLy94FS+y/F8xpTU89d2j92f8q1mxS9g7ToDzDOiIPLyNazbX+4PsXVQ FGgLpTqzNZX3+D5eAnOA/BNwWXtsvdxpsNnkY5ZCsVY5kZ0zsBYHe1O5CM2TbcMg bOvJRQVI/FjydlrwqxIz9gAD7FmT/QvyecIbHZm/zFiCxdQwZy3rFwREd5ENsjoG wumCCCH8Gh/afi9Pu3ZKHoZggNy/gmtSP3h3wmyoQneVFIJ4Vw5J61GFCvMPD+pN wuAXWpuzWaND5IPTr4aZMKHNSaxqADQoEpNWxkgRh0cNL4NGBKsdLZMcqTTiyWww TJSDtQKqocQB99eouwzQoA8SBsZwRvKRf/33QUXrWCAjl5YRK+9fSd8+dEf9Zd0o A3sh99ecmHXknY6K2uO7NFjUPLSA/QeMGBzNx9lt7RoL+14tjqZkrAvWXooZzBY3 j39gwI1kSplmmCSoXwoW3AFVcCLJcGzE9qh0NUmZgt3kv8K1SUo3gxotKs8KwKj/ xVozOokmZV2ZuCTf8oIw7ntLwIFjiUaYBE7JY+c8mT8VWCbs5ztyOb11I35YYT0V InXMDICLxZBD85eNOHyPC0fAud5emfboHl5GSUxo2hPgrRKuBmqElGtxG9CC8DLR SItPjKfrYI1CJd4uoFX54nC3GmwLVSAq3xDwpYsN4A4lLbJJytc= =vIfI -----END PGP SIGNATURE----- Merge tag 'kgdb-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux Pull kgdb update from Daniel Thompson: "A single patch this cycle. We replace some open-coded routines to classify task states with the scheduler's own function to do this. Alongside the obvious benefits of removing funky code and aligning more exactly with the scheduler's task classification, this also fixes a long standing compiler warning by removing the open-coded routines that generated the warning" * tag 'kgdb-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux: kdb: Adopt scheduler's task classification	2021-11-08 09:35:43 -08:00
Linus Torvalds	a2b03e48e9	OpenRISC updates for 5.16 This includes 2 minor cleanups, plus a bug fix for OpenRISC TLB flush code that allows the the SMP kernel to boot again. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE2cRzVK74bBA6Je/xw7McLV5mJ+QFAmGIOh4ACgkQw7McLV5m J+Ts7hAApLDmupgPlJfc9hsIIxiCZcMnP5vyF2gNSkaDVkWkLkM24WuAfcCusEUE VIrI/K6AV8wQUfG9n3sAQ/v5bkKrLCkt3UYsOViidHJsrTeW7CO8v+Nzair3gbNa zKZEs6EyoGrd9l5nMv3D0z/PFWnTCMehRBHYYp6D/chgT+2cpQsVsq9SbN3gV5CR sdFrGOzWSemtWCqVibdsKa4j0IqLVNRAPv1MYC7jECOkRvMkbkXFWZK8Q8ijs0Mv th4nt9SJOSH+OAzBhXoH7bW9BUwfCEwnBYXhsidHVwf1S6+5/yFj24zQAz/1YZ21 ydTS6PHS66oFqa5QITNbfNuqeprrnb+8JavkYvJnWiPfg8yJ2gURZ/4pCHS8iMsB mQpLaQlTsEHODj7Vc4GSTlbzWDgtCSB6v0fOPJP01Bzf+z2wsqG6dLJVrWMl6nT9 0sNvfnU9egne8UhbTJ8fot9SISRjc/sFMXKnMMdcY+7KWdw+Kh4t7/hL5ZbKCZ8H Pg3NNTA37puqTn9qvd0ZA5Ey9L0bSjeQpGa2A+nutE3zfRL9MQFWWrpHPgfJyoZe VOZSK0dtSfYtTsCnno5S2px2XqPpXIf0P5LiXzvtc+x7JjZsP3G4jZxB8ql+ZaQF IeBI4Kvg4QPU2T/TUwt5VIMEL/I3yZ63/0L+d1xnIqaVZVK0RDQ= =c5nj -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://github.com/openrisc/linux Pull OpenRISC updates from Stafford Horne: "This includes two minor cleanups, plus a bug fix for OpenRISC TLB flush code that allows the the SMP kernel to boot again" * tag 'for-linus' of git://github.com/openrisc/linux: openrisc: fix SMP tlb flush NULL pointer dereference openrisc: signal: remove unused DEBUG_SIG macro openrisc: time: don't mark comment as kernel-doc	2021-11-08 09:31:25 -08:00
Linus Torvalds	bbdbeb0048	perf tools changes for v5.16: perf annotate: - Add riscv64 support. - Add fusion logic for AMD microarchs. perf record: - Add an option to control the synthesizing behavior: --synth <no\|all\|task\|mmap\|cgroup> Fine-tune event synthesis: default=all core: - Allow controlling synthesizing PERF_RECORD_ metadata events during record. - perf.data reader prep work for multithreaded processing. - Fix missing exclude_{host,guest} setting in PMUs that don't support it and that were causing the feature detection code to disable it for all events, even the ones in PMUs that support it. - Fix the default use of precise events on AMD, that were always falling back to non-precise because perf_event_attr.exclude_guest=1 was set and IBS does not have filtering capability, refusing precise + exclude_guest. - Add bitfield_swap() to handle branch_stack endian issue. perf script: - Show binary offsets for userspace addresses in callchains. - Support instruction latency via new "ins_lat" selectable field. - Add dlfilter-show-cycles perf inject: - Add vmlinux and ignore-vmlinux arguments, similar to other tools. perf list: - Display PMU prefix for partially supported hybrid cache events. - Display hybrid PMU events with cpu type. perf stat: - Improve metrics documentation of data structures. - Fix memory leaks in the metric code. - Use NAN for missing event IDs. - Don't compute unused events. - Fix memory leak on error path. - Encode and use metric-id as a metric qualifier. - Allow metrics with no events. - Avoid events for an 'if' constant result. - Only add a referenced metric once. - Simplify metric_refs calculation. - Allow modifiers on metrics. perf test: - Add workload test of metric and metric groups. - Workload test of all PMUs. - vmlinux-kallsyms: Ignore hidden symbols. - Add pmu-event test for event described as "config=". - Verify more event members in pmu-events test. - Add endian test for struct branch_flags on the sample-parsing test. - Improve temp file cleanup in several tests. perf daemon: - Address MSAN warnings on send_cmd(). perf kmem: - Improve man page for record options perf srcline: - Use long-running addr2line per DSO, greatly speeding up the 'srcline' sort order. perf symbols: - Ignore $a/$d symbols for ARM modules. - Fix /proc/kcore access on 32 bit systems. Kernel UAPI copies: - Update copy of linux/socket.h with the kernel sources, no change in tooling output. libbpf: - Pull in bpf_program__get_prog_info_linear() from libbpf, too much specific to perf. - Deprecate bpf_map__resize() in favor of bpf_map_set_max_entries() - Install libbpf headers locally when building. - Bump minimum LLVM C++ std to GNU++14. libperf: - Use binary search in perf_cpu_map__idx() as array are sorted. libtracefs: - Enable libtracefs dynamic linking. libtraceevent: - Increase logging when verbose. Arch specific: PowerPC: - Add support to expose instruction and data address registers as part of extended regs. Vendor events: JSON parser: - Support ConfigCode to set the config= in PMUs like: $ cat /sys/bus/event_source/devices/hisi_sccl1_ddrc3/events/act_cmd config=0x5 - Make the JSON parser more conformant when in strict mode. All JSON files: - Fix all remaining invalid JSON files. ARM: - Syntax corrections in Neoverse N1 json. - Categorise the Neoverse V1 counters. - Add new armv8 PMU events. - Revise hip08 uncore events. Hardware tracing: auxtrace: - Add missing Z option to ITRACE_HELP. - Add itrace A option to approximate IPC. - Add itrace d+o option to direct debug log to stdout. Intel PT: - Add support for PERF_RECORD_AUX_OUTPUT_HW_ID - Support itrace A option to approximate IPC - Support itrace d+o option to direct debug log to stdout. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYYg7RwAKCRCyPKLppCJ+ J1KgAQCGC3802gsI38/xwli1SLBNHsZ9DEy2nX5Ikw/f64K+0QEA6DsU0hpRTR0D i8ZFa2zNX748n+pU2WX6E7AjO01hrQo= =sXHg -----END PGP SIGNATURE----- Merge tag 'perf-tools-for-v5.16-2021-11-07-without-bpftool-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools updates from Arnaldo Carvalho de Melo: "perf annotate: - Add riscv64 support. - Add fusion logic for AMD microarchs. perf record: - Add an option to control the synthesizing behavior: --synth <no\|all\|task\|mmap\|cgroup> core: - Allow controlling synthesizing PERF_RECORD_ metadata events during record. - perf.data reader prep work for multithreaded processing. - Fix missing exclude_{host,guest} setting in PMUs that don't support it and that were causing the feature detection code to disable it for all events, even the ones in PMUs that support it. - Fix the default use of precise events on AMD, that were always falling back to non-precise because perf_event_attr.exclude_guest=1 was set and IBS does not have filtering capability, refusing precise + exclude_guest. - Add bitfield_swap() to handle branch_stack endian issue. perf script: - Show binary offsets for userspace addresses in callchains. - Support instruction latency via new "ins_lat" selectable field. - Add dlfilter-show-cycles perf inject: - Add vmlinux and ignore-vmlinux arguments, similar to other tools. perf list: - Display PMU prefix for partially supported hybrid cache events. - Display hybrid PMU events with cpu type. perf stat: - Improve metrics documentation of data structures. - Fix memory leaks in the metric code. - Use NAN for missing event IDs. - Don't compute unused events. - Fix memory leak on error path. - Encode and use metric-id as a metric qualifier. - Allow metrics with no events. - Avoid events for an 'if' constant result. - Only add a referenced metric once. - Simplify metric_refs calculation. - Allow modifiers on metrics. perf test: - Add workload test of metric and metric groups. - Workload test of all PMUs. - vmlinux-kallsyms: Ignore hidden symbols. - Add pmu-event test for event described as "config=". - Verify more event members in pmu-events test. - Add endian test for struct branch_flags on the sample-parsing test. - Improve temp file cleanup in several tests. perf daemon: - Address MSAN warnings on send_cmd(). perf kmem: - Improve man page for record options perf srcline: - Use long-running addr2line per DSO, greatly speeding up the 'srcline' sort order. perf symbols: - Ignore $a/$d symbols for ARM modules. - Fix /proc/kcore access on 32 bit systems. Kernel UAPI copies: - Update copy of linux/socket.h with the kernel sources, no change in tooling output. libbpf: - Pull in bpf_program__get_prog_info_linear() from libbpf, too much specific to perf. - Deprecate bpf_map__resize() in favor of bpf_map_set_max_entries() - Install libbpf headers locally when building. - Bump minimum LLVM C++ std to GNU++14. libperf: - Use binary search in perf_cpu_map__idx() as array are sorted. libtracefs: - Enable libtracefs dynamic linking. libtraceevent: - Increase logging when verbose. Arch specific: * PowerPC: - Add support to expose instruction and data address registers as part of extended regs. Vendor events: * JSON parser: - Support ConfigCode to set the config= in PMUs - Make the JSON parser more conformant when in strict mode. * All JSON files: - Fix all remaining invalid JSON files. * ARM: - Syntax corrections in Neoverse N1 json. - Categorise the Neoverse V1 counters. - Add new armv8 PMU events. - Revise hip08 uncore events. Hardware tracing: * auxtrace: - Add missing Z option to ITRACE_HELP. - Add itrace A option to approximate IPC. - Add itrace d+o option to direct debug log to stdout. * Intel PT: - Add support for PERF_RECORD_AUX_OUTPUT_HW_ID - Support itrace A option to approximate IPC - Support itrace d+o option to direct debug log to stdout" * tag 'perf-tools-for-v5.16-2021-11-07-without-bpftool-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (120 commits) perf build: Install libbpf headers locally when building perf MANIFEST: Add bpftool files to allow building with BUILD_BPF_SKEL=1 perf metric: Fix memory leaks perf parse-event: Add init and exit to parse_event_error perf parse-events: Rename parse_events_error functions perf stat: Fix memory leak on error path perf tools: Use __BYTE_ORDER__ perf inject: Add vmlinux and ignore-vmlinux arguments perf tools: Check vmlinux/kallsyms arguments in all tools perf tools: Refactor out kernel symbol argument sanity checking perf symbols: Ignore $a/$d symbols for ARM modules perf evsel: Don't set exclude_guest by default perf evsel: Fix missing exclude_{host,guest} setting perf bpf: Add missing free to bpf_event__print_bpf_prog_info() perf beauty: Update copy of linux/socket.h with the kernel sources perf clang: Fixes for more recent LLVM/clang tools: Bump minimum LLVM C++ std to GNU++14 perf bpf: Pull in bpf_program__get_prog_info_linear() Revert "perf bench futex: Add support for 32-bit systems with 64-bit time_t" perf test sample-parsing: Add endian test for struct branch_flags ...	2021-11-08 09:25:26 -08:00

... 2 3 4 5 6 ...

1057367 Commits