linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-21 10:31:54 +00:00

Author	SHA1	Message	Date
Josef Bacik	9ddc959e80	btrfs: use the file extent tree infrastructure We want to use this everywhere we modify the file extent items permanently. These include: 1) Inserting new file extents for writes and prealloc extents. 2) Truncating inode items. 3) btrfs_cont_expand(). 4) Insert inline extents. 5) Insert new extents from log replay. 6) Insert a new extent for clone, as it could be past i_size. 7) Hole punching For hole punching in particular it might seem it's not necessary because anybody extending would use btrfs_cont_expand, however there is a corner that still can give us trouble. Start with an empty file and fallocate KEEP_SIZE 1M-2M We now have a 0 length file, and a hole file extent from 0-1M, and a prealloc extent from 1M-2M. Now punch 1M-1.5M Because this is past i_size we have [HOLE EXTENT][ NOTHING ][PREALLOC] [0 1M][1M 1.5M][1.5M 2M] with an i_size of 0. Now if we pwrite 0-1.5M we'll increas our i_size to 1.5M, but our disk_i_size is still 0 until the ordered extent completes. However if we now immediately truncate 2M on the file we'll just call btrfs_cont_expand(inode, 1.5M, 2M), since our old i_size is 1.5M. If we commit the transaction here and crash we'll expose the gap. To fix this we need to clear the file extent mapping for the range that we punched but didn't insert a corresponding file extent for. This will mean the truncate will only get an disk_i_size set to 1M if we crash before the finish ordered io happens. I've written an xfstest to reproduce the problem and validate this fix. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:24 +01:00
Josef Bacik	41a2ee75aa	btrfs: introduce per-inode file extent tree In order to keep track of where we have file extents on disk, and thus where it is safe to adjust the i_size to, we need to have a tree in place to keep track of the contiguous areas we have file extents for. Add helpers to use this tree, as it's not required for NO_HOLES file systems. We will use this by setting DIRTY for areas we know we have file extent item's set, and clearing it when we remove file extent items for truncation. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:24 +01:00
Josef Bacik	790a1d44f9	btrfs: use btrfs_ordered_update_i_size in clone_finish_inode_update We were using btrfs_i_size_write(), which unconditionally jacks up inode->disk_i_size. However since clone can operate on ranges we could have pending ordered extents for a range prior to the start of our clone operation and thus increase disk_i_size too far and have a hole with no file extent. Fix this by using the btrfs_ordered_update_i_size helper which will do the right thing in the face of pending ordered extents outside of our clone range. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:24 +01:00
Su Yue	cfe953c824	btrfs: update the comment of btrfs_control_ioctl() Btrfsctl was removed in 2012, now the function btrfs_control_ioctl() is only used for devices ioctls. So update the comment. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Su Yue <Damenly_Su@gmx.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:23 +01:00
Qu Wenruo	0c89138970	btrfs: relocation: Add introduction of how relocation works Relocation is one of the most complex part of btrfs, while it's also the foundation stone for online resizing, profile converting. For such a complex facility, we should at least have some introduction to it. This patch will add an basic introduction at pretty a high level, explaining: - What relocation does - How relocation is done Only mentioning how data reloc tree and reloc tree are involved in the operation. No details like the backref cache, or the data reloc tree contents. - Which function to refer. More detailed comments will be added for reloc tree creation, data reloc tree creation and backref cache. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:23 +01:00
Filipe Manana	42836cf4ba	Btrfs: don't iterate mod seq list when putting a tree mod seq Each new element added to the mod seq list is always appended to the list, and each one gets a sequence number coming from a counter which gets incremented everytime a new element is added to the list (or a new node is added to the tree mod log rbtree). Therefore the element with the lowest sequence number is always the first element in the list. So just remove the list iteration at btrfs_put_tree_mod_seq() that computes the minimum sequence number in the list and replace it with a check for the first element's sequence number. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:23 +01:00
Qu Wenruo	30b3688e1f	btrfs: Add overview of device replace The overview of btrfs dev-replace. It mentions some corner cases caused by the write duplication and scrub based data copy. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ adjust wording ] Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:23 +01:00
Linus Torvalds	16fbf79b0f	Linux 5.6-rc7	2020-03-22 18:31:56 -07:00
Linus Torvalds	67d584e33e	for-5.6-rc6-tag -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl53Vh0ACgkQxWXV+ddt WDtfOQ//bbUyKXcdH0FBZOCEcJmegcK1eUFYqKrwR2bHGe5JRdLM8pAvjCcqmWeO jtaRiFC4NSCqTIl3mkBUb+XmQtjZwixBUHRxJpuEO8zqawvFZXTqg/KJklNvi2rd KdflSNia6KrozTT+B/lpwZ5emS+wSdj5XTZ6VGj4riwtphSfWAjOu+4cOASMeFu+ Gfn+N9xu0ZcR/6zO20xAg0Xz+WU2uj4EfeM35dtRP2bPLG0yOGmiYT15Ll9h74Wm 7F+28iNTQfYutAexGvUpiouanGXE+ka3TCsJg5LuVTpdKGraOVGEuX+RhsyoKQrB E8bk91fbkLlooluhUC306iNA9/+RN/yFGtILX8JsgI2Od26ZuU01l/OHrc19MDIm gw1w3PMsD/hXLsG5ba4QsIYOzXofSrPdWej29h/o5p0VEQrAoCJEpAi7fVsiJDR1 sx6kCodw5jYhVs1P6DdXO1pgjE7iFUmjUQCFkl40edPMLy/LwB99A4zNnCOwI0KZ 49CMWHDe+tXVJBTzPvtma/PycQHIxJYMf1f8ko9E4stB7HtfH4dnUERDkb1UwQ5n aJgyhsCCnp/EJoPunUT7g9nLUdyu0Rtwknn3NascWZEieX2QhKEF5RcjAUSL+Hlo jbGGvoLhG0nOtYkU7BNSQbL8wxPJEEAq8e6F4tWMcOkhX4pNZP8= =YkB0 -----END PGP SIGNATURE----- Merge tag 'for-5.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "Two fixes. The first is a regression: when dropping some incompat bits the conditions were reversed. The other is a fix for rename whiteout potentially leaving stack memory linked to a list" * tag 'for-5.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix removal of raid[56\|1c34} incompat flags after removing block group btrfs: fix log context list corruption after rename whiteout error	2020-03-22 11:35:33 -07:00
Linus Torvalds	b3c03db67e	Merge branch 'akpm' (patches from Andrew) Merge misc fixes from Andrew Morton: "10 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: x86/mm: split vmalloc_sync_all() mm, slub: prevent kmalloc_node crashes and memory leaks mm/mmu_notifier: silence PROVE_RCU_LIST warnings epoll: fix possible lost wakeup on epoll_ctl() path mm: do not allow MADV_PAGEOUT for CoW pages mm, memcg: throttle allocators based on ancestral memory.high mm, memcg: fix corruption on 64-bit divisor in memory.high throttling page-flags: fix a crash at SetPageError(THP_SWAP) mm/hotplug: fix hot remove failure in SPARSEMEM\|!VMEMMAP case memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event	2020-03-22 10:46:50 -07:00
Joerg Roedel	763802b53a	x86/mm: split vmalloc_sync_all() Commit `3f8fd02b1b` ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()") introduced a call to vmalloc_sync_all() in the vunmap() code-path. While this change was necessary to maintain correctness on x86-32-pae kernels, it also adds additional cycles for architectures that don't need it. Specifically on x86-64 with CONFIG_VMAP_STACK=y some people reported severe performance regressions in micro-benchmarks because it now also calls the x86-64 implementation of vmalloc_sync_all() on vunmap(). But the vmalloc_sync_all() implementation on x86-64 is only needed for newly created mappings. To avoid the unnecessary work on x86-64 and to gain the performance back, split up vmalloc_sync_all() into two functions: * vmalloc_sync_mappings(), and * vmalloc_sync_unmappings() Most call-sites to vmalloc_sync_all() only care about new mappings being synchronized. The only exception is the new call-site added in the above mentioned commit. Shile Zhang directed us to a report of an 80% regression in reaim throughput. Fixes: `3f8fd02b1b` ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()") Reported-by: kernel test robot <oliver.sang@intel.com> Reported-by: Shile Zhang <shile.zhang@linux.alibaba.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Borislav Petkov <bp@suse.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [GHES] Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20191009124418.8286-1-joro@8bytes.org Link: https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/ Link: http://lkml.kernel.org/r/20191113095530.228959-1-shile.zhang@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-03-21 18:56:06 -07:00
Vlastimil Babka	0715e6c516	mm, slub: prevent kmalloc_node crashes and memory leaks Sachin reports [1] a crash in SLUB __slab_alloc(): BUG: Kernel NULL pointer dereference on read at 0x000073b0 Faulting instruction address: 0xc0000000003d55f4 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: CPU: 19 PID: 1 Comm: systemd Not tainted 5.6.0-rc2-next-20200218-autotest #1 NIP: c0000000003d55f4 LR: c0000000003d5b94 CTR: 0000000000000000 REGS: c0000008b37836d0 TRAP: 0300 Not tainted (5.6.0-rc2-next-20200218-autotest) MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24004844 XER: 00000000 CFAR: c00000000000dec4 DAR: 00000000000073b0 DSISR: 40000000 IRQMASK: 1 GPR00: c0000000003d5b94 c0000008b3783960 c00000000155d400 c0000008b301f500 GPR04: 0000000000000dc0 0000000000000002 c0000000003443d8 c0000008bb398620 GPR08: 00000008ba2f0000 0000000000000001 0000000000000000 0000000000000000 GPR12: 0000000024004844 c00000001ec52a00 0000000000000000 0000000000000000 GPR16: c0000008a1b20048 c000000001595898 c000000001750c18 0000000000000002 GPR20: c000000001750c28 c000000001624470 0000000fffffffe0 5deadbeef0000122 GPR24: 0000000000000001 0000000000000dc0 0000000000000002 c0000000003443d8 GPR28: c0000008b301f500 c0000008bb398620 0000000000000000 c00c000002287180 NIP ___slab_alloc+0x1f4/0x760 LR __slab_alloc+0x34/0x60 Call Trace: ___slab_alloc+0x334/0x760 (unreliable) __slab_alloc+0x34/0x60 __kmalloc_node+0x110/0x490 kvmalloc_node+0x58/0x110 mem_cgroup_css_online+0x108/0x270 online_css+0x48/0xd0 cgroup_apply_control_enable+0x2ec/0x4d0 cgroup_mkdir+0x228/0x5f0 kernfs_iop_mkdir+0x90/0xf0 vfs_mkdir+0x110/0x230 do_mkdirat+0xb0/0x1a0 system_call+0x5c/0x68 This is a PowerPC platform with following NUMA topology: available: 2 nodes (0-1) node 0 cpus: node 0 size: 0 MB node 0 free: 0 MB node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 node 1 size: 35247 MB node 1 free: 30907 MB node distances: node 0 1 0: 10 40 1: 40 10 possible numa nodes: 0-31 This only happens with a mmotm patch "mm/memcontrol.c: allocate shrinker_map on appropriate NUMA node" [2] which effectively calls kmalloc_node for each possible node. SLUB however only allocates kmem_cache_node on online N_NORMAL_MEMORY nodes, and relies on node_to_mem_node to return such valid node for other nodes since commit `a561ce00b0` ("slub: fall back to node_to_mem_node() node if allocating on memoryless node"). This is however not true in this configuration where the _node_numa_mem_ array is not initialized for nodes 0 and 2-31, thus it contains zeroes and get_partial() ends up accessing non-allocated kmem_cache_node. A related issue was reported by Bharata (originally by Ramachandran) [3] where a similar PowerPC configuration, but with mainline kernel without patch [2] ends up allocating large amounts of pages by kmalloc-1k kmalloc-512. This seems to have the same underlying issue with node_to_mem_node() not behaving as expected, and might probably also lead to an infinite loop with CONFIG_SLUB_CPU_PARTIAL [4]. This patch should fix both issues by not relying on node_to_mem_node() anymore and instead simply falling back to NUMA_NO_NODE, when kmalloc_node(node) is attempted for a node that's not online, or has no usable memory. The "usable memory" condition is also changed from node_present_pages() to N_NORMAL_MEMORY node state, as that is exactly the condition that SLUB uses to allocate kmem_cache_node structures. The check in get_partial() is removed completely, as the checks in ___slab_alloc() are now sufficient to prevent get_partial() being reached with an invalid node. [1] https://lore.kernel.org/linux-next/3381CD91-AB3D-4773-BA04-E7A072A63968@linux.vnet.ibm.com/ [2] https://lore.kernel.org/linux-mm/fff0e636-4c36-ed10-281c-8cdb0687c839@virtuozzo.com/ [3] https://lore.kernel.org/linux-mm/20200317092624.GB22538@in.ibm.com/ [4] https://lore.kernel.org/linux-mm/088b5996-faae-8a56-ef9c-5b567125ae54@suse.cz/ Fixes: `a561ce00b0` ("slub: fall back to node_to_mem_node() node if allocating on memoryless node") Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Reported-by: PUVICHAKRAVARTHY RAMACHANDRAN <puvichakravarthy@in.ibm.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Tested-by: Bharata B Rao <bharata@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@kernel.org> Cc: Christopher Lameter <cl@linux.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Nathan Lynch <nathanl@linux.ibm.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200320115533.9604-1-vbabka@suse.cz Debugged-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-03-21 18:56:06 -07:00
Qian Cai	63886bad90	mm/mmu_notifier: silence PROVE_RCU_LIST warnings It is safe to traverse mm->notifier_subscriptions->list either under SRCU read lock or mm->notifier_subscriptions->lock using hlist_for_each_entry_rcu(). Silence the PROVE_RCU_LIST false positives, for example, WARNING: suspicious RCU usage ----------------------------- mm/mmu_notifier.c:484 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 3 locks held by libvirtd/802: #0: ffff9321e3f58148 (&mm->mmap_sem#2){++++}, at: do_mprotect_pkey+0xe1/0x3e0 #1: ffffffff91ae6160 (mmu_notifier_invalidate_range_start){+.+.}, at: change_p4d_range+0x5fa/0x800 #2: ffffffff91ae6e08 (srcu){....}, at: __mmu_notifier_invalidate_range_start+0x178/0x460 stack backtrace: CPU: 7 PID: 802 Comm: libvirtd Tainted: G I 5.6.0-rc6-next-20200317+ #2 Hardware name: HP ProLiant BL460c Gen8, BIOS I31 11/02/2014 Call Trace: dump_stack+0xa4/0xfe lockdep_rcu_suspicious+0xeb/0xf5 __mmu_notifier_invalidate_range_start+0x3ff/0x460 change_p4d_range+0x746/0x800 change_protection+0x1df/0x300 mprotect_fixup+0x245/0x3e0 do_mprotect_pkey+0x23b/0x3e0 __x64_sys_mprotect+0x51/0x70 do_syscall_64+0x91/0xae8 entry_SYSCALL_64_after_hwframe+0x49/0xb3 Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Link: http://lkml.kernel.org/r/20200317175640.2047-1-cai@lca.pw Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-03-21 18:56:06 -07:00
Roman Penyaev	1b53734bd0	epoll: fix possible lost wakeup on epoll_ctl() path This fixes possible lost wakeup introduced by commit `a218cc4914`. Originally modifications to ep->wq were serialized by ep->wq.lock, but in commit `a218cc4914` ("epoll: use rwlock in order to reduce ep_poll_callback() contention") a new rw lock was introduced in order to relax fd event path, i.e. callers of ep_poll_callback() function. After the change ep_modify and ep_insert (both are called on epoll_ctl() path) were switched to ep->lock, but ep_poll (epoll_wait) was using ep->wq.lock on wqueue list modification. The bug doesn't lead to any wqueue list corruptions, because wake up path and list modifications were serialized by ep->wq.lock internally, but actual waitqueue_active() check prior wake_up() call can be reordered with modifications of ep ready list, thus wake up can be lost. And yes, can be healed by explicit smp_mb(): list_add_tail(&epi->rdlink, &ep->rdllist); smp_mb(); if (waitqueue_active(&ep->wq)) wake_up(&ep->wp); But let's make it simple, thus current patch replaces ep->wq.lock with the ep->lock for wqueue modifications, thus wake up path always observes activeness of the wqueue correcty. Fixes: `a218cc4914` ("epoll: use rwlock in order to reduce ep_poll_callback() contention") Reported-by: Max Neunhoeffer <max@arangodb.com> Signed-off-by: Roman Penyaev <rpenyaev@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Max Neunhoeffer <max@arangodb.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Christopher Kohlhoff <chris.kohlhoff@clearpool.io> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Jason Baron <jbaron@akamai.com> Cc: Jes Sorensen <jes.sorensen@gmail.com> Cc: <stable@vger.kernel.org> [5.1+] Link: http://lkml.kernel.org/r/20200214170211.561524-1-rpenyaev@suse.de References: https://bugzilla.kernel.org/show_bug.cgi?id=205933 Bisected-by: Max Neunhoeffer <max@arangodb.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-03-21 18:56:06 -07:00
Michal Hocko	12e967fd8e	mm: do not allow MADV_PAGEOUT for CoW pages Jann has brought up a very interesting point [1]. While shared pages are excluded from MADV_PAGEOUT normally, CoW pages can be easily reclaimed that way. This can lead to all sorts of hard to debug problems. E.g. performance problems outlined by Daniel [2]. There are runtime environments where there is a substantial memory shared among security domains via CoW memory and a easy to reclaim way of that memory, which MADV_{COLD,PAGEOUT} offers, can lead to either performance degradation in for the parent process which might be more privileged or even open side channel attacks. The feasibility of the latter is not really clear to me TBH but there is no real reason for exposure at this stage. It seems there is no real use case to depend on reclaiming CoW memory via madvise at this stage so it is much easier to simply disallow it and this is what this patch does. Put it simply MADV_{PAGEOUT,COLD} can operate only on the exclusively owned memory which is a straightforward semantic. [1] http://lkml.kernel.org/r/CAG48ez0G3JkMq61gUmyQAaCq=_TwHbi1XKzWRooxZkv08PQKuw@mail.gmail.com [2] http://lkml.kernel.org/r/CAKOZueua_v8jHCpmEtTB6f3i9e2YnmX4mqdYVWhV4E=Z-n+zRQ@mail.gmail.com Fixes: `9c276cc65a` ("mm: introduce MADV_COLD") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Minchan Kim <minchan@kernel.org> Cc: Daniel Colascione <dancol@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200312082248.GS23944@dhcp22.suse.cz Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-03-21 18:56:06 -07:00
Chris Down	e26733e0d0	mm, memcg: throttle allocators based on ancestral memory.high Prior to this commit, we only directly check the affected cgroup's memory.high against its usage. However, it's possible that we are being reclaimed as a result of hitting an ancestor memory.high and should be penalised based on that, instead. This patch changes memory.high overage throttling to use the largest overage in its ancestors when considering how many penalty jiffies to charge. This makes sure that we penalise poorly behaving cgroups in the same way regardless of at what level of the hierarchy memory.high was breached. Fixes: `0e4b01df86` ("mm, memcg: throttle allocators when failing reclaim over memory.high") Reported-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Chris Down <chris@chrisdown.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Nathan Chancellor <natechancellor@gmail.com> Cc: Roman Gushchin <guro@fb.com> Cc: <stable@vger.kernel.org> [5.4.x+] Link: http://lkml.kernel.org/r/8cd132f84bd7e16cdb8fde3378cdbf05ba00d387.1584036142.git.chris@chrisdown.name Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-03-21 18:56:06 -07:00
Chris Down	d397a45fc7	mm, memcg: fix corruption on 64-bit divisor in memory.high throttling Commit `0e4b01df86` had a bunch of fixups to use the right division method. However, it seems that after all that it still wasn't right -- div_u64 takes a 32-bit divisor. The headroom is still large (2^32 pages), so on mundane systems you won't hit this, but this should definitely be fixed. Fixes: `0e4b01df86` ("mm, memcg: throttle allocators when failing reclaim over memory.high") Reported-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Chris Down <chris@chrisdown.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Cc: Roman Gushchin <guro@fb.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Nathan Chancellor <natechancellor@gmail.com> Cc: <stable@vger.kernel.org> [5.4.x+] Link: http://lkml.kernel.org/r/80780887060514967d414b3cd91f9a316a16ab98.1584036142.git.chris@chrisdown.name Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-03-21 18:56:06 -07:00
Qian Cai	d72520ad00	page-flags: fix a crash at SetPageError(THP_SWAP) Commit `bd4c82c22c` ("mm, THP, swap: delay splitting THP after swapped out") supported writing THP to a swap device but forgot to upgrade an older commit `df8c94d13c` ("page-flags: define behavior of FS/IO-related flags on compound pages") which could trigger a crash during THP swapping out with DEBUG_VM_PGFLAGS=y, kernel BUG at include/linux/page-flags.h:317! page dumped because: VM_BUG_ON_PAGE(1 && PageCompound(page)) page:fffff3b2ec3a8000 refcount:512 mapcount:0 mapping:000000009eb0338c index:0x7f6e58200 head:fffff3b2ec3a8000 order:9 compound_mapcount:0 compound_pincount:0 anon flags: 0x45fffe0000d8454(uptodate\|lru\|workingset\|owner_priv_1\|writeback\|head\|reclaim\|swapbacked) end_swap_bio_write() SetPageError(page) VM_BUG_ON_PAGE(1 && PageCompound(page)) <IRQ> bio_endio+0x297/0x560 dec_pending+0x218/0x430 [dm_mod] clone_endio+0xe4/0x2c0 [dm_mod] bio_endio+0x297/0x560 blk_update_request+0x201/0x920 scsi_end_request+0x6b/0x4b0 scsi_io_completion+0x509/0x7e0 scsi_finish_command+0x1ed/0x2a0 scsi_softirq_done+0x1c9/0x1d0 __blk_mqnterrupt+0xf/0x20 </IRQ> Fix by checking PF_NO_TAIL in those places instead. Fixes: `bd4c82c22c` ("mm, THP, swap: delay splitting THP after swapped out") Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: "Huang, Ying" <ying.huang@intel.com> Acked-by: Rafael Aquini <aquini@redhat.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200310235846.1319-1-cai@lca.pw Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-03-21 18:56:06 -07:00
Baoquan He	d41e2f3bd5	mm/hotplug: fix hot remove failure in SPARSEMEM\|!VMEMMAP case In section_deactivate(), pfn_to_page() doesn't work any more after ms->section_mem_map is resetting to NULL in SPARSEMEM\|!VMEMMAP case. It causes a hot remove failure: kernel BUG at mm/page_alloc.c:4806! invalid opcode: 0000 [#1] SMP PTI CPU: 3 PID: 8 Comm: kworker/u16:0 Tainted: G W 5.5.0-next-20200205+ #340 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015 Workqueue: kacpi_hotplug acpi_hotplug_work_fn RIP: 0010:free_pages+0x85/0xa0 Call Trace: __remove_pages+0x99/0xc0 arch_remove_memory+0x23/0x4d try_remove_memory+0xc8/0x130 __remove_memory+0xa/0x11 acpi_memory_device_remove+0x72/0x100 acpi_bus_trim+0x55/0x90 acpi_device_hotplug+0x2eb/0x3d0 acpi_hotplug_work_fn+0x1a/0x30 process_one_work+0x1a7/0x370 worker_thread+0x30/0x380 kthread+0x112/0x130 ret_from_fork+0x35/0x40 Let's move the ->section_mem_map resetting after depopulate_section_memmap() to fix it. [akpm@linux-foundation.org: remove unneeded initialization, per David] Fixes: `ba72b4c8cf` ("mm/sparsemem: support sub-section hotplug") Signed-off-by: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Wei Yang <richardw.yang@linux.intel.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200307084229.28251-2-bhe@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-03-21 18:56:06 -07:00
Chunguang Xu	7d36665a58	memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event An eventfd monitors multiple memory thresholds of the cgroup, closes them, the kernel deletes all events related to this eventfd. Before all events are deleted, another eventfd monitors the memory threshold of this cgroup, leading to a crash: BUG: kernel NULL pointer dereference, address: 0000000000000004 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 800000033058e067 P4D 800000033058e067 PUD 3355ce067 PMD 0 Oops: 0002 [#1] SMP PTI CPU: 2 PID: 14012 Comm: kworker/2:6 Kdump: loaded Not tainted 5.6.0-rc4 #3 Hardware name: LENOVO 20AWS01K00/20AWS01K00, BIOS GLET70WW (2.24 ) 05/21/2014 Workqueue: events memcg_event_remove RIP: 0010:__mem_cgroup_usage_unregister_event+0xb3/0x190 RSP: 0018:ffffb47e01c4fe18 EFLAGS: 00010202 RAX: 0000000000000001 RBX: ffff8bb223a8a000 RCX: 0000000000000001 RDX: 0000000000000001 RSI: ffff8bb22fb83540 RDI: 0000000000000001 RBP: ffffb47e01c4fe48 R08: 0000000000000000 R09: 0000000000000010 R10: 000000000000000c R11: 071c71c71c71c71c R12: ffff8bb226aba880 R13: ffff8bb223a8a480 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8bb242680000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000004 CR3: 000000032c29c003 CR4: 00000000001606e0 Call Trace: memcg_event_remove+0x32/0x90 process_one_work+0x172/0x380 worker_thread+0x49/0x3f0 kthread+0xf8/0x130 ret_from_fork+0x35/0x40 CR2: 0000000000000004 We can reproduce this problem in the following ways: 1. We create a new cgroup subdirectory and a new eventfd, and then we monitor multiple memory thresholds of the cgroup through this eventfd. 2. closing this eventfd, and __mem_cgroup_usage_unregister_event () will be called multiple times to delete all events related to this eventfd. The first time __mem_cgroup_usage_unregister_event() is called, the kernel will clear all items related to this eventfd in thresholds-> primary. Since there is currently only one eventfd, thresholds-> primary becomes empty, so the kernel will set thresholds-> primary and hresholds-> spare to NULL. If at this time, the user creates a new eventfd and monitor the memory threshold of this cgroup, kernel will re-initialize thresholds-> primary. Then when __mem_cgroup_usage_unregister_event () is called for the second time, because thresholds-> primary is not empty, the system will access thresholds-> spare, but thresholds-> spare is NULL, which will trigger a crash. In general, the longer it takes to delete all events related to this eventfd, the easier it is to trigger this problem. The solution is to check whether the thresholds associated with the eventfd has been cleared when deleting the event. If so, we do nothing. [akpm@linux-foundation.org: fix comment, per Kirill] Fixes: `907860ed38` ("cgroups: make cftype.unregister_event() void-returning") Signed-off-by: Chunguang Xu <brookxu@tencent.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/077a6f67-aefa-4591-efec-f2f3af2b0b02@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-03-21 18:56:06 -07:00
Linus Torvalds	b74b991fb8	block-5.6-20200320 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl51dZoQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpt1GD/4qD5KahkC9cdRRcliGoYrY5CLJbEx+Hwlu QNJKggGJKMs3f4KqD1JwGsZCppD9bPTD2qgjs8Hjkw6HCOabXx8elQBKwyhjVglu 5ogv981fmbzPau3n5gQ1llqjF6aslKpSkA9arQPgKov7gIoa2U3Gc1ZIO5Mz/X/T J0Z4TqmjMhbjAyP9BqfVIyQyDR9WGvO4U/9XmTclKU4Rex7lT5JRiGVF0ZpqfXDQ pkJOaqsltZVXN0J6Uy2e0qL5nkWIFhfrqjvoBG2V/ivt9zOfiPzmt9DLNl3S/QyU TYtNvAg6wuw/DBOdDsLoHztQUWbBqUMhn6892ADc6786TwFZv0/Ytv+CqL2mxbYy wImli5cnowWNevkNFpm3RLAMA0Oi8NiULb31AmRP23OyGSPB51JWhpnlqEylRnz2 aa8KkA+VHiuaMnsII6Caq6tXsXyNfoDPGYvy5vCIyzZXPTvvH4i7rPr0QYb6VjAa v89mGE+Nx/eiC9FFEzPCXU5tgA6AMiMzoqXodRoSUSl7Lm+iGm4pPhUX4EIia1NG 6Jc1A4cOQTjM8mptJKPIEbAovHsVMRUext5pinQYtMB58R16ZQfpdzQY1ojJUllk u2nWPyGswifMfpJ7daMhhYx/z6yQNy/MOqEPG8dNyc96R6OJoFGe+Wp41/9oP4Ly OjnPSFygKA== =q5rs -----END PGP SIGNATURE----- Merge tag 'block-5.6-20200320' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "Just two NVMe fabrics fixes that should go into 5.6" * tag 'block-5.6-20200320' of git://git.kernel.dk/linux-block: nvmet-tcp: set MSG_MORE only if we actually have more to send nvme-rdma: Avoid double freeing of async event data	2020-03-21 12:08:26 -07:00
Linus Torvalds	1ab7ea1f83	io_uring-5.6-20200320 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl51dbQQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpiV3EADJHB2r2hTTEym5u1PbrEEVkjvdL6InU8lD lFM7m2g6yZUncwm+aSZynHqAFY6Rd5Jk+gmYMuioi3ZxC2rs7jG1AOTpaeJYmhle lzkjqSLtl+gdPMA9ydivk1UwILFjtZKG1JNc++tnCn3q7+eCkgnWAlq5b7idG2eF BS0AEZP6Yz1zStTHLbHSB0StY8ovMIw0VaVQvguHLL9EBpbHmrs0cq3tipWkAyPR 2YwnXbxsJySukkwmBKxEWrGUYDze56jqJIqdFsOE0+WtGV+nk7OScPseXAaP4/+G Vl23VNfryuZcsBUwI9tY1SzCFEXIwdXVGpCAYwQ/kU5WfvFpYaei+fXVNnL4kjR0 PfpA6XnMsZ3DzqgepmUd92sAA56ZtBxuGjqcSYlg/JwjvUHdpaZDkE2WLqkAMeUN 8A7cUw+R6XWQ2/y6ob7QvKiT/ZDR8GrYUl3EdGE3LhB1ZsvLXJDZpWipwQBzuk9R vJJOkGst38rjsWnb+nfeLh3AsgjF14wo+2vQL4mKs24xKTIvadHsFAZjKLXZ93Wf Vn58FaPOYIkjBidYLWb3dlO1ZR8S0803gohLkLV6adH8bCNCWxGTOR51DZLomAsb nAUCEAJaZrOqaQAuJAFNNpS8+/da3AIF4HVd2EdZ1yFXU15y0+zIxtROjKzg+OxO M3jC/Aet1Q== =IMcu -----END PGP SIGNATURE----- Merge tag 'io_uring-5.6-20200320' of git://git.kernel.dk/linux-block Pull io_uring fixes from Jens Axboe: "Two different fixes in here: - Fix for a potential NULL pointer deref for links with async or drain marked (Pavel) - Fix for not properly checking RLIMIT_NOFILE for async punted operations. This affects openat/openat2, which were added this cycle, and accept4. I did a full audit of other cases where we might check current->signal->rlim[] and found only RLIMIT_FSIZE for buffered writes and fallocate. That one is fixed and queued for 5.7 and marked stable" * tag 'io_uring-5.6-20200320' of git://git.kernel.dk/linux-block: io_uring: make sure accept honor rlimit nofile io_uring: make sure openat/openat2 honor rlimit nofile io_uring: NULL-deref for IOSQE_{ASYNC,DRAIN}	2020-03-21 11:54:47 -07:00
Linus Torvalds	6c1bae744d	Merge branch 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull turbostat updates from Len Brown: "Update to turbostat v20.03.20. These patches unlock the full turbostat features for some new machines, plus a couple other minor tweaks" * 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: tools/power turbostat: update version tools/power turbostat: Print cpuidle information tools/power turbostat: Fix 32-bit capabilities warning tools/power turbostat: Fix missing SYS_LPI counter on some Chromebooks tools/power turbostat: Support Elkhart Lake tools/power turbostat: Support Jasper Lake tools/power turbostat: Support Ice Lake server tools/power turbostat: Support Tiger Lake tools/power turbostat: Fix gcc build warnings tools/power turbostat: Support Cometlake	2020-03-21 11:50:36 -07:00
Linus Torvalds	c63c50fc2e	powerpc fixes for 5.6 #5 Two fixes for bugs introduced this cycle. One for a crash when shutting down a KVM PR guest (our original style of KVM which doesn't use hypervisor mode). And one fix for the recently added 32-bit KASAN_VMALLOC support. Thanks to: Christophe Leroy, Greg Kurz, Sean Christopherson. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl51/+oTHG1wZUBlbGxl cm1hbi5pZC5hdQAKCRBR6+o8yOGlgLuRD/0fZ4kTt+dyEMt7udP8ocBBNy4uRsNr mC6nxxSIQ9XhsD/oJJ+wey8c/+/FUYpqAEg/IlWIZg4xeBks7imESIgbA9JSQFp5 pAObxc2AiZ97Fr8OuNrGcsakl2nQwX2tge8pNWJWN6Ukh+CsdbN9evpxFDe2u9h2 acpWFO/KBCm6CD1fd9y94pBQTKRCNGXpMX32SO1dn8UB+ApesUS3YWs209sjETpw koKY7ZE9rd2YVNuaQp5fO/C/jdBIh+dzHbmeooNo4BFJAys6RNrxEbWqjwoy8es7 OhwmtDn3aWo6HvB5SvWKKvSFOvIyXzcnAV6spCkyk0vhTckEY8tEvjgKAhTNP9nA Gu7VHx/uEkN+GIW76SyK7XAEUgKwMqfRNbWk8YpI1dNj7zW8Z/3kh/PhKja7lnj6 C77i6FXu0xkOd1+ZL73JNof98EfdrVl2mZGDvv3bCY+ME9zkYjMUj6uKzZ5+OJvq ALJXMvegQM1Ma8mx3Ah5HwcPqUmIhNNHhCFXtzP9Gcrg4dNNGOCzGbyomcAA6gDb rzzhe1ErI5HPZ524cMZFHjf0d04X9AOeVk2TyyL0tWgl/gNVaYSHKdaECc2xy3/Q Pu8T8CvGBhR2uHb74DLxZPtOUhQYdMU7lvJCq9IKLfRWi5E/rsDAVoVJu46vG3QP cIsl2hgW/7eW0g== =JOM7 -----END PGP SIGNATURE----- Merge tag 'powerpc-5.6-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Two fixes for bugs introduced this cycle: - fix a crash when shutting down a KVM PR guest (our original style of KVM which doesn't use hypervisor mode) - fix for the recently added 32-bit KASAN_VMALLOC support Thanks to: Christophe Leroy, Greg Kurz, Sean Christopherson" * tag 'powerpc-5.6-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: KVM: PPC: Fix kernel crash with PR KVM powerpc/kasan: Fix shadow memory protection with CONFIG_KASAN_VMALLOC	2020-03-21 08:51:45 -07:00
Len Brown	b95fffb9b4	tools/power turbostat: update version A stitch in time saves nine. Signed-off-by: Len Brown <len.brown@intel.com>	2020-03-21 00:48:02 -04:00
Len Brown	abdcbdb265	tools/power turbostat: Print cpuidle information Print cpuidle driver and governor. Originally-by: Antti Laakso <antti.laakso@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2020-03-21 00:47:47 -04:00
Jens Axboe	83166ac82b	Merge branch 'nvme-5.6-rc6' of git://git.infradead.org/nvme into block-5.6 Pull NVMe fixes from Keith: "Two late nvme fabrics fixes for 5.6: a double free with the rdma transport, and a regression fix for tcp; please pull." * 'nvme-5.6-rc6' of git://git.infradead.org/nvme: nvmet-tcp: set MSG_MORE only if we actually have more to send nvme-rdma: Avoid double freeing of async event data	2020-03-20 19:14:16 -06:00
Filipe Manana	d8e6fd5c79	btrfs: fix removal of raid[56\|1c34} incompat flags after removing block group We are incorrectly dropping the raid56 and raid1c34 incompat flags when there are still raid56 and raid1c34 block groups, not when we do not any of those anymore. The logic just got unintentionally broken after adding the support for the raid1c34 modes. Fix this by clear the flags only if we do not have block groups with the respective profiles. Fixes: `9c907446dc` ("btrfs: drop incompat bit for raid1c34 after last block group is gone") Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-20 21:31:32 +01:00
Sagi Grimberg	98fd5c7237	nvmet-tcp: set MSG_MORE only if we actually have more to send When we send PDU data, we want to optimize the tcp stack operation if we have more data to send. So when we set MSG_MORE when: - We have more fragments coming in the batch, or - We have a more data to send in this PDU - We don't have a data digest trailer - We optimize with the SUCCESS flag and omit the NVMe completion (used if sq_head pointer update is disabled) This addresses a regression in QD=1 with SUCCESS flag optimization as we unconditionally set MSG_MORE when we didn't actually have more data to send. Fixes: `7058329538` ("nvmet-tcp: implement C2HData SUCCESS optimization") Reported-by: Mark Wunderlich <mark.wunderlich@intel.com> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>	2020-03-21 04:37:53 +09:00
Linus Torvalds	5ad0ec0b86	arm64 fixes for -rc7 - Fix panic() when it occurs during secondary CPU startup - Fix "kpti=off" when KASLR is enabled - Fix howler in compat syscall table for vDSO clock_getres() fallback -----BEGIN PGP SIGNATURE----- iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAl5zxtcQHHdpbGxAa2Vy bmVsLm9yZwAKCRC3rHDchMFjNG/BB/9BVSLbqBdm6Op14J9zi3S8Qs7udcbo6dAr vBkBvIl6JK4e284DSoPdCQoXp4QgExm6QEYzl2EjBYMqKCmCzng4w14ctm9FnCry W8LNKRBaKyml7nDdT2UH1PnKB+Nh6ufv1PZQttN2e664bUl28pqC7MgJ3meJAjj8 a+lVRxIOVFKD5AwV1jfbS1Byx/w8n9Lo/C4wbswFrbHdq6puTuEZbtJiYbkxfqa3 wMXwNeIj1Xh2yVgz2gC02QLuTtLqJlPelhGHYec1hTQkmaSeNy0WvQr6t3oc6c5T Bngzv7dM5lwXxjT82AqQhSpBUAp+MjYxnWW+hRpy+2BIEnbgGnDR =w1Z9 -----END PGP SIGNATURE----- Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: - Fix panic() when it occurs during secondary CPU startup - Fix "kpti=off" when KASLR is enabled - Fix howler in compat syscall table for vDSO clock_getres() fallback * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: compat: Fix syscall number of compat_clock_getres arm64: kpti: Fix "kpti=off" when KASLR is enabled arm64: smp: fix crash_smp_send_stop() behaviour arm64: smp: fix smp_send_stop() behaviour	2020-03-20 09:28:25 -07:00
Linus Torvalds	f014d2b858	Char/misc driver fixes for 5.6-rc7 Here are some small different driver fixes for 5.6-rc7 - binderfs fix, yet again - slimbus new device id added - hwtracing bugfixes for reported issues and a new device id All of these have been in linux-next with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXnTQmw8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ymASwCeLhlhgP/F9W7WqG1hWzJ0Dq8mkEUAn3W7GZs6 0FrrI5ksBmjd0kTX5dbt =vMc6 -----END PGP SIGNATURE----- Merge tag 'char-misc-5.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are some small different driver fixes for 5.6-rc7: - binderfs fix, yet again - slimbus new device id added - hwtracing bugfixes for reported issues and a new device id All of these have been in linux-next with no reported issues" * tag 'char-misc-5.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: intel_th: pci: Add Elkhart Lake CPU support intel_th: Fix user-visible error codes intel_th: msu: Fix the unexpected state warning stm class: sys-t: Fix the use of time_after() slimbus: ngd: add v2.1.0 compatible binderfs: use refcount for binder control devices too	2020-03-20 09:24:22 -07:00
Linus Torvalds	3bd14829d3	Staging/IIO fixes for 5.6-rc7 Here are a number of small staging and IIO driver fixes for 5.6-rc7 Nothing major here, just resolutions for some reported problems: - iio bugfixes for a number of different drivers - greybus loopback_test fixes - wfx driver fixes All of these have been in linux-next with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXnTRmQ8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ykkCwCfS8zLibsZtuv655yAK17C/6YghPQAoJ3iSew+ EkzEDWQ9j2NLE+lpWtKQ =fZ2X -----END PGP SIGNATURE----- Merge tag 'staging-5.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging/IIO fixes from Greg KH: "Here are a number of small staging and IIO driver fixes for 5.6-rc7 Nothing major here, just resolutions for some reported problems: - iio bugfixes for a number of different drivers - greybus loopback_test fixes - wfx driver fixes All of these have been in linux-next with no reported issues" * tag 'staging-5.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: staging: rtl8188eu: Add device id for MERCUSYS MW150US v2 staging: greybus: loopback_test: fix potential path truncations staging: greybus: loopback_test: fix potential path truncation staging: greybus: loopback_test: fix poll-mask build breakage staging: wfx: fix RCU usage between hif_join() and ieee80211_bss_get_ie() staging: wfx: fix RCU usage in wfx_join_finalize() staging: wfx: make warning about pending frame less scary staging: wfx: fix lines ending with a comma instead of a semicolon staging: wfx: fix warning about freeing in-use mutex during device unregister staging/speakup: fix get_word non-space look-ahead iio: ping: set pa_laser_ping_cfg in of_ping_match iio: chemical: sps30: fix missing triggered buffer dependency iio: st_sensors: remap SMO8840 to LIS2DH12 iio: light: vcnl4000: update sampling periods for vcnl4040 iio: light: vcnl4000: update sampling periods for vcnl4200 iio: accel: adxl372: Set iio_chan BE iio: magnetometer: ak8974: Fix negative raw values in sysfs iio: trigger: stm32-timer: disable master mode when stopping iio: adc: stm32-dfsdm: fix sleep in atomic context iio: adc: at91-sama5d2_adc: fix differential channels in triggered mode	2020-03-20 09:20:38 -07:00
Linus Torvalds	b07c2e76c4	USB fixes for 5.6-rc7 Here are some small USB fixes for 5.6-rc7. And there's a thunderbolt driver fix thrown in for good measure as well. These fixes are: - new device ids for usb-serial drivers - thunderbolt error code fix - xhci driver fixes - typec fixes - cdc-acm driver fixes - chipidea driver fix - more USB quirks added for devices that need them. All of these have been in linux-next with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iGwEABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXnTSfQ8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+yni+QCVEoBjFNoMaMgtQJ5Ogr0dYOLILwCeLRP7Ce0K GTeo+M+qrTz4DN+Ejqo= =xWvw -----END PGP SIGNATURE----- Merge tag 'usb-5.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are some small USB fixes for 5.6-rc7. And there's a thunderbolt driver fix thrown in for good measure as well. These fixes are: - new device ids for usb-serial drivers - thunderbolt error code fix - xhci driver fixes - typec fixes - cdc-acm driver fixes - chipidea driver fix - more USB quirks added for devices that need them. All of these have been in linux-next with no reported issues" * tag 'usb-5.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: USB: cdc-acm: fix rounding error in TIOCSSERIAL USB: cdc-acm: fix close_delay and closing_wait units in TIOCSSERIAL usb: quirks: add NO_LPM quirk for RTL8153 based ethernet adapters usb: chipidea: udc: fix sleeping function called from invalid context USB: serial: pl2303: add device-id for HP LD381 USB: serial: option: add ME910G1 ECM composition 0x110b usb: host: xhci-plat: add a shutdown usb: typec: ucsi: displayport: Fix a potential race during registration usb: typec: ucsi: displayport: Fix NULL pointer dereference USB: Disable LPM on WD19's Realtek Hub usb: xhci: apply XHCI_SUSPEND_DELAY to AMD XHCI controller 1022:145c xhci: Do not open code __print_symbolic() in xhci trace events thunderbolt: Fix error code in tb_port_is_width_supported()	2020-03-20 09:16:35 -07:00
Linus Torvalds	fa91418b72	TTY fixes for 5.6-rc7 Here are 3 small tty_io bugfixes for reported issues that Eric has resolved for 5.6-rc7 All of these have been in linux-next with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXnTR9A8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ylsYACfZM/J2PW6TXpZfCv01feD3Rx/prkAnj+2DCI7 FeV2J/iXPTulT2eItbsJ =xkaZ -----END PGP SIGNATURE----- Merge tag 'tty-5.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty fixes from Greg KH: "Here are three small tty_io bugfixes for reported issues that Eric has resolved for 5.6-rc7 All of these have been in linux-next with no reported issues" * tag 'tty-5.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: tty: fix compat TIOCGSERIAL checking wrong function ptr tty: fix compat TIOCGSERIAL leaking uninitialized memory tty: drop outdated comments about release_tty() locking	2020-03-20 09:13:35 -07:00
Linus Torvalds	12bf19c926	sound fixes for 5.6-rc7 A few fixes covering the issues reported by syzkaller, a couple of fixes for the MIDI decoding bug, and a few usual HD-audio quirks. Some of them are about ALSA core stuff, but they are small fixes just for corner cases, and nothing thrilling. -----BEGIN PGP SIGNATURE----- iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAl50gKMOHHRpd2FpQHN1 c2UuZGUACgkQLtJE4w1nLE9+vRAAlPwrUSIWAmOtTPTozX9kqJmTspC5FJqXEZZK 2XW3eaaRaYePiRRt2l7cqCiZUDySmZCJF6VivAt3Hx1n4nDRubs+p6JzHtJ+Swqk vRVZaM7GI5uUcrbVGSGteVGzXsjodaD+p811AdM3hzgzxooa5LOCkKOXGxWUXP05 tsURjTPlRXvmYzm8vlINeQKE9M9aKHt1USthVxRg9iuyXxugFfZVWn4xMCmF4lPH +Z5B+DGbJfp8FUdYBFjMTew67mm6a760kOwm86bBcCMDbafiVtWb181ZCsnST4YG 4IwRcIr+8XnJhufeOX3tkRJu8+TJvzTVUoJ8d5kr/fNpSBFuM+rB0GcwSANm5k6d cQKTpkKutDt7F22kk1rRkBqiD+JeCZgcOq+5rM0OcDtgT97rcvb4Ddx/ckppIpeu w3HSsfuq+tzoJp0EF56ruFUGEnlJcNFAM0wJqiAvnUQQDfowEU3qGb5WiU2Jjgs1 LarcAoTKi3Wz/Sa6eIt+D139YqvgvZ5oEmDMdU9yLWeFvQNLlR5UqqXTGgVwGndm GL2nanMtK3k/rWaqZHDvmtLRtj/cq5so+7/VGCrAYfp9sFYZt9y4xfmzwRkNveE/ 0cDFdE1s1v3dwoZmq+/HY5gIAwn8vYATNlXG+dDDGsdwAGhtE+XoIn+WOFGZh5Dz ClyqYbI= =ejlI -----END PGP SIGNATURE----- Merge tag 'sound-5.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A few fixes covering the issues reported by syzkaller, a couple of fixes for the MIDI decoding bug, and a few usual HD-audio quirks. Some of them are about ALSA core stuff, but they are small fixes just for corner cases, and nothing thrilling" * tag 'sound-5.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda/realtek - Enable the headset of Acer N50-600 with ALC662 ALSA: hda/realtek - Enable headset mic of Acer X2660G with ALC662 ALSA: seq: oss: Fix running status after receiving sysex ALSA: seq: virmidi: Fix running status after receiving sysex ALSA: pcm: oss: Remove WARNING from snd_pcm_plug_alloc() checks ALSA: hda/realtek: Fix pop noise on ALC225 ALSA: line6: Fix endless MIDI read loop ALSA: pcm: oss: Avoid plugin buffer overflow	2020-03-20 09:10:29 -07:00
Linus Torvalds	69d3e5a5a6	drm fixes for 5.6-rc7 core: - fix lease warning i915: - Track active elements during dequeue - Fix failure to handle all MCR ranges - Revert unnecessary workaround amdgpu: - Pageflip fix - VCN clockgating fixes - GPR debugfs fix for umr - GPU reset fix - eDP fix for MBP - DCN2.x fix dw-hdmi: - fix AVI frame colorimetry komeda: - fix compiler warning bochs: - downgrade a binding failure to a warning -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJedDFEAAoJEAx081l5xIa+Z8QP/iWnU8skiMi3SPnO/ki/2y1D ryJMMnnS7tQBW8hJ7UGCakimWq9VBuNEzOM7xYeTkKJ2vu2oT7gJkhlpscnPkPrn zJgaWZh31PbtnusP+8PdQiG1N3EN/FKnzx9P3WMjDrG85CNm745GRrU8VMUsPCvf W57QdX1LfQeajkNeBdqpYNDhP5Tu00AMeqq3KuR7je14juui8EOzSKGADkHpKqSw 9UqnPMRrum9CHbSBGAQ3qQcAYA4vWrSxsHQdbwd4jN4+pxDZbtZscvxn28VKRd7k TR6YnLaXAciDRxe3gvfbEbE1zn72Lrt6R97d5ojV+VDV9sdrkN9eDYwhoYXRRIA0 sQFg29WwBRHMeI+3ucsEDdDKx5hLR+fxD6JcGDv8dZeHzTCWZ7PFFG9YtZVUmkQq bMXWlvNzm7fQIA9DwfQt0nkDqt/Re6qbqRnd/ayOz/IzthMmIyPfIzA5d5u/b354 GZVOLaD9iimCIWMD3qkwfGgZsn0Ss+jewBQ6navKaY15ADDM0OEF9Oa/RBi97DUC QGcpHYe6eNaMByef7/GVyDfZU1GmTAa8PepAe9x9ItRQxCICFl9fb19iyXu25ZCx wpSNFgHKfxMigjuLe3czW3If1SbctVMVl2BDeXh5TMFXUGKXHQZa9uf8AhtuHjj4 F6D1JOZQPBfuoY1xukv+ =odvv -----END PGP SIGNATURE----- Merge tag 'drm-fixes-2020-03-20' of git://anongit.freedesktop.org/drm/drm Pull drm fixes from Dave Airlie: "Hope you are well hiding out above the garage. A few amdgpu changes but nothing too major. I've had a wisdom tooth out this week so haven't been to on top of things, but all seems good. core: - fix lease warning i915: - Track active elements during dequeue - Fix failure to handle all MCR ranges - Revert unnecessary workaround amdgpu: - Pageflip fix - VCN clockgating fixes - GPR debugfs fix for umr - GPU reset fix - eDP fix for MBP - DCN2.x fix dw-hdmi: - fix AVI frame colorimetry komeda: - fix compiler warning bochs: - downgrade a binding failure to a warning" * tag 'drm-fixes-2020-03-20' of git://anongit.freedesktop.org/drm/drm: drm/amd/display: Fix pageflip event race condition for DCN. drm/amdgpu: fix typo for vcn2.5/jpeg2.5 idle check drm/amdgpu: fix typo for vcn2/jpeg2 idle check drm/amdgpu: fix typo for vcn1 idle check drm/lease: fix WARNING in idr_destroy drm/i915: Handle all MCR ranges Revert "drm/i915/tgl: Add extra hdc flush workaround" drm/i915/execlists: Track active elements during dequeue drm/bochs: downgrade pci_request_region failure from error to warning drm/amd/display: Add link_rate quirk for Apple 15" MBP 2017 drm/amdgpu: add fbdev suspend/resume on gpu reset drm/amd/amdgpu: Fix GPR read from debugfs (v2) drm/amd/display: fix typos for dcn20_funcs and dcn21_funcs struct drm/komeda: mark PM functions as __maybe_unused drm/bridge: dw-hdmi: fix AVI frame colorimetry	2020-03-20 09:03:54 -07:00
Jens Axboe	09952e3e78	io_uring: make sure accept honor rlimit nofile Just like commit `4022e7af86`, this fixes the fact that IORING_OP_ACCEPT ends up using get_unused_fd_flags(), which checks current->signal->rlim[] for limits. Add an extra argument to __sys_accept4_file() that allows us to pass in the proper nofile limit, and grab it at request prep time. Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-20 08:48:36 -06:00
Jens Axboe	4022e7af86	io_uring: make sure openat/openat2 honor rlimit nofile Dmitry reports that a test case shows that io_uring isn't honoring a modified rlimit nofile setting. get_unused_fd_flags() checks the task signal->rlimi[] for the limits. As this isn't easily inheritable, provide a __get_unused_fd_flags() that takes the value instead. Then we can grab it when the request is prepared (from the original task), and pass that in when we do the async part part of the open. Reported-by: Dmitry Kadashev <dkadashev@gmail.com> Tested-by: Dmitry Kadashev <dkadashev@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-20 08:47:27 -06:00
Len Brown	fcaa681c03	tools/power turbostat: Fix 32-bit capabilities warning warning: `turbostat' uses 32-bit capabilities (legacy support in use) Signed-off-by: Len Brown <len.brown@intel.com>	2020-03-20 00:32:28 -04:00
Len Brown	1f81c5efc0	tools/power turbostat: Fix missing SYS_LPI counter on some Chromebooks Some Chromebook BIOS' do not export an ACPI LPIT, which is how Linux finds the residency counter for CPU and SYSTEM low power states, that is exports in /sys/devices/system/cpu/cpuidle/*residency_us When these sysfs attributes are missing, check the debugfs attrubte from the pmc_core driver, which accesses the same counter value. Signed-off-by: Len Brown <len.brown@intel.com>	2020-03-20 00:32:28 -04:00
Chen Yu	f670840070	tools/power turbostat: Support Elkhart Lake From a turbostat point of view the Tremont-based Elkhart Lake is very similar to Goldmont, reuse the code of Goldmont. Elkhart Lake does not support 'group turbo limit counter' nor C3, adjust the code accordingly. Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2020-03-20 00:32:27 -04:00
Chen Yu	d7814c3098	tools/power turbostat: Support Jasper Lake Jasper Lake, like Elkhart Lake, uses a Tremont CPU. So reuse the code. Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2020-03-20 00:32:27 -04:00
Chen Yu	23274faf96	tools/power turbostat: Support Ice Lake server From a turbostat point of view, Ice Lake server looks like Sky Lake server. Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2020-03-20 00:32:27 -04:00
Chen Yu	4bf7132a0a	tools/power turbostat: Support Tiger Lake From a turbostat point of view, Tiger Lake looks like Ice Lake. Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2020-03-20 00:32:27 -04:00
Len Brown	d8d005ba6a	tools/power turbostat: Fix gcc build warnings Warning: ‘__builtin_strncpy’ specified bound 20 equals destination size [-Wstringop-truncation] reduce param to strncpy, to guarantee that a null byte is always copied into destination buffer. Signed-off-by: Len Brown <len.brown@intel.com>	2020-03-20 00:32:27 -04:00
Chen Yu	081c54323b	tools/power turbostat: Support Cometlake From a turbostat point of view, Cometlake is like Kabylake. Suggested-by: Rui Zhang <rui.zhang@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2020-03-20 00:32:27 -04:00
Dave Airlie	5366b96b19	drm/i915 fixes for v5.6-rc7: - Track active elements during dequeue - Fix failure to handle all MCR ranges - Revert unnecessary workaround -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEFWWmW3ewYy4RJOWc05gHnSar7m8FAl5zJl4ACgkQ05gHnSar 7m+wJhAAreffDNdNJQ7aNspMJiysrSaknCGQGdFtgih0qb3f9t/A7VccpHBNN2GB d5CKrBE72PWQT6r4PYux56Fg1mj0wq8ajkvyqvVImkG2NWGWN6hl2Vo7vwGBa35Z INSCSKSMOfQfWzJGhl5SAp8ieF12gFTB251THFQ/4lt0DlMlYDfq48UxXyr7TdXs 1Cmrgch4zDzntqMXvCrSkX3WjoPrWXqpHJFIO3AuQ6+EA14Zb4TVB7VetbpKqm26 4TUtKPjDYaNQ2JHE9KW82vq4P1Hw55fuP0P4umL4l2M5NXkbGkWGCQiOaoaKfzM9 u6JRhSGtg5ho4Hj05L2ai7082tH2ME1/o4nC2Wi2DHR8FW6QnMl3XC6oSmK3lpDx 3P72D1jxW15lFpmarT/BTP9C4Puk8iyDtdkf6XRtFy2SkfEtfG8qBlBuXEcK8sz7 LEJbJDbBr1CAWSqVkQfno1hQWdRY4847OeksVgLgAufI4bhPNxQNrZvENv1IdOwx 78tf+y0L1dlArvYuCzNjTNotedsJcDBCgqIeMsBUYvlw7mJuWnWScU5eK7YRFviK AxZ0ZLxCyj9mBZp1S9Y20awjUTjjR6O1TMVmpFWYoK7UZm97CfoIAkV5Fl1Bm1CT epssLTY7Bbp7NdGY5bcEMRmslJj94Wm13VCtL+h7jY6rjE8h7IU= =ELTv -----END PGP SIGNATURE----- Merge tag 'drm-intel-fixes-2020-03-19' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes drm/i915 fixes for v5.6-rc7: - Track active elements during dequeue - Fix failure to handle all MCR ranges - Revert unnecessary workaround Signed-off-by: Dave Airlie <airlied@redhat.com> From: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/877dzgepvu.fsf@intel.com	2020-03-20 12:52:35 +10:00
Dave Airlie	362b86a3d3	Merge tag 'amd-drm-fixes-5.6-2020-03-19' of git://people.freedesktop.org/~agd5f/linux into drm-fixes amd-drm-fixes-5.6-2020-03-19: amdgpu: - Pageflip fix - VCN clockgating fixes - GPR debugfs fix for umr - GPU reset fix - eDP fix for MBP - DCN2.x fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200319204054.1036478-1-alexander.deucher@amd.com	2020-03-20 12:48:17 +10:00
Greg Kurz	1d0c32ec3b	KVM: PPC: Fix kernel crash with PR KVM With PR KVM, shutting down a VM causes the host kernel to crash: [ 314.219284] BUG: Unable to handle kernel data access on read at 0xc00800000176c638 [ 314.219299] Faulting instruction address: 0xc008000000d4ddb0 cpu 0x0: Vector: 300 (Data Access) at [c00000036da077a0] pc: c008000000d4ddb0: kvmppc_mmu_pte_flush_all+0x68/0xd0 [kvm_pr] lr: c008000000d4dd94: kvmppc_mmu_pte_flush_all+0x4c/0xd0 [kvm_pr] sp: c00000036da07a30 msr: 900000010280b033 dar: c00800000176c638 dsisr: 40000000 current = 0xc00000036d4c0000 paca = 0xc000000001a00000 irqmask: 0x03 irq_happened: 0x01 pid = 1992, comm = qemu-system-ppc Linux version 5.6.0-master-gku+ (greg@palmb) (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #17 SMP Wed Mar 18 13:49:29 CET 2020 enter ? for help [c00000036da07ab0] c008000000d4fbe0 kvmppc_mmu_destroy_pr+0x28/0x60 [kvm_pr] [c00000036da07ae0] c0080000009eab8c kvmppc_mmu_destroy+0x34/0x50 [kvm] [c00000036da07b00] c0080000009e50c0 kvm_arch_vcpu_destroy+0x108/0x140 [kvm] [c00000036da07b30] c0080000009d1b50 kvm_vcpu_destroy+0x28/0x80 [kvm] [c00000036da07b60] c0080000009e4434 kvm_arch_destroy_vm+0xbc/0x190 [kvm] [c00000036da07ba0] c0080000009d9c2c kvm_put_kvm+0x1d4/0x3f0 [kvm] [c00000036da07c00] c0080000009da760 kvm_vm_release+0x38/0x60 [kvm] [c00000036da07c30] c000000000420be0 __fput+0xe0/0x310 [c00000036da07c90] c0000000001747a0 task_work_run+0x150/0x1c0 [c00000036da07cf0] c00000000014896c do_exit+0x44c/0xd00 [c00000036da07dc0] c0000000001492f4 do_group_exit+0x64/0xd0 [c00000036da07e00] c000000000149384 sys_exit_group+0x24/0x30 [c00000036da07e20] c00000000000b9d0 system_call+0x5c/0x68 This is caused by a use-after-free in kvmppc_mmu_pte_flush_all() which dereferences vcpu->arch.book3s which was previously freed by kvmppc_core_vcpu_free_pr(). This happens because kvmppc_mmu_destroy() is called after kvmppc_core_vcpu_free() since commit `ff030fdf55` ("KVM: PPC: Move kvm_vcpu_init() invocation to common code"). The kvmppc_mmu_destroy() helper calls one of the following depending on the KVM backend: - kvmppc_mmu_destroy_hv() which does nothing (Book3s HV) - kvmppc_mmu_destroy_pr() which undoes the effects of kvmppc_mmu_init() (Book3s PR 32-bit) - kvmppc_mmu_destroy_pr() which undoes the effects of kvmppc_mmu_init() (Book3s PR 64-bit) - kvmppc_mmu_destroy_e500() which does nothing (BookE e500/e500mc) It turns out that this is only relevant to PR KVM actually. And both 32 and 64 backends need vcpu->arch.book3s to be valid when calling kvmppc_mmu_destroy_pr(). So instead of calling kvmppc_mmu_destroy() from kvm_arch_vcpu_destroy(), call kvmppc_mmu_destroy_pr() at the beginning of kvmppc_core_vcpu_free_pr(). This is consistent with kvmppc_mmu_init() being the last call in kvmppc_core_vcpu_create_pr(). For the same reason, if kvmppc_core_vcpu_create_pr() returns an error then this means that kvmppc_mmu_init() was either not called or failed, in which case kvmppc_mmu_destroy() should not be called. Drop the line in the error path of kvm_arch_vcpu_create(). Fixes: `ff030fdf55` ("KVM: PPC: Move kvm_vcpu_init() invocation to common code") Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/158455341029.178873.15248663726399374882.stgit@bahia.lan	2020-03-20 13:39:10 +11:00
Mario Kleiner	eb916a5a93	drm/amd/display: Fix pageflip event race condition for DCN. Commit '16f17eda8bad ("drm/amd/display: Send vblank and user events at vsartup for DCN")' introduces a new way of pageflip completion handling for DCN, and some trouble. The current implementation introduces a race condition, which can cause pageflip completion events to be sent out one vblank too early, thereby confusing userspace and causing flicker: prepare_flip_isr(): 1. Pageflip programming takes the ddev->event_lock. 2. Sets acrtc->pflip_status == AMDGPU_FLIP_SUBMITTED 3. Releases ddev->event_lock. --> Deadline for surface address regs double-buffering passes on target pipe. 4. dc_commit_updates_for_stream() MMIO programs the new pageflip into hw, but too late for current vblank. => pflip_status == AMDGPU_FLIP_SUBMITTED, but flip won't complete in current vblank due to missing the double-buffering deadline by a tiny bit. 5. VSTARTUP trigger point in vblank is reached, VSTARTUP irq fires, dm_dcn_crtc_high_irq() gets called. 6. Detects pflip_status == AMDGPU_FLIP_SUBMITTED and assumes the pageflip has been completed/will complete in this vblank and sends out pageflip completion event to userspace and resets pflip_status = AMDGPU_FLIP_NONE. => Flip completion event sent out one vblank too early. This behaviour has been observed during my testing with measurement hardware a couple of time. The commit message says that the extra flip event code was added to dm_dcn_crtc_high_irq() to prevent missing to send out pageflip events in case the pflip irq doesn't fire, because the "DCH HUBP" component is clock gated and doesn't fire pflip irqs in that state. Also that this clock gating may happen if no planes are active. This suggests that the problem addressed by that commit can't happen if planes are active. The proposed solution is therefore to only execute the extra pflip completion code iff the count of active planes is zero and otherwise leave pflip completion handling to the pflip irq handler, for a more race-free experience. Note that i don't know if this fixes the problem the original commit tried to address, as i don't know what the test scenario was. It does fix the observed too early pageflip events though and points out the problem introduced. Fixes: `16f17eda8b` ("drm/amd/display: Send vblank and user events at vsartup for DCN") Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-03-19 16:18:45 -04:00

1 2 3 4 5 ...

902312 Commits